
SNOBOL

SNOBOL (StriNg Oriented symBOlic Language) is a computer programming language developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky. (The name is a jocular reference to COBOL and ALGOL, but SNOBOL has no other connection with these languages and no other notable similarities to them.) SNOBOL was originally called SEXI (String EXpression Interpreter).

During the 1950s and 1960s there was a flourishing of interest in special-purpose computer languages. SNOBOL was one of a number of text-string-oriented languages, and one of the more successful; others included COMIT and TRAC. SNOBOL was widely used in the 1970s and 1980s as a text manipulation language, but in recent years its popularity has faded as newer languages such as Awk and Perl have made string manipulation by means of regular expressions popular. It is now primarily a special interest language used mainly by enthusiasts, and new implementations are rare. However, SNOBOL's pattern matching algorithm is in many ways more powerful than regular expressions.

The original implementation was on an IBM 7090 at Bell Labs, Holmdel, N.J. SNOBOL4 was specifically designed for portability; the first implementation was on an IBM 7094 but it was rapidly ported to many other platforms. The classic implementation was on the PDP-10; it has been used to study compilers, formal grammars, and artificial intelligence, especially machine translation and machine comprehension of natural languages.

SNOBOL4, the final variant of the language, supports a number of built-in data types, such as integers and limited precision real numbers, strings, patterns, arrays, and tables, and also allows the programmer to define additional data types and new functions. SNOBOL4's programmer-defined data type facility was advanced at the time (it preceded, and resembles, Pascal's "records" and C's "structs").

SNOBOL4 stands apart from the mainstream programming languages of its time by having patterns as a first-class data type (i.e. a data type whose values can be manipulated in all ways permitted to any other data type in the programming language) and by providing operators for pattern concatenation and alternation. Strings generated during execution can be treated as programs and executed.

A SNOBOL pattern can be very simple or extremely complex. A simple pattern is just a text string (e.g. "ABCD"), but a complex pattern may be a large structure describing, for example, the complete grammar of a computer language.
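To make the idea of patterns as ordinary values more concrete, here is a minimal sketch in Python (not SNOBOL4, and not how any SNOBOL4 implementation works internally). The `Pat` class, the `lit()` helper and the `search()` method are hypothetical names invented for this illustration; `+` stands in for SNOBOL4's concatenation and `|` for its alternation.

```python
from typing import Callable, Iterator, Optional

# A matcher takes (subject, start) and yields every end position it can reach.
Matcher = Callable[[str, int], Iterator[int]]


class Pat:
    """Hypothetical stand-in for a SNOBOL4 pattern value."""

    def __init__(self, fn: Matcher):
        self.fn = fn

    def __add__(self, other: "Pat") -> "Pat":
        # Concatenation: match self, then other from wherever self ended.
        def run(s: str, i: int) -> Iterator[int]:
            for j in self.fn(s, i):
                yield from other.fn(s, j)
        return Pat(run)

    def __or__(self, other: "Pat") -> "Pat":
        # Alternation: try self first, then fall back to other.
        def run(s: str, i: int) -> Iterator[int]:
            yield from self.fn(s, i)
            yield from other.fn(s, i)
        return Pat(run)

    def search(self, s: str) -> Optional[str]:
        # Unanchored match: try each start position, leftmost first.
        for i in range(len(s) + 1):
            for j in self.fn(s, i):
                return s[i:j]
        return None


def lit(text: str) -> Pat:
    """A pattern that matches one literal string."""
    def run(s: str, i: int) -> Iterator[int]:
        if s.startswith(text, i):
            yield i + len(text)
    return Pat(run)


# Patterns are plain values: build small ones, then combine them.
colour = lit("RED") | lit("GREEN") | lit("BLUE")
phrase = lit("A ") + colour + lit(" BOX")

print(phrase.search("SEND A GREEN BOX TODAY"))  # A GREEN BOX
```

Because `colour` and `phrase` are just objects, they can be stored in variables, passed to functions and combined into larger patterns, which is roughly what "first-class" means here.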


SNOBOL provides the programmer with a rich assortment of features including some rather exotic ones. As a result it is possible to use SNOBOL as if it were an object-oriented language, a logic programming language, a functional language or a standard imperative language by changing the set of features used to write a program. Strings that are simply placed next to each other in a statement are concatenated. SNOBOL keeps strings in a memory heap, and frees programmers from concerns about memory allocation and management for strings. It is normally implemented as an interpreter because of the difficulty in implementing some of its very high-level features, but there is a compiler, the SPITBOL compiler, which provides nearly all the facilities that the interpreter provides. The Icon programming language is a descendant of SNOBOL4.

Snobol is a descendant of Comit. It first appeared in 1962 and was followed by Snobol 2 in 1964, Snobol 3 in 1965 and Snobol 4 in 1967. Its own descendant, Icon, followed in 1977.

Stdin is the input stream for a program.

Sample program
Here's a sample Snobol 4 program. There's sample data at the end along with the output which is produced by the sample data.

Enjoy!

*
*       Sample Snobol 4 program
*
*       This program builds a linked list of each of the unique words
*       in the text read from stdin. The words are also maintained in
*       a table to make it easy to spot duplicates. When the end of
*       the input is reached, the words are printed out in order of
*       appearance along with a count of how many were found.
*
*       This program is intended to demonstrate Snobol 4 features. It
*       is not intended to be particularly useful. It also
*       deliberately violates a number of fairly standard Snobol 4
*       style rules in order to illustrate certain language features
*       (the violations are noted in the program's comments).
*
*       A few points before we begin:
*
*       1. keep an open mind as Snobol 4 is almost certainly quite
*          unlike any other programming language that you've ever seen.
*
*       2. remember that Snobol 4 is also a language from a different
*          era. Many of the things which we take for granted today,
*          like the "knowledge" that gotos are bad, weren't taken for
*          granted in the 1960s when Snobol was "growing up". Edsger W.
*          Dijkstra wrote his famous letter, GO TO Statement Considered
*          Harmful, to the ACM in 1968 (one year after Snobol 4 was
*          finalized). N.B. I'm not suggesting that gotos are good.
*          Just keep in mind that it really was a different era.
*
*       3. Snobol 4 is quite a powerful language with many aspects
*          which can't be illustrated in a program as short as this
*          one. Deciding what you think of Snobol 4 and its abilities
*          based on this sample is like reading a page or two at random
*          from a novel. You don't get much context and you've no idea
*          how the novel's story plays out.
*
*       4. Snobol 4 is a dead language in the same sense that Latin is
*          a dead language, i.e. there are still folks out there who
*          use the language in various ways. Check out www.snobol4.com
*          for more information (I'm not affiliated with
*          www.snobol4.com in any way).
*
*       5. Snobol 4 programs run in an interpretive environment. This
*          tends to make them a fair bit slower than compiled languages
*          like C and FORTRAN. The only meaningful benchmark is the
*          time which elapses between when a question is asked and when
*          an answer is obtained. The actual time that it takes to run
*          the final program is often a rather small portion of the
*          total time between the question and the answer.
*
*       6. Snobol 4 comes from the era of punched cards. One
*          consequence of this is that Snobol 4 ignores the case of
*          everything except the contents of quoted strings. I've
*          written this sample in lower case. It would have been more
*          traditional if I'd written it in UPPER CASE but people today
*          find that rather hard to read.
*
*       7. Snobol 4 was a testbed for new ideas. Some of these ideas,
*          like the near total lack of control structures, didn't work
*          out. Others, like the approach to pattern matching, were
*          incredibly powerful.
*
*       Bias alert: I've written somewhere on the order of 200,000
*       lines of Snobol in my career. I wouldn't have used it this
*       much if it didn't have certain redeeming features! Snobol and
*       I enjoyed the time that we spent together (well, at least I
*       did!).
*
*       Three major Snobol 4 language characteristics to keep in mind:
*
*       1. Snobol 4's strengths are in the areas of string manipulation
*          and data structures. Only a small part of these two aspects
*          will be apparent in this sample.
*
*       2. Snobol has exactly two control structures - an unconditional
*          goto and a conditional goto. This results in, shall we say,
*          some pretty unstructured code sometimes. I've tried to keep
*          the code in this sample reasonably well structured although
*          the lack of control structures does make that rather
*          difficult at times.
*
*       3. Snobol has no declarations. Consequently, everything that
*          the user's program needs to define gets defined at runtime.
*
*       Shall we begin?
*
*       Execution begins with the very first line of the Snobol 4
*       source file and continues from there.
*
*       Set the statement execution limit to a really big number.
*       By default, a Snobol program is terminated if it executes more
*       than 50,000 statements.
*
*       &stlimit is an example of a Snobol keyword. The various
*       keywords are used to manage certain runtime parameters.
*
*       There's nothing special about 999999999 other than that it is
*       pretty big.

        &stlimit = 999999999

*
*       Let's create a datatype to hold the words and their usage
*       counts. This uses the built-in "data()" function to create the
*       new datatype on-the-fly. Since Snobol 4 has no declarations,
*       this is the only way to create a new datatype.
*
*       The new datatype will be called "wordnode" and will have a
*       "word" field, a "count" field and a "next" field. Since Snobol
*       is a weakly typed language, we don't need to specify a type for
*       these fields.

        data("wordnode(word,count,next)")

*
*       Now we need a table to hold the words in. We'll use the
*       built-in TABLE() function to create a new table which we'll
*       assign to the "wordtable" variable.

        wordtable = table()

*
*       Define a function to process a single word from the input.
*       This uses the built-in define() function which defines a new
*       function type.
*
*       The function will be called "doword". It will take the "word"
*       as a parameter. The function will need a local variable called
*       "tmp".
*
*       Once we've defined the function, we need to skip around the
*       body of the function as we certainly don't want to execute it
*       right now. The ":(skip.doword)" at the end of the "define()"
*       line is an unconditional goto to the "skip.doword" label.
*
*       The body of "doword" starts on the label "doword" (i.e. the
*       same label as the name of the function).

        define("doword(word)tmp")                       :(skip.doword)

doword

*
*       Map the word to lower case.
*       Note the use of a + in column 1 to continue the previous
*       statement onto a new line.

        word = replace( word, "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
+                       "abcdefghijklmnopqrstuvwxyz" )

*
*       Get the word's wordnode out of the wordtable.
*       tmp will be NULL if we've never seen this word before.

        tmp = wordtable<word>

*
*       Create a new wordnode and insert it at the end of the linked
*       list if one doesn't already exist.
*
*       There's a LOT going on here.
*
*       The first key point is the call to "ident()". The "ident()"
*       built-in function checks if the TWO parameters that it is
*       given are identical. Since we've only provided one parameter,
*       Snobol supplies a NULL pointer as the second parameter which
*       means that we're checking if "tmp" is NULL. If the two
*       parameters are identical then "ident()" returns a NULL pointer
*       which gets concatenated with our word. Concatenating NULL with
*       any string yields the string (i.e. this concatenation doesn't
*       do anything).
*
*       Things get MUCH more interesting if the two parameters to
*       "ident()" are NOT identical. In this case, the "ident()"
*       function call fails. A failed function call is NOT like a
*       runtime error. It is just an indication that what you tried
*       to do either didn't work or was incorrect or false. If any
*       part of a Snobol statement fails then the entire statement
*       fails which, in this case, means that the call to "wordnode()"
*       never happens and "tmp" is not assigned to.
*
*       Now, if the "ident()" call worked then we create a new
*       "wordnode" by calling the "wordnode()" function (which was
*       implicitly defined when we created the "wordnode" datatype
*       above).
*
*       If we created a new "wordnode" then we need to add it to the
*       end of the linked list of "wordnode"s. What we do is
*       conditionally go to donode.not.new.word if the statement
*       failed. Remember, if the "ident()" call fails then the entire
*       statement fails. There's nothing else in this statement which
*       can fail so checking if the statement failed is the same as
*       checking if the "wordnode" already existed.
*
*       As an aside, Snobol has runtime errors and runtime errors are
*       fatal (i.e. if your program causes a runtime error then it
*       dies). I'll point out a potential runtime error shortly.
*
*       The parameters to "wordnode" are the initial values to assign
*       to each of the "wordnode" fields. We'll be lazy and not bother
*       to even specify values for the "count" and "next" fields
*       (Snobol will provide NULL values for these fields since we
*       left them out).

        tmp = wordnode( ident(tmp) word )               :f(donode.not.new.word)

*
*       We just created a new "wordnode" - add it to the "wordtable"
*       table and put it at the end of the linked list.
*
*       Adding it to the "wordtable" table is easy.
*       The linked list is a bit more work.
*
*       We have two global variables, "wordlist" and "wordlist_last".
*       We maintain "wordlist_last" to point at the most recently
*       added "wordnode" in the list (i.e. the last one in the list).
*       If this is the first "wordnode" that we've ever seen then both
*       "wordlist" and "wordlist_last" will be NULL (because they've
*       never been assigned a value and every identifier starts out
*       life NULL).
*
*       Failing to initialize the "wordlist" and "wordlist_last"
*       global variables to NULL as a reminder that they exist was a
*       dirty move. I did it to emphasize that initializing them isn't
*       necessary. That said, initializing them is very important from
*       a style perspective.
*
*       Add the new "wordnode" to the table. Note that if wordtable
*       isn't actually associated with a table or an array then trying
*       to treat it like a table or an array by using the <...>
*       construct would result in a runtime error (i.e. the program
*       would die at this point).

        wordtable<word> = tmp

*
*       Use the "ident()" trick again to see if we've ever seen
*       a word before.

        ident(wordlist)                                 :f(donode.not.first.word)

*
*       This is the first word ever - make "wordlist" and
*       "wordlist_last" point at it and we're done.
*
*       Note that except for calls to "define()" (see above), I always
*       put unconditional gotos on their own line. This makes it
*       easier to add new statements immediately before the
*       unconditional goto (there is never a need to put anything
*       between a "define()" call and the subsequent unconditional
*       goto so I put them together on the same line).

        wordlist = tmp
        wordlist_last = tmp
                                                        :(donode.done.first.word)

*
*       This isn't the first word ever - add it to the end of the list.

donode.not.first.word

        next(wordlist_last) = tmp
        wordlist_last = tmp

*       We jump to here when we're done handling the very first word.

donode.done.first.word

*       We jump to here when we're dealing with a word that we've seen
*       before. Even though both of these 'heres' are essentially the
*       same place, I use separate labels since it maintains a vague
*       structure (i.e. I try to structure my Snobol code to at least
*       resemble the if-then-else-fi structure found in programming
*       languages that have real control structures).

donode.not.new.word

*
*       Done dealing with new "wordnode"s.
*       tmp now references the "wordnode" for the current word.
*       Increment the word count.
*
*       There's a tiny bit of magic (and stupidity) going on here.
*       We didn't provide initial values for "count" or "next" when we
*       created the "wordnode" above. If the increment below is being
*       done on a virgin "wordnode" then the call to "count(tmp)" to
*       the immediate left of the "+" operator will return NULL. If
*       one of the operands of an arithmetic operation is NULL then
*       Snobol provides a value of 0 which is exactly what you want.
*
*       I said that this is both "magic" and "stupid". The magic is
*       that this whole approach to handling NULLs can come in handy
*       from time to time. The "stupid" is that we should have
*       explicitly initialized "count" to 0 when we created the
*       "wordnode".

        count(tmp) = count(tmp) + 1

*
*       We're done. Return to the caller.
*
*       The value of this function will be the last value assigned to
*       the variable whose name is the same as the function's name.
*       Since we never assigned anything to "doword", the return value
*       will be NULL. In "real life", it might make sense to set
*       "doword" to "tmp" or just use "doword" instead of "tmp". This
*       would make the return value of "doword" be the "wordnode" for
*       the word which was passed in as a parameter (seems reasonable
*       to me but we're not going to bother).
*
*       This function call worked so we'll do a normal return by
*       branching to the special "return" label. If we wanted the
*       function call to fail (see discussion of what failure means
*       above), we'd return by branching to the "freturn" label. You
*       can bail out of a function at any time by branching to
*       "return" or "freturn".

                                                        :(return)

*       Now we need the label that we jumped to when we defined the
*       "doword" function. Here it is.
*
*       IMPORTANT: we get here during program startup immediately
*       after executing the "define()" call that "defined" the
*       "doword" function.

skip.doword

*
*       Here's the main loop.
*
*       We read input lines by referencing the special "input"
*       variable. The reference will fail when the end of the input
*       is reached.

mainloop line = input                                   :f(done)

*
*       We've got an input line. Strip off words and pass each one to
*       "doword". Jump back up to mainloop when there are no more
*       words.
*
*       A bit of pre-processing of "line" will help things along here.
*       If we put a blank at the front and at the end then the pattern
*       to identify words is much simpler.
*
*       The next line concatenates a blank with line and then
*       concatenates the result with another blank.

        line = ' ' line ' '

*       "letters" is just all the letters in the alphabet in both
*       lower and upper case. Defining it across two lines makes it
*       really obvious if we missed a letter by accident in one of the
*       lines.

        letters = 'abcdefghijklmnopqrstuvwxyz'
+                 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

*
*       Now remove everything from line that is neither a letter nor a
*       blank. There are better ways to do this but this will do (and
*       the better ways open up issues that I'd rather leave closed).
*
*       This is our first pattern matching statement. It looks through
*       "line" for the first character which isn't a letter or a blank
*       (because we concatenated "letters" with ' ' to construct the
*       argument). If it finds one then it replaces it with a blank.
*
*       If the pattern match succeeds then the statement succeeds and
*       we spin back to rmpunc to hit the next one. If the pattern
*       match fails then we just continue.

rmpunc  line notany(letters ' ') = ' '                  :s(rmpunc)

*
*       We need a pattern that matches a sequence of letters. We could
*       do this on the fly but style says we should do it here (in an
*       optimized version of Snobol called Spitbol, pre-defining
*       patterns can make a dramatic difference to how fast the
*       program runs).
*
*       This pattern matches a blank followed by one or more letters
*       followed by a blank. "span()" is aggressive in the sense that
*       it always matches as many characters as possible. If the
*       "span()" part of the pattern works then what it matches will
*       be saved in the "newword" variable.
*
*       In "real life", this pattern would probably be defined before
*       "mainloop" to avoid any cost inside the "mainloop" loop (it
*       isn't very expensive so we're not going to worry about it).

        word_pattern = ' ' ( span(letters) . newword ) ' '

*
*       We're ready to roll. Spin through the line matching words.
*       If we find a word then it will be remembered in "newword" so
*       we replace it with a blank. If the match failed then there are
*       no more words so we jump back to "mainloop" to get the next
*       input line. If it worked, hand the word to "doword()" and spin
*       back for the next word in the input line.

wordloop line word_pattern = ' '                        :f(mainloop)
        doword(newword)
                                                        :(wordloop)

*
*       We'll get here when there's no more input (see mainloop label
*       above).

done

*
*       Run down the linked list and print out all the words along
*       with a count of how often they appeared.
*
*       The "lpad()" function pads the first parameter on the left
*       with spaces to make it at least as long as the length
*       specified by the second parameter. It returns the first
*       parameter unchanged if it is already long enough or too long.

        tmp = wordlist
doneloop ident(tmp)                                     :s(alldone)
        output = lpad(count(tmp),10) ' ' word(tmp)
        tmp = next(tmp)
                                                        :(doneloop)

*
*       We are (almost) all done.

alldone

*
*       Let's say goodbye the hard way, shall we?
*       Pay attention, I'm only going to say this once . . .
*
*       1. set "mysource" to a string containing some Snobol code
*          which will print out "Goodbye!" and then jump to the
*          end label (where all good Snobol programs go to terminate
*          normally).
*

        mysource = "doit output = 'Goodbye!' :(end)"

*
*       2. invoke the compiler (at runtime here folks) to compile
*          the code segment we just wrote. Put the compiled code
*          into "mycode". If "code()" fails then there's a syntax
*          error (finding it is left as an exercise for the writer).
*
*          If the "code()" function worked then immediately jump
*          to the "doit" label that we defined in the source code
*          string (see the assignment to "mysource" above).
*
*          This particular Snobol 4 feature definitely takes the
*          notion of self-modifying code to new heights!

        mycode = code(mysource)                         :s(doit)

*
*       If we get here then the compilation attempt failed.
*       Mumble something and terminate anyways by walking into the
*       "end" label.

        output = "Hmph!"

*
*       The "end" label marks the end of the program.

end

Sample data
Here are some words. Even more words. But not very many words. After all, we don't want to run out of words. Do we?

Output for the sample data


Here's the output which is produced if this sample program is run using the sample data provided above. This output was produced on 2002/10/04 using Snobol 4 running under Linux on an Athlon system. See www.snobol4.com if you're interested in running Snobol 4 on your system.

         1 here
         1 are
         1 some
         4 words
         1 even
         1 more
         1 but
         1 not
         1 very
         1 many
         1 after
         1 all
         2 we
         1 don
         1 t
         1 want
         1 to
         1 run
         1 out
         1 of
         1 do
Goodbye!
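For readers who don't know Snobol, it may help to see the same job done in a more familiar language. The following is a rough Python sketch, not a translation of the Snobol program: `count_words` and `report` are names invented here, a dict (which keeps insertion order) stands in for both the table and the linked list, and a regular expression stands in for the "rmpunc" and "wordloop" pattern matches.

```python
import re


def count_words(text: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs in order of first appearance."""
    counts: dict[str, int] = {}  # insertion-ordered, like the linked list
    for line in text.splitlines():
        # Anything that is not a letter acts as a separator, which has the
        # same effect as replacing punctuation with blanks in "rmpunc".
        for word in re.findall(r"[A-Za-z]+", line):
            word = word.lower()  # same job as the replace() call in "doword"
            counts[word] = counts.get(word, 0) + 1
    return list(counts.items())


def report(text: str) -> list[str]:
    """Format each pair the way lpad(count(tmp),10) ' ' word(tmp) does."""
    return [f"{count:>10} {word}" for word, count in count_words(text)]


sample = (
    "Here are some words. Even more words. But not very many words. "
    "After all, we don't want to run out of words. Do we?"
)

for line in report(sample):
    print(line)
print("Goodbye!")
```

Run against the sample data, this should produce the same 21 lines as the listing above, including the "don" and "t" that result from splitting "don't" at the apostrophe.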

Sample
*  Find biggest words and numbers in a test string
*
        BIGP = (*P $ TRY *GT(SIZE(TRY),SIZE(BIG))) $ BIG FAIL
        STR = 'IN 1964 NFL ATTENDANCE JUMPED TO 4,807884; '
+             'AN INCREASE OF 401,810.'
        P = SPAN('0123456789,')
        BIG =
        STR BIGP
        OUTPUT = 'LONGEST NUMBER IS ' BIG
        P = SPAN('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
        BIG =
        STR BIGP
        OUTPUT = 'LONGEST WORD IS ' BIG
END
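The sample above keeps the longest run of characters drawn from a given set: each candidate is compared with the best one so far and kept only if it is strictly longer. As a rough illustration of that logic only, here is a Python sketch. `longest_run` is a hypothetical helper, and a regular expression replaces the SNOBOL4 pattern and its forced backtracking, so this shows the result rather than the mechanism.

```python
import re


def longest_run(subject: str, charset: str) -> str:
    """Return the first longest run of characters taken from charset."""
    best = ""
    # re.escape keeps characters such as '-' or ']' literal inside the class.
    for match in re.finditer(f"[{re.escape(charset)}]+", subject):
        candidate = match.group()
        # Strictly greater, like GT(SIZE(TRY),SIZE(BIG)): ties keep the
        # earlier candidate.
        if len(candidate) > len(best):
            best = candidate
    return best


text = "IN 1964 NFL ATTENDANCE JUMPED TO 4,807884; AN INCREASE OF 401,810."

print("LONGEST NUMBER IS", longest_run(text, "0123456789,"))
print("LONGEST WORD IS", longest_run(text, "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
```

For the test string used in the sample, this picks "4,807884" as the longest number and "ATTENDANCE" as the longest word.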

Chapter 1 : FUNDAMENTALS

SNOBOL4 is really a combination of two kinds of languages: a conventional language, with several data types and a simple but powerful control structure, and a pattern language, with a structure all its own. The conventional language is not block structured, and may appear old-fashioned. The pattern language, however, remains unsurpassed, and is unique to SNOBOL4.

You should try to master the conventional portion of SNOBOL4 first. When you're comfortable with it, you can move on to pattern matching. Pattern matching by itself is a very large subject, and this manual can only offer an introduction. The sample programs accompanying Vanilla SNOBOL4, as well as the many SNOBOL4 books available from Catspaw, can be studied for a deeper understanding of patterns and their application.

We'll begin by discussing data types, operators, and variables.
1.1 SIMPLE DATA TYPES

SNOBOL4 has several different basic types, but has a mechanism to define hundreds more as aggregates of others. Initially, we'll discuss the two most basic: integers and strings.
1.1.1 Integers

An integer is a simple whole number, without a fractional part. In SNOBOL4, its value can range from -32767 to +32767. It appears without quotation marks, and commas should not be used to group digits. Here are some acceptable integers:

    14    -234    0    0012    +12832    -9395    +0

These are incorrect in SNOBOL4:

    13.4     fractional part is not allowed
    49723    larger than 32767
    3,076    comma is not allowed

A number must also contain at least one digit.

Use the CODE.SNO program to test different integer values. Try both legal and illegal values. Here are some sample test lines:

    Enter SNOBOL4 statements:
    ? OUTPUT = 42
    42
    ? OUTPUT = -825
    -825
    ? OUTPUT = 73768
    Compilation error: Erroneous integer, re-enter:
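The integer rules in this section (optional sign, at least one digit, no comma or fractional part, magnitude no larger than 32767) can be summarized as a small check. The Python sketch below is only an illustration of those stated rules; `is_valid_integer` and `LIMIT` are names invented here, and this is not how Vanilla SNOBOL4 itself validates input.

```python
import re

# Largest magnitude allowed for an integer, as stated in this section.
LIMIT = 32767


def is_valid_integer(text: str) -> bool:
    """Check a literal against the integer rules described above."""
    # Optional sign followed by one or more digits: this rejects commas,
    # fractional parts and literals with no digits at all.
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return False
    # Leading zeros are fine (0012 is acceptable); only the value matters.
    return abs(int(text)) <= LIMIT


for literal in ["14", "-234", "0012", "+0", "13.4", "49723", "3,076", "73768"]:
    verdict = "acceptable" if is_valid_integer(literal) else "incorrect"
    print(f"{literal:>6}  {verdict}")
```

The first four literals in the loop come from the list of acceptable integers, and the last four are the values this section reports as incorrect or rejected by CODE.SNO.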

1.1.2 Reals

Vanilla SNOBOL4 does not include real numbers. They are available in SNOBOL4+, Catspaw's highly enhanced implementation of the SNOBOL4 programming language.
1.1.3 Strings

A string is an ordered sequence of characters. The order of the characters is important: the strings AB and BA are different. Characters are not restricted to printing characters; all of the 256 combinations possible in an 8-bit byte are allowed. Normally, the maximum length of a string is 5,000 characters, although you can tell SNOBOL4 to accept longer strings.

A string of length zero (no characters) is called the null string. At first, you may find the idea of an empty string disturbing: it's a string, but it has no characters. Its role in SNOBOL4 is similar to the role of zero in the natural number system.

Strings may appear literally in your program, or may be created during execution. To place a literal string in your program, enclose it in apostrophes (') or double quotation marks ("). Either may be used, but the beginning and ending marks must be the same. The string itself may contain one type of mark if the other is used to enclose the string. The null string is represented by two successive marks, with no intervening characters. Here are some samples to try with CODE.SNO:

    ? OUTPUT = 'STRING LITERAL'
    STRING LITERAL
    ? OUTPUT = "So is this"
    So is this
    ? OUTPUT = ''

    ? OUTPUT = 'WHO COINED THE WORD "BYTE"?'
    WHO COINED THE WORD "BYTE"?
    ? OUTPUT = "WON'T"
    WON'T

SAMPLE : SNOBOL.
        OUTPUT = 'Hello World'
END

C++
#include <iostream>

// Standard C++ equivalent of the SNOBOL sample above.
int main()
{
    std::cout << "Hello World" << std::endl;
    return 0;
}
