semantic errors:
  x := 5 # if x is undeclared
  a := b+c
  foo(a,b)
  a[x] := 5
illustrative of
  unusual lexical rules (no spaces),
  poor syntax (continue used anywhere, or don't even need continue),
  poor rules (variables don't need to be declared)
ECS140A
Programming Languages
02-1
Syntax: BNF
Backus-Naur Form, first used in the Algol 60 report.
many variants since then, but all are similar and most give the power of a context-free grammar (studied in other courses)
example
<id> ::= <alpha> | <alpha> <rest>
<rest> ::= <rest> <alphanum> | <alphanum>
<alphanum> ::= <alpha> | <digit>
<alpha> ::= A | B | C | D
<digit> ::= 0 | 1 | 2
meta-symbol   meaning                        note
::=           is defined as
< >           meta-variable or nonterminal
|             or                             lower precedence than sequence of < >
A, B, ...     terminal                       appear literally
yes
yes
no
no
Note the use of recursion in the rule for <rest>: it expresses repetition.
left recursion makes strings harder to recognize (see text).
there are ways to remove left recursion (demonstrate on the rule above),
but in general they are more complicated. so we express repetition and optional parts in a simpler way:
  {x}   0 or more instances of x
  [x]   0 or 1 instance of x
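As an illustration of the {x} notation, the rule for <id> above can be written without recursion as <id> ::= <alpha> {<alphanum>}, and the repetition then becomes a plain loop. The sketch below assumes the small alphabets from the example grammar; the function name is_id is hypothetical.

```python
ALPHA = set("ABCD")   # <alpha> ::= A | B | C | D
DIGIT = set("012")    # <digit> ::= 0 | 1 | 2

def is_id(s):
    """Recognize <id> ::= <alpha> { <alphanum> }."""
    # An <id> must start with exactly one <alpha>.
    if not s or s[0] not in ALPHA:
        return False
    # { <alphanum> }: zero or more further letters or digits.
    i = 1
    while i < len(s) and (s[i] in ALPHA or s[i] in DIGIT):
        i += 1
    # Accept only if the loop consumed the whole string.
    return i == len(s)

print(is_id("AB01"))  # True
print(is_id("0A"))    # False
```

The loop replaces the left-recursive <rest> rule: each iteration consumes one <alphanum>, so the recognizer always makes progress.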
Example:
<a> ::= x <b> | y <b>
can be simplified to
<a> ::= (x|y) <b>
Another example (precedence):
<a> ::= w x | y z
is not the same as
<a> ::= w (x | y) z
sometimes BNF is defined to include parentheses; in this class, it's OK for
you to use them too (unless otherwise stated).
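One way to check the precedence example is to enumerate the strings each rule generates. This is only a sketch: the helper strings and the list-of-choices encoding are assumptions made for the illustration, not part of BNF itself.

```python
from itertools import product

def strings(alternatives):
    """Return the set of strings generated by one rule.

    Each alternative is a sequence of positions; each position is a
    tuple of the terminals allowed there (a parenthesized choice).
    """
    result = set()
    for alt in alternatives:
        for combo in product(*alt):
            result.add(" ".join(combo))
    return result

# <a> ::= w x | y z        (| binds more loosely than sequence)
rule1 = [[("w",), ("x",)], [("y",), ("z",)]]
# <a> ::= w (x | y) z
rule2 = [[("w",), ("x", "y"), ("z",)]]

print(sorted(strings(rule1)))  # ['w x', 'y z']
print(sorted(strings(rule2)))  # ['w x z', 'w y z']
```

The two rules generate disjoint sets of strings, which is why the parentheses cannot be dropped.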
Parsing
parsing: the process of recognizing strings (sentences) in a language
used in compilers and other translators (e.g., interpreters)
many ways to parse; we'll look at a simple method
Steps in Compilation
(draw picture of the different phases)
lexical analysis: scanner
  breaks up input into tokens
  discards whitespace and comments
  groups characters into identifiers and numbers
  although the parser could do this, a separate scanner is simpler.
  thus, the parser considers identifiers as tokens (terminals).
syntactic analysis: parser
  takes tokens and checks whether they form a valid string according
  to the grammar, i.e., a valid program.
semantic analysis: e.g., type checking
code generation
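The scanner phase can be sketched in a few lines. Everything specific here is an assumption for illustration: the token classes, the keywords while/do/end, and the choice of # as a comment marker are not defined by these notes at this point.

```python
import re

# Assumed token classes (illustration only). In VERBOSE mode the regex
# ignores its own whitespace, so a literal '#' must be escaped.
TOKEN_RE = re.compile(r"""
    (?P<skip>\s+|\#[^\n]*)            # whitespace and comments: discarded
  | (?P<number>\d+)                   # digits grouped into one number
  | (?P<word>[A-Za-z][A-Za-z0-9]*)    # letters/digits grouped into one word
  | (?P<assign>:=)
""", re.VERBOSE)

KEYWORDS = {"while", "do", "end"}     # assumed keyword set

def scan(text):
    """Break text into (kind, lexeme) tokens for the parser."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "skip":
            continue                  # never reaches the parser
        lexeme = m.group()
        if kind == "word":
            kind = lexeme if lexeme in KEYWORDS else "id"
        elif kind == "assign":
            kind = ":="
        tokens.append((kind, lexeme))
    return tokens

print(scan("x := 5  # set x"))
# [('id', 'x'), (':=', ':='), ('number', '5')]
```

The parser then only ever sees token kinds such as id or number, never individual characters.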
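construct      example        railroad diagram
nonterminal    x              box x
terminal       y              circle y
sequence       s1 s2          s1 followed by s2
alternation    s1|s2|s3       parallel branches through s1, s2, s3
optional       [x]            branch through x, with a bypass around it
repetition     {x}            loop back through x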
Generating a Parser
method, for a given grammar:
  determine first sets
  determine syntax graphs
  translate syntax graphs and first sets into code
first(V) = set of all terminals that can begin a string derived from V, and ε,
if V can derive ε.
example using the grammar from before. The first sets (starting with
the simpler ones):
  first(while) = { while }
  first(expression) = { id, number }
  first(assignment) = { id }
  first(statement) = first(assignment) ∪ first(while)
                   = { id, while }
  first(block) = first(statement) ∪ { ε }
               = { id, while, ε }
  first(program) = first(block) = { id, while, ε }
overlapping first sets for the right-hand sides of rules cause problems in
parsing using our technique: they correspond to potential ambiguities.
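The first sets can also be computed mechanically by iterating until nothing changes. The sketch below is not the method prescribed in these notes; the grammar encoding is an assumption, reconstructed from the first sets above and from the parsing pseudocode later in the notes. The nonterminal for while is named while_stmt to keep it distinct from the terminal while, and the empty string is written ε.

```python
EPS = "ε"  # marks the empty string

# Assumed grammar. Each nonterminal maps to a list of alternatives; each
# alternative is a list of symbols. [] derives the empty string, so
# "block" encodes { statement } as: statement block | ε.
GRAMMAR = {
    "program":    [["block"]],
    "block":      [["statement", "block"], []],
    "statement":  [["assignment"], ["while_stmt"]],
    "assignment": [["id", ":=", "expression"]],
    "expression": [["id"], ["number"]],
    "while_stmt": [["while", "expression", "do", "block", "end"]],
}

def first_sets(grammar):
    """Compute first(V) for every nonterminal V by fixed-point iteration."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[nt])
                all_nullable = True
                for sym in alt:
                    if sym in grammar:              # nonterminal
                        first[nt] |= first[sym] - {EPS}
                        if EPS not in first[sym]:
                            all_nullable = False
                            break
                    else:                           # terminal
                        first[nt].add(sym)
                        all_nullable = False
                        break
                if all_nullable:                    # whole alternative can vanish
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first

fs = first_sets(GRAMMAR)
print(sorted(fs["statement"]))  # ['id', 'while']
print(sorted(fs["block"]))      # ['id', 'while', 'ε']
```

Under these assumptions the computed sets agree with the ones listed by hand.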
component      syntax graph   code
terminal       circle x
nonterminal    box x          call x;
sequence       x1 x2
alternation    ...
repetition     ...
notes
  one procedure for each nonterminal
  f_z represents first(z)
  call z means to call the procedure representing the rule for z.
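To make the component-to-code mapping concrete, here is one possible rendering of each component. It is a sketch only: the toy grammar, the class ToyParser, and the "<eof>" end marker are all hypothetical and chosen just to exercise every row.

```python
class ToyParser:
    """Hypothetical toy grammar, used only to illustrate the mapping:
         many   ::= { either }
         either ::= pair | c
         pair   ::= a b
    """
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["<eof>"]  # end marker is an assumption
        self.pos = 0
        self.sym = self.tokens[0]               # current token

    def next(self):
        self.pos += 1
        self.sym = self.tokens[self.pos]

    def terminal(self, x):
        # terminal (circle x): check sym, then advance
        if self.sym != x:
            raise SyntaxError(f"expected {x!r}, found {self.sym!r}")
        self.next()

    def pair(self):
        # sequence x1 x2: code for x1, then code for x2
        self.terminal("a")
        self.terminal("b")

    def either(self):
        # alternation: pick the branch whose first set contains sym
        if self.sym == "a":          # first(pair) = { a }
            self.pair()              # nonterminal (box x): call x
        elif self.sym == "c":
            self.terminal("c")
        else:
            raise SyntaxError(f"unexpected {self.sym!r}")

    def many(self):
        # repetition {x}: loop while sym is in first(x)
        while self.sym in ("a", "c"):    # first(either) = { a, c }
            self.either()

p = ToyParser(["a", "b", "c", "a", "b"])
p.many()
print(p.sym)  # <eof>
```

Each method corresponds to one nonterminal, and every decision is made by looking only at the current token.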
The pseudocode for parsing the given grammar is below. Note how there
is one procedure for each nonterminal in the grammar. We assume the
procedure next advances the input and sets the global variable sym to the
new current token. We also assume the function first(x) returns true iff
sym is in x's first set. We'll name the first set for x f_x.
main() {
    /* read the first token. */
    next();
    /* parse the input. */
    program();
    /* do something to ensure that
     * all input was parsed.
     */
    ...
}
program() {
    block();
}
block() {
    while( first(f_statement) ) {
        statement();
    }
}
statement() {
    if( first(f_assignment) ) {
        assignment();
    }
    else if( first(f_while) )
        while_proc();
    else ERROR;
}
assignment() {
    if( sym is an id )
        next();
    else ERROR;
    if( sym is a := )
        next();
    else ERROR;
    expression();
}
expression() {
    if( sym is an id )
        next();
    else if( sym is a number )
        next();
    else ERROR;
}
while_proc() {
    if( sym is a while )
        next();
    else ERROR;
    expression();
    if( sym is a do )
        next();
    else ERROR;
    block();
    if( sym is an end )
        next();
    else ERROR;
}
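For experimenting, the pseudocode above can be transcribed into runnable form. The following is one possible transcription, not the course's reference solution. It assumes tokens arrive as a list of token kinds ("id", "number", ":=", "while", "do", "end"); the ParseError class, the expect helper, the accepts wrapper, and the "eof" sentinel used to check that all input was parsed are hypothetical additions.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["eof"]   # "eof" sentinel is an assumption
        self.pos = -1
        self.sym = None

    def next(self):
        self.pos += 1
        self.sym = self.tokens[self.pos]

    def expect(self, kind):
        # "if( sym is a kind ) next(); else ERROR;"
        if self.sym != kind:
            raise ParseError(f"expected {kind}, found {self.sym}")
        self.next()

    def main(self):
        self.next()                 # read the first token
        self.program()              # parse the input
        if self.sym != "eof":       # one way to ensure all input was parsed
            raise ParseError(f"unexpected {self.sym}")

    def program(self):
        self.block()

    def block(self):
        while self.sym in ("id", "while"):     # f_statement
            self.statement()

    def statement(self):
        if self.sym == "id":                   # f_assignment
            self.assignment()
        elif self.sym == "while":              # f_while
            self.while_proc()
        else:
            raise ParseError(f"unexpected {self.sym}")

    def assignment(self):
        self.expect("id")
        self.expect(":=")
        self.expression()

    def expression(self):
        if self.sym in ("id", "number"):
            self.next()
        else:
            raise ParseError(f"expected id or number, found {self.sym}")

    def while_proc(self):
        self.expect("while")
        self.expression()
        self.expect("do")
        self.block()
        self.expect("end")

def accepts(tokens):
    """Return True iff the token sequence is a valid program."""
    try:
        Parser(tokens).main()
        return True
    except ParseError:
        return False

print(accepts(["while", "id", "do", "id", ":=", "number", "end"]))  # True
print(accepts(["id", ":="]))                                        # False
```

Because the first sets of the alternatives in statement do not overlap, one token of lookahead is enough to choose a branch.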
On hw2, use an integrated parser/semantic-checker/code-generator; so intermix
statements for semantic checks and code generation with the above code.
I.e., our project uses 1 pass; real translators typically use multiple passes
and communicate between passes via the program's parse tree.
What problem would left recursion in the grammar cause in the parser
generated using the above technique?
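One way to explore the question is to translate a left-recursive rule naively and run it. The sketch below does this for <rest> ::= <rest> <alphanum> | <alphanum> from the BNF example; the function names are hypothetical, and Python's recursion limit stands in for what would otherwise be unbounded recursion.

```python
def rest(chars, pos):
    """Naive procedure for <rest> ::= <rest> <alphanum> | <alphanum>."""
    # The first alternative starts with <rest>, so the procedure calls
    # itself before consuming any input: pos never advances.
    pos = rest(chars, pos)
    if pos < len(chars) and chars[pos].isalnum():
        return pos + 1
    raise SyntaxError("expected <alphanum>")

def recurses_without_progress(chars):
    """Return True if parsing ran out of stack without consuming input."""
    try:
        rest(chars, 0)
    except RecursionError:
        return True
    return False

print(recurses_without_progress("B1"))  # True
```

The procedure re-enters itself with the same sym every time, so no amount of input lets it terminate.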