
Unit-III

Syntax Analysis
Bottom Up Parsing
A bottom-up parser creates the parse tree of the given input starting
from leaves towards the root.
A bottom-up parser tries to find the right-most derivation of the given
input in the reverse order.
Bottom-up parsing is also known as shift-reduce parsing because its
two main actions are shift and reduce.
Two methods of shift-reduce parsing:
Operator Precedence Parsing
LR Parsing (L: left-to-right scan, R: rightmost derivation in reverse)
SLR
Canonical LR
LALR
Shift-Reduce Parsing
Grammar:
  S → aABe
  A → Abc | b
  B → d

Reducing a sentence:        This corresponds to a rightmost derivation in reverse:
  abbcde                      S  ⇒rm  aABe
  aAbcde                         ⇒rm  aAde
  aAde                           ⇒rm  aAbcde
  aABe                           ⇒rm  abbcde
  S

The substrings reduced at each step match the right-hand sides of productions.
Handles
A handle is a substring of grammar symbols in a right-sentential form that matches the right-hand side of a production.
Grammar:
  S → aABe
  A → Abc | b
  B → d

Right-sentential form    Handle
  abbcde                   b     (A → b)
  aAbcde                   Abc   (A → Abc)
  aAde                     d     (B → d)
  aABe                     aABe  (S → aABe)
  S

Handle Pruning

A rightmost derivation in reverse can be obtained by handle pruning: repeatedly find the handle and replace it by the left side of the corresponding production.
A Stack Implementation of a Shift-Reduce Parser
There are four possible actions of a shift-reduce parser:
1. Shift: the next input symbol is shifted onto the top of the stack.
2. Reduce: replace the handle on the top of the stack by the corresponding non-terminal.
3. Accept: successful completion of parsing.
4. Error: the parser discovers a syntax error and calls an error recovery routine.

Initially the stack contains only the end-marker $.
The end of the input string is also marked by the end-marker $.
Stack Implementation of Shift-Reduce Parsing
Grammar:
  E → E + E
  E → E * E
  E → ( E )
  E → id

Stack       Input        Action
$           id+id*id$    shift
$id         +id*id$      reduce E → id
$E          +id*id$      shift
$E+         id*id$       shift
$E+id       *id$         reduce E → id
$E+E        *id$         shift
$E+E*       id$          shift
$E+E*id     $            reduce E → id
$E+E*E      $            reduce E → E * E
$E+E        $            reduce E → E + E
$E          $            accept

At each reduce step the parser finds the handle on top of the stack.
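The trace above can be sketched in code. The grammar here is ambiguous, so this minimal Python sketch resolves shift/reduce choices by giving * higher precedence than + (the function name and the precedence-driven strategy are illustrative, not the table-driven method developed later):

```python
# Shift-reduce sketch for E -> E+E | E*E | (E) | id (no parentheses here).
# Conflicts are resolved by operator precedence: reduce E op E only when
# op binds at least as tightly as the incoming token.
PREC = {'+': 1, '*': 2, '$': 0}

def shift_reduce(tokens):
    stack = ['$']
    trace = []                         # reductions, in the order performed
    for a in tokens + ['$']:
        if a == 'id':
            # shift id, then immediately reduce by E -> id
            stack.append('E')
            trace.append('E -> id')
            continue
        # reduce E op E while op on the stack outranks the incoming token
        while (len(stack) >= 4 and stack[-1] == 'E'
               and stack[-2] in PREC and stack[-3] == 'E'
               and PREC[stack[-2]] >= PREC[a]):
            op = stack[-2]
            del stack[-3:]
            stack.append('E')
            trace.append('E -> E %s E' % op)
        if a != '$':
            stack.append(a)            # shift the operator
    return stack, trace
```

Running it on id+id*id reproduces the reduction order of the trace: three E → id reductions, then E → E * E, then E → E + E, leaving $E on the stack.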
Conflicts during Shift-Reduce Parsing
Shift/reduce conflict: the parser cannot decide whether to shift or to reduce.
Reduce/reduce conflict: the parser cannot decide which of several reductions to make.
LR-Parser Example: The Parsing Table

State |        Action                 |  Goto
      | id    +    *    (    )    $  | E  T  F
0     | s5             s4             | 1  2  3
1     |       s6                 acc  |
2     |       r2   s7        r2   r2  |
3     |       r4   r4        r4   r4  |
4     | s5             s4             | 8  2  3
5     |       r6   r6        r6   r6  |
6     | s5             s4             |    9  3
7     | s5             s4             |       10
8     |       s6             s11      |
9     |       r1   s7        r1   r1  |
10    |       r3   r3        r3   r3  |
LR Parsers
The most powerful (yet efficient) shift-reduce parsing method is LR(k) parsing:
  L : left-to-right scanning of the input
  R : constructing a rightmost derivation in reverse
  k : number of lookahead symbols (when k is omitted, it is 1)

LR parsing is attractive because:
- LR parsing is the most general non-backtracking shift-reduce parsing method, yet it is still efficient.
- The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers:
  LL(1)-Grammars ⊂ LR(1)-Grammars
- An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
The LR Parser Algorithm

  input:  a1 a2 ... ai ... an $
  stack:  s0 X1 s1 ... Xm-1 sm-1 Xm sm   (sm on top)

The LR parsing program reads the next input symbol and the state on top of the stack, consults the action and goto tables, and performs one of: shift, reduce, accept, error.
Constructing LR(0) Items
An LR(0) item of a grammar G is a production of G with a dot at some position of the right side.

Ex: A → aBb has four possible LR(0) items:
  A → .aBb
  A → a.Bb
  A → aB.b
  A → aBb.

Sets of LR(0) items will be the states of the action and goto tables of the SLR parser.
A collection of sets of LR(0) items (the canonical LR(0) collection) is the basis for constructing SLR parsers.
1. Augmented Grammar
G' is G with a new production rule S' → S, where S' is the new starting symbol.
2. The Closure Operation
If I is a set of LR(0) items for a grammar G, then closure(I) is the set of LR(0) items constructed from I by the two rules:
1. Initially, every LR(0) item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B → γ is a production rule of G, then B → .γ will be in closure(I). We apply this rule until no more new LR(0) items can be added to closure(I).
Closure: Example
Grammar:
  E' → E
  E  → E + T | T
  T  → T * F | F
  F  → ( E ) | id

closure({E' → .E}) = { E' → .E,
                       E  → .E + T,
                       E  → .T,
                       T  → .T * F,
                       T  → .F,
                       F  → .( E ),
                       F  → .id }
3. goto Operation
If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is defined as follows:
  If A → α.Xβ is in I, then every item in closure({A → αX.β}) will be in goto(I,X).
goto: Example
  I = { E' → .E, E → .E + T, E → .T,
        T → .T * F, T → .F,
        F → .( E ), F → .id }

  goto(I,E) = { E' → E., E → E.+ T }
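The closure and goto operations above can be sketched directly in Python for this expression grammar (the item and grammar encodings here are one possible choice, not a prescribed one):

```python
# Closure and goto over LR(0) items for the expression grammar above.
# An item is (head, body, dot); a body is a tuple of grammar symbols.
GRAMMAR = {
    "E'": [('E',)],
    'E': [('E', '+', 'T'), ('T',)],
    'T': [('T', '*', 'F'), ('F',)],
    'F': [('(', 'E', ')'), ('id',)],
}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            # dot immediately before a nonterminal: add its productions
            if dot < len(body) and body[dot] in GRAMMAR:
                for prod in GRAMMAR[body[dot]]:
                    item = (body[dot], prod, 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

def goto(items, X):
    # advance the dot over X, then take the closure
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == X}
    return closure(moved)

I0 = closure({("E'", ('E',), 0)})   # the 7 items listed above
I1 = goto(I0, 'E')                  # { E' -> E. , E -> E.+T }
```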
Constructing the SLR Parsing Table
(of an augmented grammar G')

1. Construct the canonical collection of sets of LR(0) items for G':
   C = { I0, ..., In }
2. Create the parsing action table as follows:
   - If a is a terminal, A → α.aβ is in Ii and goto(Ii,a) = Ij, then action[i,a] is shift j.
   - If A → α. is in Ii, then action[i,a] is reduce A → α for all a in FOLLOW(A), where A ≠ S'.
   - If S' → S. is in Ii, then action[i,$] is accept.
   - If any conflicting actions are generated by these rules, the grammar is not SLR(1).
3. Create the parsing goto table:
   - For all non-terminals A, if goto(Ii,A) = Ij then goto[i,A] = j.
4. All entries not defined by (2) and (3) are errors.
5. The initial state of the parser is the one containing the item S' → .S.
Action Table and Goto Table

state | id    +    *    (    )    $  | E  T  F
0     | s5             s4            | 1  2  3
1     |       s6                 acc |
2     |       r2   s7        r2  r2  |
3     |       r4   r4        r4  r4  |
4     | s5             s4            | 8  2  3
5     |       r6   r6        r6  r6  |
6     | s5             s4            |    9  3
7     | s5             s4            |       10
8     |       s6             s11     |
9     |       r1   s7        r1  r1  |
10    |       r3   r3        r3  r3  |
Kernel Items & Non-Kernel Items
Kernel items: the initial item S' → .S, and all items whose dots are not at the left end of the right side.
Non-kernel items: all items with the dot at the left end, except S' → .S.
LR Parsing Algorithm
The parsing table consists of two parts: a parsing action function and a
goto function.
The LR parsing program determines sm, the state on top of the stack
and ai, the current input. It then consults action[sm, ai] which can take
one of four values:
Shift
Reduce
Accept
Error
LR Parsing Algorithm
- If action[sm, ai] = shift s, where s is a state, then the parser executes a shift move.
- If action[sm, ai] = reduce A → β, then the parser executes a reduce move: it pops 2*|β| items from the stack, then pushes A and the state goto[s, A], where s is the state now on top of the stack.
- If action[sm, ai] = accept, parsing is completed.
- If action[sm, ai] = error, then the parser has discovered an error.
LR Parsing Algorithm

set ip to point to the first symbol of w$
initialize stack to 0
repeat forever
  let s be the topmost state on the stack and a the symbol pointed to by ip
  if action[s, a] = shift s' then
    push a then s' onto the stack
    advance ip to the next input symbol
  else if action[s, a] = reduce A → β then
    pop 2*|β| symbols off the stack
    let s' be the state now on top of the stack
    push A then goto[s', A] onto the stack
    output the production A → β
  else if action[s, a] = accept then
    return success
  else
    error()
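The loop above can be run against the SLR(1) table shown earlier for the grammar (productions numbered 1: E → E+T, 2: E → T, 3: T → T*F, 4: T → F, 5: F → (E), 6: F → id). A minimal Python sketch; the stack here holds states only, so each reduce pops |β| entries rather than 2*|β|, and state 11 (whose row is not shown in the table above) is omitted, so parenthesized inputs are not handled:

```python
# SLR(1) driver for 1: E->E+T  2: E->T  3: T->T*F  4: T->F  5: F->(E)  6: F->id
ACTION = {
    (0, 'id'): 's5', (0, '('): 's4',
    (1, '+'): 's6', (1, '$'): 'acc',
    (2, '+'): 'r2', (2, '*'): 's7', (2, ')'): 'r2', (2, '$'): 'r2',
    (3, '+'): 'r4', (3, '*'): 'r4', (3, ')'): 'r4', (3, '$'): 'r4',
    (4, 'id'): 's5', (4, '('): 's4',
    (5, '+'): 'r6', (5, '*'): 'r6', (5, ')'): 'r6', (5, '$'): 'r6',
    (6, 'id'): 's5', (6, '('): 's4',
    (7, 'id'): 's5', (7, '('): 's4',
    (8, '+'): 's6', (8, ')'): 's11',
    (9, '+'): 'r1', (9, '*'): 's7', (9, ')'): 'r1', (9, '$'): 'r1',
    (10, '+'): 'r3', (10, '*'): 'r3', (10, ')'): 'r3', (10, '$'): 'r3',
}
GOTO = {(0, 'E'): 1, (0, 'T'): 2, (0, 'F'): 3,
        (4, 'E'): 8, (4, 'T'): 2, (4, 'F'): 3,
        (6, 'T'): 9, (6, 'F'): 3, (7, 'F'): 10}
PRODS = {1: ('E', 3), 2: ('E', 1), 3: ('T', 3),   # production head and
         4: ('T', 1), 5: ('F', 3), 6: ('F', 1)}   # length of its body

def lr_parse(tokens):
    stack = [0]                      # state stack; symbols are implicit
    output = []                      # productions used, in reduce order
    i, tokens = 0, tokens + ['$']
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            raise SyntaxError('error at %r' % tokens[i])
        if act == 'acc':
            return output
        if act[0] == 's':            # shift: push the new state
            stack.append(int(act[1:]))
            i += 1
        else:                        # reduce by production number n
            n = int(act[1:])
            head, size = PRODS[n]
            del stack[-size:]        # pop |body| states
            stack.append(GOTO[(stack[-1], head)])
            output.append(n)
```

For id+id*id this emits the reductions 6, 4, 2, 6, 4, 6, 3, 1, i.e. the rightmost derivation in reverse.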
Canonical Collection of Sets of LR(1) Items
The construction of the canonical collection of the sets of LR(1) items is similar to the construction of the canonical collection of the sets of LR(0) items, except that the closure and goto operations work a little differently.

The closure function for LR(1) items is defined as follows:

closure(I)
{
  repeat
    for each item (A → α.Bβ, a) in I,
      each production B → γ in the grammar,
      and each terminal b in FIRST(βa)
        add (B → .γ, b) to I;
  until no more items are added to I;
  return I;
}
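The closure pseudocode above can be sketched for the small grammar S → CC, C → cC | d used in the example that follows (FIRST sets are hard-coded here, which works because no production in this grammar derives the empty string):

```python
# LR(1) closure for S' -> S, S -> CC, C -> cC | d.
# An item is (head, body, dot, lookahead).
GRAMMAR = {"S'": [('S',)], 'S': [('C', 'C')], 'C': [('c', 'C'), ('d',)]}
FIRST = {'S': {'c', 'd'}, 'C': {'c', 'd'},
         'c': {'c'}, 'd': {'d'}, '$': {'$'}}

def first_of(symbols):
    # FIRST of a string; nothing here is nullable, so the first
    # symbol alone decides
    return FIRST[symbols[0]]

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot, a in list(items):
            if dot < len(body) and body[dot] in GRAMMAR:
                beta = body[dot + 1:] + (a,)           # the string βa
                for prod in GRAMMAR[body[dot]]:
                    for b in first_of(beta):
                        item = (body[dot], prod, 0, b)
                        if item not in items:
                            items.add(item)
                            changed = True
    return items

I0 = closure({("S'", ('S',), 0, '$')})
```

I0 comes out with six items: (S' → .S, $), (S → .CC, $), and the C-productions with lookaheads c and d, matching the example below.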
goto Operation
If I is a set of LR(1) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is defined as follows:
  If (A → α.Xβ, a) is in I, then every item in closure({(A → αX.β, a)}) will be in goto(I,X).
Construction of the Canonical LR(1) Collection

Items(G')
{
  C = { closure({(S' → .S, $)}) }
  repeat
    for each set of items I in C
      for each grammar symbol X
        if goto(I,X) is not empty and not in C
          add goto(I,X) to C;
  until no new sets of items are added to C;
}
An Example
Grammar:
  S' → S
  1. S → C C
  2. C → c C
  3. C → d

I0: closure({(S' → .S, $)}) =
    { (S' → .S, $),
      (S  → .C C, $),
      (C  → .c C, c/d),
      (C  → .d, c/d) }

I1: goto(I0, S) = { (S' → S., $) }
I2: goto(I0, C) = { (S → C.C, $), (C → .c C, $), (C → .d, $) }
I3: goto(I0, c) = { (C → c.C, c/d), (C → .c C, c/d), (C → .d, c/d) }
I4: goto(I0, d) = { (C → d., c/d) }
I5: goto(I2, C) = { (S → C C., $) }
I6: goto(I2, c) = { (C → c.C, $), (C → .c C, $), (C → .d, $) }
I7: goto(I2, d) = { (C → d., $) }
I8: goto(I3, C) = { (C → c C., c/d) }
I9: goto(I6, C) = { (C → c C., $) }

Also: goto(I3, c) = I3, goto(I3, d) = I4, goto(I6, c) = I6, goto(I6, d) = I7.
An Example: the LR(1) Parsing Table

state |  c    d    $   |  S   C
0     |  s3   s4       |  g1  g2
1     |            acc |
2     |  s6   s7       |      g5
3     |  s3   s4       |      g8
4     |  r3   r3       |
5     |            r1  |
6     |  s6   s7       |      g9
7     |            r3  |
8     |  r2   r2       |
9     |            r2  |
Construction of LR(1) Parsing Tables
1. Construct C = { I0, ..., In }, the canonical collection of sets of LR(1) items for G'.
2. Create the parsing action table as follows:
   - If a is a terminal, (A → α.aβ, b) is in Ii and goto(Ii,a) = Ij, then action[i,a] is shift j.
   - If (A → α., a) is in Ii, then action[i,a] is reduce A → α, where A ≠ S'.
   - If (S' → S., $) is in Ii, then action[i,$] is accept.
   - If any conflicting actions are generated by these rules, the grammar is not LR(1).
3. Create the parsing goto table:
   - For all non-terminals A, if goto(Ii,A) = Ij then goto[i,A] = j.
4. All entries not defined by (2) and (3) are errors.
5. The initial state of the parser is the one containing the item (S' → .S, $).
Construction of LALR Parsing Tables
1. Create the canonical LR(1) collection of the sets of LR(1) items for the given grammar.
2. For each core present, find all sets having that same core, and replace those sets with a single set which is their union:
   C = { I0, ..., In }  becomes  C' = { J1, ..., Jm }, where m ≤ n.
3. Create the parsing tables (action and goto tables) just as in the construction of the parsing tables of the LR(1) parser.
   Note that if J = I1 ∪ ... ∪ Ik, then since I1, ..., Ik have the same core, the cores of goto(I1,X), ..., goto(Ik,X) must also be the same.
   So, goto(J,X) = K, where K is the union of all sets of items having the same core as goto(I1,X).
4. If no conflict is introduced, the grammar is an LALR(1) grammar.
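Step 2 (merging item sets with the same core) can be sketched in a few lines; the item sets below are I4 and I7 from the c/d example above, which share the core C → d. and merge into one LALR state:

```python
# Merge LR(1) item sets that share a core. An item is
# (production, dot, lookahead); the core ignores the lookahead.
def merge_by_core(sets_of_items):
    merged = {}
    for items in sets_of_items:
        core = frozenset((prod, dot) for prod, dot, _ in items)
        merged.setdefault(core, set()).update(items)
    return list(merged.values())

C_d = ('C', ('d',))                     # the production C -> d
I4 = {(C_d, 1, 'c'), (C_d, 1, 'd')}     # C -> d. , c/d
I7 = {(C_d, 1, '$')}                    # C -> d. , $
J = merge_by_core([I4, I7])             # one merged set with lookaheads c/d/$
```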


Error Handling
A good compiler should assist in identifying and locating
errors
Lexical errors
Syntactic errors
Semantic errors
Logical errors

Error Recovery Strategies
Panic mode
Phrase-level recovery
Error productions
Global correction

Error Recovery in Predictive Parsing
An error may occur in the predictive parsing (LL(1) parsing)
if the terminal symbol on the top of stack does not match with
the current input symbol.
if the top of stack is a non-terminal A, the current input symbol is
a, and the parsing table entry M[A,a] is empty.
What should the parser do in an error case?
The parser should be able to give an error message (as much as
possible meaningful error message).
It should recover from that error case, and it should be able to
continue parsing the rest of the input.

Panic-Mode Error Recovery in LL(1)
Parsing
In panic-mode error recovery, we skip all the input symbols until a
synchronizing token is found.
What is the synchronizing token?
All the terminal-symbols in the follow set of a non-terminal can be used as a
synchronizing token set for that non-terminal.
So, a simple panic-mode error recovery for LL(1) parsing works as follows:
All the empty entries are marked as synch to indicate that the parser will skip
input symbols until a symbol in the follow set of the non-terminal A on the top
of the stack is seen. Then the parser pops that non-terminal A from the stack,
and parsing continues from that state.
To handle an unmatched terminal symbol, the parser pops that unmatched
terminal symbol from the stack and issues an error message saying that the
unmatched terminal was inserted.

Phrase-Level Error Recovery
Each empty entry in the parsing table is filled with a
pointer to a special error routine which will take care
of that error case.
These error routines may:
change, insert, or delete input symbols.
issue appropriate error messages
pop items from the stack.
We should be careful when we design these error
routines, because we may put the parser into an infinite
loop.
Error Recovery in LR Parsing
An LR parser will detect an error when it consults the parsing
action table and finds an error entry. All empty entries in the
action table are error entries.
Errors are never detected by consulting the goto table.
A canonical LR parser (LR(1) parser) will never make even a
single reduction before announcing an error.
The SLR and LALR parsers may make several reductions before
announcing an error.
But, all LR parsers (LR(1), LALR and SLR parsers) will never
shift an erroneous input symbol onto the stack.
Panic Mode Error Recovery in LR
Parsing
Scan down the stack until a state s with a goto on a
particular nonterminal A is found.
Discard zero or more input symbols until a symbol a is
found that can legitimately follow A.
Usually, a is simply taken to be any symbol in FOLLOW(A), but this may not work for all situations.

The parser stacks the nonterminal A and the state


goto[s,A], and it resumes the normal parsing.

Phrase-Level Error Recovery in LR
Parsing
Each empty entry in the action table is marked with a
specific error routine.
An error routine reflects the error that the user most
likely will make in that case.
An error routine inserts the symbols into the stack or
the input (or it deletes the symbols from the stack and
the input, or it can do both insertion and deletion).
Typical error cases handled by such routines:
  e1: missing operand
  e2: unbalanced right parenthesis
  e3: missing operator
  e4: missing right parenthesis
Example
Grammar:
  E → E + E | E * E | ( E ) | id

Error routines: e1 missing operand, e2 unbalanced right parenthesis, e3 missing operator, e4 missing right parenthesis.

          action                          goto
state |  id   +    *    (    )    $   |  E
0     |  s3   e1   e1   s2   e2   e1  |  1
1     |  e3   s4   s5   e3   e2   acc |
2     |  s3   e1   e1   s2   e2   e1  |  6
3     |  r4   r4   r4   r4   r4   r4  |
4     |  s3   e1   e1   s2   e2   e1  |  7
5     |  s3   e1   e1   s2   e2   e1  |  8
6     |  e3   s4   s5   e3   s9   e4  |
7     |  r1   r1   s5   r1   r1   r1  |
8     |  r2   r2   r2   r2   r2   r2  |
YACC
- Yacc: an LALR parser generator
- "Yet another compiler-compiler"
- Available on different platforms (UNIX, Linux)

Creating an Input/Output Translator with Yacc

  translate.y  --(Yacc compiler)-->  y.tab.c
  y.tab.c      --(C compiler)-->     a.out
  input        --(a.out)-->          output

Lex and Yacc can be linked together: lex generates the scanner yylex() that the yacc-generated parser calls for tokens.
A Yacc source program has three parts:
  declarations
  %%
  translation rules
  %%
  supporting C functions

Ex: grammar
  E → E + T | T
  T → T * F | F
  F → ( E ) | digit

%{
#include <stdio.h>
%}
%token DIGIT
%%
expr   : expr '+' term     { $$ = $1 + $3; }
       | term
       ;
term   : term '*' factor   { $$ = $1 * $3; }
       | factor
       ;
factor : '(' expr ')'      { $$ = $2; }
       | DIGIT
       ;
%%
/* auxiliary procedures */
Unit-IV
SYNTAX DIRECTED TRANSLATION & RUN TIME
ENVIRONMENT
Syllabus
Syntax directed Definitions-Construction of Syntax Tree-Bottom-up
Evaluation of S-Attribute Definitions- Design of predictive translator -
Type Systems-Specification of a simple type checker Equivalence of
Type Expressions-Type Conversions.
RUN-TIME ENVIRONMENT: Source Language Issues - Storage
Organization - Storage Allocation - Parameter Passing - Symbol Tables -
Dynamic Storage Allocation - Storage Allocation in FORTRAN.
Introduction
Semantic Analysis computes additional information related to the
meaning of the program once the syntactic structure is known.
In typed languages as C, semantic analysis involves adding
information to the symbol table and performing type checking.
Syntax Directed Translation (SDT)
- Grammar + semantic rules = SDT
- Useful for doing things after parsing (type checking, code generation)
- Basic idea: attach attributes to grammar symbols, then do something with the attributes when they appear in the parse tree.
Syntax-Directed Definitions
Each grammar symbol has two kinds of associated attributes:
- Synthesized attributes: evaluated bottom-up.
- Inherited attributes: passed down from the parent or from siblings.
If only synthesized attributes are used, the definition is called an
S-attributed definition.
Example
PRODUCTION        SEMANTIC RULE
L → E n           print(E.val)
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → F             T.val = F.val
F → ( E )         F.val = E.val
F → digit         F.val = digit.lexval
S-Attributed & L-Attributed Definitions
S-Attributed Definitions: an SDD is S-attributed if every attribute is synthesized.
L-Attributed Definitions: an SDD is L-attributed if every attribute is either synthesized or inherited, where a rule defining an inherited attribute may use only:
- inherited attributes associated with the parent, and
- inherited or synthesized attributes associated with the left siblings.
Dependency Graphs
The interdependencies among the inherited and synthesized attributes
at the nodes in a parse tree can be depicted by a directed graph called a
dependency graph.
If an attribute b depends on attribute c, then attribute b has to be
evaluated AFTER c.

Production          Semantic Rule
E → E1 + E2         E.val := E1.val + E2.val
Evaluation Order
A TOPOLOGICAL SORT of the dependency graph decides the evaluation order
in a parse tree.
Applications of Syntax-Directed Translations
Construction of Syntax Trees
A syntax tree is a condensed form of the parse tree.
Functions used to build the syntax tree:
- mknode(op, left, right): constructs an operator node with label op and two children, left and right
- mkleaf(id, entry): constructs a leaf node with label id and a pointer to a symbol-table entry
- mkleaf(num, val): constructs a leaf node with label num and the token's numeric value val

Example expression: a - 4 + c
SDD for Syntax Tree Construction
Production       Semantic Rules
E → E1 + T       E.nptr := mknode('+', E1.nptr, T.nptr)
E → E1 - T       E.nptr := mknode('-', E1.nptr, T.nptr)
E → T            E.nptr := T.nptr
T → ( E )        T.nptr := E.nptr
T → id           T.nptr := mkleaf(id, id.entry)
T → num          T.nptr := mkleaf(num, num.val)
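The three constructors can be sketched with tuples as nodes; applied bottom-up as the semantic rules would fire, they build the syntax tree for a - 4 + c (the tuple encoding is illustrative):

```python
# mknode / mkleaf sketched with tuples as tree nodes.
def mknode(op, left, right):
    return (op, left, right)

def mkleaf_id(entry):            # mkleaf(id, entry)
    return ('id', entry)

def mkleaf_num(val):             # mkleaf(num, val)
    return ('num', val)

# bottom-up, in the order the reductions for a - 4 + c would fire:
p1 = mkleaf_id('a')              # T -> id
p2 = mkleaf_num(4)               # T -> num
p3 = mknode('-', p1, p2)         # E -> E1 - T
p4 = mkleaf_id('c')              # T -> id
p5 = mknode('+', p3, p4)         # E -> E1 + T, the root of the tree
```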
Bottom-up Evaluation of S-Attributed Definitions
A syntax-directed definition with only synthesized attributes is
called S-attributed.
Use an LR parser.

Implementation
A stack holds information about the subtrees that have been parsed. Each
stack entry holds a state (a pointer into the LR parsing table) and the
synthesized-attribute value of the symbol.

Example: for the production A → XYZ with rule A.a := f(X.x, Y.y, Z.z),
just before the reduction the stack holds X.x, Y.y and Z.z (Z.z on top);
after the reduction, all three entries are replaced by A.a.

Input used in the trace below: 3*5+4
Production        Semantic Rule (code)
L → E $           print(Val[top])
E → E + T         Val[ntop] := Val[top-2] + Val[top]
E → T
T → T * F         Val[ntop] := Val[top-2] * Val[top]
T → F
F → ( E )         Val[ntop] := Val[top-1]
F → digit
Cont.
For input 3+5$: when E + T is on top of the stack (T.val at top, '+' at
top-1, E.val at top-2), reducing by E → E + T stores E.val + T.val at
ntop, which becomes the new top of the stack.
Input     Stack    Attribute    Production Used
3*5+4$    -        -
*5+4$     3        3
*5+4$     F        3            F → digit
*5+4$     T        3            T → F
5+4$      T*       3
+4$       T*5      3 * 5
+4$       T*F      3 * 5        F → digit
+4$       T        15           T → T * F
+4$       E        15           E → T
4$        E+       15
$         E+4      15 + 4
$         E+F      15 + 4      F → digit
$         E+T      15 + 4      T → F
$         E        19           E → E + T
          E$       19
          L        19           L → E $
Design of a Predictive Translator
Input: A SDT scheme with grammar suitable for predictive parsing.
Output: Code for Syntax-directed translator.
Method:
1. For each nonterminal A construct a function that has a formal parameter for
inherited attribute of A and that returns the values of the synthesized
attributes of A.
2. The code for nonterminal A decides what production to use based on the
current input symbol.
3. The code associated with each production does the following
i. For token X with synthesized attribute x, save the value of x in the variable declared
for X.x. Then generate call to match the token X and advance the input.
ii. For nonterminal B, generate the assignment c = B(b1, b2, ..., bk) with the function call on
the right side.
iii. For an action, copy the code into the parser.
TYPE CHECKING
A compiler must check that the source program follows both syntactic and semantic
conventions of the source language.
Semantic Checks
Static done during compilation
Dynamic done during run-time
Type checking is one of these static checking operations.
We may not be able to do all type checking at compile time;
some systems also use dynamic type checking.
A type system is a collection of rules for assigning type expressions to the parts of a
program.
A type checker implements a type system.
A sound type system eliminates run-time type checking for type errors.
Some examples of static checks:
Type checks
Flow-of-control checks

TYPE SYSTEMS
A type system is a collection of rules for assigning type expressions to
the parts of a program.
For example : if both operands of the arithmetic operators of +,- and
* are of type integer, then the result is of type integer

Type Expressions
The type of a language construct is denoted by a type expression.
A type expression can be:
1. A basic type
   - a primitive data type such as integer, real, char, boolean
   - type_error, to signal a type error
   - void: no type
2. A type name
   - a name can be used to denote a type expression.
3. A type constructor applied to type expressions. Constructors include:
   - Arrays: if T is a type expression then array(I,T) is a type expression denoting the type of an array with elements of type T and index set I.
     Example: int a[20];  →  array(0..19, int)
   - Products: if T1 and T2 are type expressions, then their Cartesian product T1 × T2 is a type expression.
   - Pointers: the type expression for a pointer is pointer(T), where T is a data type.
     Example: int *x;  →  pointer(int)
   - Functions: a function in a programming language maps a domain type D to a range type R.
     The type of such a function is denoted by the type expression D → R.
     Example: int sort(int a, int b)  →  int × int → int
SPECIFICATION OF A SIMPLE TYPE CHECKER

A type checker for a simple language in which the type of each identifier
must be declared before the identifier is used.
The type checker can handle arrays, pointers, statements and functions.
A Simple Language
Consider the following grammar:
  P → D ; E
  D → D ; D | id : T
  T → char | integer | array [ num ] of T | ↑T
  E → literal | num | id | E mod E | E [ E ] | E↑
Translation scheme:
  P → D ; E
  D → D ; D
  D → id : T                { addtype(id.entry, T.type) }
  T → char                  { T.type := char }
  T → integer               { T.type := integer }
  T → ↑T1                   { T.type := pointer(T1.type) }
  T → array [ num ] of T1   { T.type := array(1..num.val, T1.type) }

In the above language,
- there are two basic types: char and integer;
- type_error is used to signal errors;
- the prefix operator ↑ builds a pointer type.
Type Checking of Expressions (E)
1. E → literal     { E.type := char }
   E → num         { E.type := integer }
   Here, constants represented by the tokens literal and num have type char and integer.
2. E → id          { E.type := lookup(id.entry) }
   lookup(e) fetches the type saved in the symbol-table entry pointed to by e.
3. E → E1 mod E2   { E.type := if E1.type = integer and E2.type = integer
                               then integer else type_error }
4. E → E1 [ E2 ]   { E.type := if E2.type = integer and E1.type = array(s,t)
                               then t else type_error }
5. E → E1↑         { E.type := if E1.type = pointer(t) then t else type_error }
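The five rules can be collected into one function. A minimal sketch, with types as strings or tagged tuples such as ('array', s, t) and ('pointer', t), and with lookup simulated by a dict (the expression encoding is a choice made here, not part of the original scheme):

```python
# Type checking of expressions, one clause per rule above.
def check_expr(e, env):
    kind = e[0]
    if kind == 'literal':
        return 'char'                          # rule 1
    if kind == 'num':
        return 'integer'                       # rule 1
    if kind == 'id':
        return env.get(e[1], 'type_error')     # rule 2: lookup(id.entry)
    if kind == 'mod':                          # rule 3: E -> E1 mod E2
        t1, t2 = check_expr(e[1], env), check_expr(e[2], env)
        return 'integer' if t1 == t2 == 'integer' else 'type_error'
    if kind == 'index':                        # rule 4: E -> E1 [ E2 ]
        t1, t2 = check_expr(e[1], env), check_expr(e[2], env)
        if t2 == 'integer' and isinstance(t1, tuple) and t1[0] == 'array':
            return t1[2]                       # the element type t
        return 'type_error'
    if kind == 'deref':                        # rule 5: E -> E1 ^
        t1 = check_expr(e[1], env)
        if isinstance(t1, tuple) and t1[0] == 'pointer':
            return t1[1]                       # the pointed-to type t
        return 'type_error'
    return 'type_error'

env = {'a': ('array', (1, 10), 'integer'), 'p': ('pointer', 'char')}
```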
Type Checking of Statements (S)
Statements do not have values; hence the basic type void can be assigned to them. If an error is detected within a statement, then type_error is assigned.
1. Assignment statement:
   S → id := E        { S.type := if id.type = E.type then void else type_error }
2. Conditional statement:
   S → if E then S1   { S.type := if E.type = boolean then S1.type else type_error }
3. While statement:
   S → while E do S1  { S.type := if E.type = boolean then S1.type else type_error }
4. Sequence of statements:
   S → S1 ; S2        { S.type := if S1.type = void and S2.type = void then void else type_error }
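The four statement rules can be sketched the same way. To keep this fragment self-contained, the types of the embedded expressions and sub-statements are passed in directly rather than recomputed (the encoding is illustrative):

```python
# Type checking of statements, one clause per rule above.
def check_stmt(s):
    kind = s[0]
    if kind == 'assign':                 # S -> id := E
        _, id_type, e_type = s
        return 'void' if id_type == e_type else 'type_error'
    if kind in ('if', 'while'):          # S -> if/while E ... S1
        _, e_type, s1_type = s
        return s1_type if e_type == 'boolean' else 'type_error'
    if kind == 'seq':                    # S -> S1 ; S2
        _, t1, t2 = s
        return 'void' if t1 == t2 == 'void' else 'type_error'
    return 'type_error'
```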
Equivalence of Type Expressions
The job of a type checker is to find whether two type expressions are
equivalent or not.
Type equivalence falls into two categories:
- Structural equivalence
- Name equivalence
Structural Equivalence of Type Expressions
Two type expressions are structurally equivalent when they are the same
basic type, or are formed by applying the same constructor to structurally
equivalent types.

sequiv(s, t):
  if s and t are the same basic type then return true
  else if s = array(s1,s2) and t = array(t1,t2) then return sequiv(s1,t1) and sequiv(s2,t2)
  else if s = s1 × s2 and t = t1 × t2 then return sequiv(s1,t1) and sequiv(s2,t2)
  else if s = pointer(s1) and t = pointer(t1) then return sequiv(s1,t1)
  else if s = s1 → s2 and t = t1 → t2 then return sequiv(s1,t1) and sequiv(s2,t2)
  else return false
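The sequiv routine above translates almost line for line into Python, with basic types as strings and constructed types as tagged tuples (the same encoding used in the type-checker sketch earlier):

```python
# Structural equivalence of type expressions.
def sequiv(s, t):
    if isinstance(s, str) or isinstance(t, str):
        return s == t                    # same basic type?
    if s[0] != t[0] or len(s) != len(t):
        return False                     # different constructors
    if s[0] == 'array':
        # arrays: index sets must match, element types recursively
        return s[1] == t[1] and sequiv(s[2], t[2])
    # product, pointer, function: compare components recursively
    return all(sequiv(a, b) for a, b in zip(s[1:], t[1:]))
```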
Name Equivalence of Type Expressions
Under name equivalence, two type expressions are equivalent only if they have the same name.

typedef struct Node {
    int x;
} node;

node *l1, *l2;
struct Node *q, *r;

Here l1, l2 (declared with the name node) and q, r (declared with the name struct Node) have structurally equivalent types, but under strict name equivalence the two type names are distinct.
Type Conversion
Example: What's the type of x + y if:
- x is of type real;
- y is of type int?
The compiler converts y to real before the addition; such an implicit type conversion is called coercion.
Explicit type conversions
An explicit type conversion is a type conversion which is explicitly written in the program:
  t1 = (float) 2
  t2 = t1 * 3.14
RUN-TIME
ENVIRONMENT
Source Language Issues - Storage Organization - Storage Allocation -
Parameter Passing - Symbol Tables - Dynamic Storage Allocation - Storage
Allocation in FORTRAN.
Source Language Issues
Before considering code generation
Need to relate static source of a program to actions that must take place at
runtime to implement this program
Relationship between names and data objects
Allocation and deallocation of data objects is managed by runtime support package
Depending on the language, the runtime environment may be
Fully static or dynamic
Type of environment determines the need to use stack, heap or both
Procedure Activation

An activation of a procedure is an execution of the procedure body.
The lifetime of an activation of a procedure p is the sequence of steps between the first and last statements in the execution of the procedure body, including time spent executing other procedures called by p.

PROGRAM sort(input, output);
  VAR a : array[0..10] of Integer;

  PROCEDURE readarray;                      { procedure definition }
  VAR i : Integer;
  BEGIN
    for i := 1 to 9 do read(a[i]);
  END;

  FUNCTION partition(y, z : Integer) : Integer;
  VAR i, j, x, v : Integer;                 { y, z are formal parameters }
  BEGIN
    ...
  END;

  PROCEDURE quicksort(m, n : Integer);
  VAR i : Integer;
  BEGIN
    if (n > m) then BEGIN
      i := partition(m, n);
      quicksort(m, i-1);
      quicksort(i+1, n)
    END
  END;

BEGIN /* of main */
  a[0] := -9999; a[10] := 9999;
  readarray;                                { procedure call }
  quicksort(1, 9)                           { 1, 9 are actual parameters }
END.
Assumptions about the flow of control among procedures during
program execution
Control flows sequentially
Execution of a procedure starts at the beginning of procedure body and
eventually returns control to the point immediately following the place where
the procedure was called
In this case we can use trees to illustrate the flow of control in a program
goto statement (transfers control to another point)
Activation tree
We can use a tree (activation tree) to depict the way control enters and
leaves activations
How to build an activation tree
Each node represents an activation of a procedure
The root represents the activation of the main program
The node for a is the parent of the node for b iff control flows from activation a to b.
The node for a is to the left of the node for b iff the lifetime of a occurs before the lifetime of b.
Activation Tree for a Program
For the sort program above, one execution produces the activation tree
(s = sort, r = readarray, p = partition, q = quicksort):

s
 +- r
 +- q(1,9)
     +- p(1,9)
     +- q(1,3)
     |   +- p(1,3)
     |   +- q(1,0)
     |   +- q(2,3)
     |       +- p(2,3)
     |       +- q(2,1)
     |       +- q(3,3)
     +- q(5,9)
         +- p(5,9)
         +- q(5,5)
         +- q(7,9)
             +- p(7,9)
             +- q(7,7)
             +- q(9,9)
Activation Tree for a Program
At the point where q(2,3) is executing: r, p(1,9), p(1,3) and q(1,0) have
executed to completion (dashed lines).
The control stack contains:
  q(2,3)   <- top
  q(1,3)
  q(1,9)
  s
Bindings of Names
Environment: a function that maps a name to a storage location.
  f : name → storage location
State: a function that maps a storage location to a value.
  g : storage location → value
So g(f(name)) = value:
  name --environment--> storage location --state--> value
Bindings of names
Consider storage location 100 is associated with variable x and it
currently holds value 0
Environment
x is bound to storage location 100
State
Value held is 0

Consider assignment statement x:=10


Environment
x is still bound to storage location 100
State
Value held is 10
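The two functions can be modelled as two dicts, making the point above concrete: the assignment x := 10 changes the state, not the environment (the dict names are illustrative):

```python
# Environment maps names to storage locations; state maps
# storage locations to values.
env = {'x': 100}        # environment: x is bound to location 100
store = {100: 0}        # state: location 100 currently holds 0

store[env['x']] = 10    # x := 10 -- updates the state only
```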
Storage Organization
Suppose that the compiler obtains memory from the OS so that it can
execute the compiled program
Program gets loaded on a newly created process
This runtime storage must hold
Generated target code
Data objects
A counterpart of the control stack to keep track of procedure activations
Typical Subdivision of Runtime Memory

  code for function 1
  code for function 2
  ...
  code for function n
  global / static area
  stack
    | (free space)
  heap

- PASCAL and C use extensions of the control stack to manage activations of procedures.
- The stack contains information about register values, the value of the program counter, and data objects whose lifetimes are contained in that of an activation.
- The heap holds all other information, for example activations that cannot be represented as a tree.
- By convention, the stack grows downward toward the heap; the current value of the top of the stack is usually kept in a register.
Activation Record
Information needed by a single execution of a procedure is managed using an activation record (or frame). Not all compilers use all of the optional fields.
Pascal and C push the activation record onto the runtime stack when the procedure is called and pop the activation record off the stack when control returns to the caller.

Fields (top to bottom):
  returned value
  actual parameters
  optional control link
  optional access link
  saved machine status
  local data
  temporaries
Activation Record
1) Temporary values: e.g. those arising in the evaluation of expressions.
2) Local data: data that is local to an execution of the procedure.
3) Saved machine status: the state of the machine just before the procedure is called; the values of the program counter and machine registers that have to be restored when control returns from the procedure.

The fields from the returned value down through the saved machine status are the caller's responsibility to initialize; local data and temporaries are the callee's responsibility.
Activation Record
4) Access link: points to the activation record holding the non-local data the procedure may need (the record of the enclosing scope).
5) Control link: points to the activation record of the caller.
6) Actual parameters: used by the calling procedure to supply parameters to the called procedure (in practice these are often passed in registers).
7) Returned value: used by the called procedure to return a value to the calling procedure (in practice it is usually returned in a register).
Storage Allocation Strategies
- Static allocation lays out storage for all data objects at compile time.
- Stack allocation manages the run-time storage as a stack.
- Heap allocation allocates and deallocates storage as needed at run time from a data area known as the heap.
Static Allocation
In a static environment (Fortran 77) there are a number
of restrictions:
Size of data objects are known at compile time
No recursive procedures
No dynamic memory allocation
Only one copy of each procedure activation record
exists at time t
We can allocate storage at compile time
Bindings do not change at runtime
Every time a procedure is called, the same bindings occur
Static Allocation: Example

int i = 10;

int f(int j)
{
    int k;
    int m;
    ...
}

main()
{
    int k;
    f(k);
}

Memory layout: the code area holds the code for main() and f(); the global/static area holds i; the activation record of main() holds its local k; the activation record of f() holds its locals k and m. All of these locations are fixed at compile time.
Stack-based Allocation
In a stack-based allocation, the previous restrictions are
lifted (Pascal, C, etc)
procedures are allowed to be called recursively
Need to hold multiple activation records for the same procedure
Created as required and placed on the stack
Each record will maintain a pointer to the record that activated it
On completion, the current record will be deleted from the stack and
control is passed to the calling record
Dynamic memory allocation is allowed
Pointers to data locations are allowed
Stack-based Allocation
(The sort program from before.) Position in the activation tree and the activation records on the stack as execution proceeds:

  Activation tree:   s
  Stack:             s [ a (array) ]
Stack-based Allocation
  Activation tree:   s, with child r
  Stack:             s [ a (array) ]
                     r [ i (integer) ]
Stack-based Allocation (continued)
Same sort program: readarray has returned and s has called
quicksort(1,9). The stack now holds the record for s, with a (array),
and the record for q(1,9), with its local i (integer).
Stack-based Allocation (continued)
Same sort program: q(1,9) has called partition(1,9), p(1,9) has
returned, and q(1,9) has then called quicksort(1,3). The stack holds
the records for s, q(1,9) and q(1,3), each quicksort activation with
its own copy of i (integer).
Heap Allocation
Heap allocation supports data that outlives the procedure that created
it: storage is allocated and deallocated in any order from the heap area.
Symbol Tables
A data structure used by a compiler to keep track of the semantics of variables:
its data type,
where it is used,
where it is stored (its storage address).
Conceptually, each entry maps a name to its attributes.
Possible implementations:
Unordered list: for a very small set of variables.
Ordered linear list: insertion is expensive, but implementation is relatively easy.
Binary search tree: O(log n) time per operation for n variables.
Hash table: most commonly used, and very efficient provided the memory space is
adequately larger than the number of variables.
DATA STRUCTURES FOR SYMBOL TABLES
List
Self-organizing list
Hash table
Search tree
How to store names
Fixed-length name:
allocate a fixed amount of space for each name.
Variable-length name:
all names are stored in one long character array;
for each name, store its length and its starting index in the array.
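The hash-table implementation mentioned above can be sketched in a few lines of Python (the class and attribute names are illustrative, not from any particular compiler):

```python
# A minimal hash-table symbol table using separate chaining.

class SymbolTable:
    def __init__(self, size=211):
        self.size = size
        self.buckets = [[] for _ in range(size)]   # separate chaining

    def _hash(self, name):
        # simple multiplicative string hash
        h = 0
        for ch in name:
            h = (h * 31 + ord(ch)) % self.size
        return h

    def insert(self, name, attrs):
        self.buckets[self._hash(name)].append((name, attrs))

    def lookup(self, name):
        for n, attrs in self.buckets[self._hash(name)]:
            if n == name:
                return attrs
        return None

table = SymbolTable()
table.insert("i", {"type": "int", "addr": 0})
table.insert("k", {"type": "int", "addr": 4})
```

Insertion and lookup are O(1) on average as long as the table is larger than the number of names, matching the note above.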
Retest-II Batch students
Consider the following grammar:
S -> AS | b
A -> SA | a
Construct the SLR parse table for the grammar. Show the
actions of the parser for the input string abab.
Unit-V
CODE OPTIMIZATION AND CODE GENERATION
Principal Sources of Optimization-DAG- Optimization of
Basic Blocks-Global Data Flow Analysis Efficient Data
Flow Algorithms-Issues in Design of a Code Generator - A
Simple Code Generator Algorithm.
Introduction
Intermediate code undergoes various transformations,
called optimizations, to make the resulting code run
faster and take less space.
The main goal is to achieve better performance.

Pipeline (from the slide's figure): source code -> front end ->
intermediate code -> code generation -> target code, with a
machine-independent compiler optimizer between the front end and code
generation, and a machine-dependent optimizer after it.
Two levels
Machine-independent code optimization
Control flow analysis
Data flow analysis
Transformation
Machine-dependent code optimization
Register allocation
Utilization of special instructions
Criteria for Code-Improving Transformation
Transformation must preserve the meaning of programs.
Transformation must, on the average, speed up programs by a
measurable amount.
Transformation must be worth the effort.
An Organization for an Optimizing Compiler
Control flow analysis
Data-flow analysis
Transformations

Organization (from the slide's figure): front end -> code optimizer ->
code generator, where the code optimizer itself consists of
control-flow analysis, data-flow analysis, and transformations.
Three-address code for quicksort
Basic Blocks and Flow Graphs
The Principal Sources of
Optimization
Local optimization: look within basic block
Global optimization: look across blocks
Function-preserving transformations include
Common subexpression elimination
Copy propagation
Dead-code elimination
Constant-folding
Common Subexpression Elimination
An occurrence of an expression E is a common subexpression
if E was previously computed and the values
of the variables in E have not changed since.
Copy Propagation
A statement of the form f := g is called a copy statement.
The idea is to use g instead of f in subsequent statements.
Dead-Code Elimination
A variable that is no longer live (not subsequently used) is
called dead.
Copy propagation often turns a copy statement into dead
code.
Constant Folding
Constant folding is the transformation that replaces an
expression by a constant computed at compile time.
Constant folding is useful to discover dead code.
Example:
x := 32          becomes    x := 32
x := x + 32                 x := 64
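Constant folding can be sketched as a tiny pass over three-address tuples. The (var, op, arg1, arg2) encoding and the single folded operator here are illustrative assumptions, not a full implementation:

```python
# Fold constant expressions in a straight-line sequence of statements.

def fold(stmts):
    consts = {}          # variables currently known to hold constants
    out = []
    for var, op, a, b in stmts:
        a = consts.get(a, a)     # substitute known constant values
        b = consts.get(b, b)
        if op == "+" and isinstance(a, int) and isinstance(b, int):
            consts[var] = a + b              # fold a + b at compile time
            out.append((var, "=", a + b, None))
        elif op == "=":
            if isinstance(a, int):
                consts[var] = a
            out.append((var, "=", a, None))
        else:
            consts.pop(var, None)            # var no longer a known constant
            out.append((var, op, a, b))
    return out

# x := 32 ; x := x + 32   folds the second statement to  x := 64
folded = fold([("x", "=", 32, None), ("x", "+", "x", 32)])
```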
Loop Optimization
The running time of a program can be improved if we
decrease the amount of instructions in an inner loop.
Three techniques are useful:
1. Code Motion
2. Reduction in Strength
3. Induction-Variable elimination
Code Motion
Move code outside the loop, since the loop may execute
many iterations.
Look for expressions that yield the same result
independent of the iteration (loop-invariant computations).
Example:
while (i <= limit - 2) do ...
becomes
t := limit - 2;
while (i <= t) do ...
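The limit - 2 hoist can be mimicked directly in Python (the function names are made up for illustration):

```python
# Code motion by hand: the loop-invariant expression limit - 2 is computed
# once before the loop instead of on every iteration.

def count_up_naive(limit):
    i, steps = 1, 0
    while i <= limit - 2:      # limit - 2 recomputed on every test
        i += 1
        steps += 1
    return steps

def count_up_hoisted(limit):
    t = limit - 2              # hoisted out of the loop
    i, steps = 1, 0
    while i <= t:
        i += 1
        steps += 1
    return steps
```

Both functions compute the same result; the hoisted version simply evaluates the invariant expression once.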
Induction Variables
A variable x is an induction variable of a loop if every
time x changes value, it is incremented or
decremented by some constant.
Example (j and t4 = 4 * j change in lock step, so t4 can be updated
directly instead of recomputed):

B3:                      B3:
j = j - 1                j = j - 1
t4 = 4 * j               t4 = t4 - 4
t5 = a[t4]               t5 = a[t4]
if t5 > v goto B3        if t5 > v goto B3
Reduction in Strength
The replacement of an expensive operation by a cheaper one
is termed reduction in strength.
x^2    ->  x * x
x * 4  ->  x << 2
DAG representation
of Basic Block (BB)
A DAG of a basic block is a directed acyclic graph with
following node markings:
Leaves are labeled with unique identifier (varname or const)
Interior nodes are labeled by an operator symbol
Nodes optionally have a list of labels (identifiers)
Edges relate operands to their operator (interior nodes are
operators).
Example: DAG for a BB
For the single statement
t1 := 4 * i
the DAG is a * node labeled t1 with leaf children 4 and i.
For the block
t1 := 4 * i
t3 := 4 * i
t2 := t1 + t3
if (i <= 20) goto L1
the DAG (from the slide's figure) has a single * node labeled t1, t3
for the common subexpression 4 * i, a + node labeled t2 whose two
children are both that * node, and a <= node (labeled L1) with
children i and 20.
Algorithm for construction of DAG
Input: A basic block
Output: A DAG for the basic block containing the following information:
1. A label for each node. For leaves, the label is an identifier. For interior nodes, an
operator symbol.
2. For each node a list of attached identifiers to hold the computed values.
The statements considered have one of three forms:
case (i) x := y OP z; case (ii) x := OP y; case (iii) x := y.
Method:
Step 1: If node(y) is undefined, create a leaf node(y). For case (i), if node(z) is
undefined, create a leaf node(z).
Step 2: For case (i), determine whether there is a node labeled OP whose left child is
node(y) and right child is node(z) (this checks for a common subexpression). If not,
create such a node. Let n be this node.
For case (ii), determine whether there is a node labeled OP with the single child
node(y). If not, create such a node. Let n be this node.
For case (iii), let n be node(y).
Step 3: Append x to the list of attached identifiers for node n, and delete x from the
list of attached identifiers of any other node.
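The three steps above can be sketched with value numbering for case (i) statements; the node representation is an assumption made for illustration:

```python
# DAG construction for a basic block of x := y OP z statements.
# Reusing an existing (op, left, right) node detects common subexpressions.

def build_dag(stmts):
    nodes = {}      # (op, left, right) -> node id  (Step 2 lookup)
    defs = {}       # identifier -> node id currently holding its value
    labels = {}     # node id -> list of attached identifiers
    counter = [0]

    def leaf(x):
        # Step 1: create a leaf node for an operand if undefined
        if x not in defs:
            defs[x] = counter[0]
            labels[counter[0]] = [x]
            counter[0] += 1
        return defs[x]

    for x, op, y, z in stmts:
        key = (op, leaf(y), leaf(z))
        if key not in nodes:                 # no matching node: create one
            nodes[key] = counter[0]
            labels[counter[0]] = []
            counter[0] += 1
        n = nodes[key]
        # Step 3: detach x from its old node, attach it to n
        if x in defs and x in labels.get(defs[x], []):
            labels[defs[x]].remove(x)
        defs[x] = n
        labels[n].append(x)
    return defs, labels

# t1 := 4 * i ; t3 := 4 * i   share a single * node
defs, labels = build_dag([("t1", "*", "4", "i"), ("t3", "*", "4", "i")])
```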
Example
The slide's figure shows the DAG for a loop body such as
t1 := 4 * i          t5 := t2 * t4
t2 := a[t1]          prod := prod + t5
t3 := 4 * i          t7 := i + 1;  i := t7
t4 := b[t3]          if i <= 20 goto (1)
The common subexpression 4 * i becomes a single * node feeding two []
(indexing) nodes for a and b; prod labels a + node over prod0 and t5,
i labels the + node t7 over i0 and 1, and a <= node compares i with 20.
Representation of Array References
Applications of DAGS
Automatically detect common sub
expressions.
Determine which identifiers have their values
used in the block.
Determine which statements compute values
that could be used outside the block.
Optimization of Basic Blocks
Common sub-expression elimination: by
construction of DAG
Note: for common sub-expression elimination,
we are actually targeting for expressions that
compute the same value.
Example (from the slide's figure):
a := b + c
b := a - d
c := b + c
d := a - d
The DAG gives b and d the same - node (a - d is computed only once),
so the last statement can be rewritten as d := b.
DAG representation identifies expressions
that yield the same result.
Example (from the slide's figure):
a := b + c
b := b - d
c := c + d
e := b + c
Here e := b + c uses the updated b and c, so e gets its own + node
whose children are the - node for b and the + node for c; algebraically
e = (b0 - d0) + (c0 + d0) = b0 + c0, the same value as a.
Dead code elimination: code generation
from the DAG eliminates dead code.
Example (from the slide's figure):

a := b + c            a := b + c
b := a - d      =>    d := a - d
d := a - d            c := d + c
c := d + c

b is not live on exit from the block, so only d needs to keep the
shared - node and the assignment to b disappears.
The Use of Algebraic Identities
Algebraic identities represent another important class of
optimization on basic blocks
x+0=0+x=x
x-0=x
x*1=1*x=x
x/1=x
Another class of algebraic optimization is reduction in
strength.
A third class of optimization is constant folding.
Peephole optimization
Peephole: a small moving window in the instruction sequence
Technique: examine a short sequence of target instructions (peephole)
and try to replace it with a faster or shorter sequence
Goals:
- improve performance
- reduce code size
Methods
Redundant instruction elimination
Unreachable code elimination
Flow of control optimization
Algebraic simplifications
Reduction in strength
Use of machine idioms
Redundant instruction elimination
Redundant loads and stores:
MOV R0, a
MOV a, R0
The second instruction can be deleted, since R0 already holds the value of a.
Unreachable code
if debug = 1 goto L1
goto L2
L1: print debugging info
L2:

One peephole optimization is to eliminate jumps over jumps:

if debug != 1 goto L2
print debugging info
L2:
Flow of control optimization:

goto L1                   goto L2
...                =>     ...
L1: goto L2               L1: goto L2

if a < b goto L1          if a < b goto L2
...                =>     ...
L1: goto L2               L1: goto L2

goto L1                   if a < b goto L2
...                =>     goto L3
L1: if a < b goto L2      ...
L3:                       L3:
Algebraic simplification: eliminate identity operations such as
x := x + 0
x := x * 1
Reduction in strength: replace expensive operations by equivalent
cheaper operations
x^2    ->  x * x
x * 4  ->  x << 2
Use of machine idioms
The target machine may have hardware instructions
that implement certain specific operations efficiently.
For x = x + 1:
MOV x, R0
ADD 1, R0      =>      INC x
MOV R0, x
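The redundant-load case above can be sketched as a peephole pass over pseudo-instructions; the (op, src, dst) tuple form is an assumption made for illustration:

```python
# A tiny peephole pass: drop a load that immediately follows a store
# to the same location (MOV R0,a ; MOV a,R0 -> keep only the store).

def peephole(code):
    out = []
    for instr in code:
        prev = out[-1] if out else None
        if (prev is not None and instr[0] == "MOV" and prev[0] == "MOV"
                and instr[1] == prev[2] and instr[2] == prev[1]):
            continue            # redundant reload: skip it
        out.append(instr)
    return out

optimized = peephole([("MOV", "R0", "a"),
                      ("MOV", "a", "R0"),     # redundant
                      ("ADD", "b", "R0")])
```

A real peephole optimizer would slide a window over the target code and apply many such patterns (the other methods listed above) until no more apply.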
Code Generation
Final phase in the compiler model
Takes as input:
intermediate representation (IR) code
symbol table information
Produces output:
semantically equivalent target program
Issues in the design of code generator
Input to the code generator
Target program
Memory management
Instruction selection
Register allocation
Choice of evaluation order
Approaches to code generation
Input to the Code Generator
The input to the code generator is the intermediate representation of
the source program produced by the front end, along with information in
the symbol table that is used to determine the run-time addresses of
the data objects denoted by the names in the IR.
Choices for the IR:
Three-address representations: quadruples, triples, indirect triples
Virtual machine representations such as bytecodes and stack-machine code
Linear representations such as postfix notation
Graphical representations such as syntax trees and DAGs
Assumptions:
Relatively low-level IR
All syntactic and semantic errors have already been detected.
Example: a := b * -c + b * -c
Postfix notation:
a b c uminus * b c uminus * + assign
Target Program
The output of code generator is target program.
Output may take variety of forms
Absolute machine language(executable code)
Relocatable machine language(object files for linker)
Assembly language(facilitates debugging)
Absolute machine language has the advantage that it can be placed
in a fixed location in memory and immediately executed.
Relocatable machine language allows subprograms to
be compiled separately.
Producing an assembly language program as output makes the
process of code generation somewhat easier.
Memory Management
Mapping names in the source program to addresses of data
objects in run-time memory is done cooperatively by the front end and
the code generator.
If machine code is being generated, labels in three-address
statements have to be converted to addresses of instructions.
Instruction selection
Uniformity and completeness of the instruction set are
important factors.
Instruction speeds and machine idioms are another
important factor.
If we do not care about the efficiency of the target program,
instruction selection is straightforward.
The quality of the generated code is determined by its speed
and size.
Example
a = b + c
d = a + e

MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0     <- redundant: R0 already holds a
ADD e, R0
MOV R0, d
Register Allocation
Instructions involving register operands are usually shorter
and faster than those involving operands in memory.
Two sub problems
Register allocation: select the set of variables that will reside in
registers at each point in the program.
Register assignment: select the specific register that a variable will reside in.
t := a + b
t := t * c
t := t / d

L  R1, a
A  R1, b
M  R0, c
D  R0, d
ST R1, t
Choice of evaluation order
The order in which computations are performed
can affect the efficiency of the target code.
When instructions are independent, their
evaluation order can be changed.

For a + b - (c + d) * e:

t1 := a + b       MOV a, R0
t2 := c + d       ADD b, R0
t3 := e * t2      MOV R0, t1
t4 := t1 - t3     MOV c, R1
                  ADD d, R1
                  MOV e, R0
                  MUL R1, R0
                  MOV t1, R1
                  SUB R0, R1
                  MOV R1, t4

Reordered:

t2 := c + d       MOV c, R0
t3 := e * t2      ADD d, R0
t1 := a + b       MOV e, R1
t4 := t1 - t3     MUL R0, R1
                  MOV a, R0
                  ADD b, R0
                  SUB R1, R0
                  MOV R0, t4
Approaches to code generator
The most important criterion for a code generator is that it produce correct code.
Given the premium on correctness, designing a code
generator so it can be easily implemented, tested, and
maintained is an important design goal.
A Code Generator
Generates target code for a sequence of three-address
statements.
Uses new function getreg to assign registers to
variables.
Computed results are kept in registers as long as
possible, which means:
Result is needed in another computation
Register is kept up to a procedure call or end of block
Checks if operands to three-address code are available
in registers
Register and Address Descriptors
A register descriptor keeps track of what is currently
stored in a register at a particular point in the code, e.g.
a local variable, argument, global variable, etc.
MOV a,R0 R0 contains a
An address descriptor keeps track of the location where
the current value of the name can be found at run time,
e.g. a register, stack location, memory address, etc.
MOV a,R0
MOV R0,R1 a in R0 and R1
The Code Generation Algorithm
For each statement x := y op z:
1. Set location L = getreg(y, z).
2. If y is not already in L, generate
   MOV y', L
   where y' denotes one of the locations where the value of y is
   available.
3. Generate
   OP z', L
   where z' is one of the locations of z;
   update the register/address descriptors of x to include L.
4. If y and/or z have no next use and are stored in registers, update
   the register descriptors to remove y and z.
The getreg Algorithm
To compute getreg(y, z) for a statement x := y op z:
1. If y is stored in a register R, R holds only the value of y, and
   y has no next use, then return R;
   update the address descriptor: the value of y is no longer in R.
2. Else, return a new empty register, if one is available.
3. Else, find an occupied register R;
   store its contents (register spill) by generating
   MOV R, M
   for every memory location M in the address descriptor of the value
   held in R; return register R.
4. If x is not used in the block, or no suitable occupied register
   can be found, return the memory location of x as L.
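A much-simplified sketch of the generator and getreg idea in Python; the register names, the naive spill victim choice, and the tuple instruction encoding are all assumptions made for illustration:

```python
# Simplified simple-code-generator: keep values in registers, spill naively.

class CodeGen:
    def __init__(self, regs=("R0", "R1")):
        self.free = list(regs)
        self.reg_of = {}       # variable -> register (register descriptor)
        self.code = []

    def getreg(self, y):
        if y in self.reg_of:                  # y already sits in a register
            return self.reg_of[y]
        if self.free:                         # a new empty register is free
            r = self.free.pop(0)
        else:                                 # spill: naive victim choice
            victim, r = next(iter(self.reg_of.items()))
            self.code.append(("MOV", r, victim))   # store victim to memory
            del self.reg_of[victim]
        self.code.append(("MOV", y, r))       # load y into r
        self.reg_of[y] = r
        return r

    def gen(self, x, op, y, z):
        r = self.getreg(y)
        self.code.append((op, z, r))
        del self.reg_of[y]                    # r now holds x, not y
        self.reg_of[x] = r

cg = CodeGen()
cg.gen("t", "SUB", "a", "b")    # t := a - b
```

For `t := a - b` this emits a load of a into R0 followed by the subtract, with the register descriptor recording that R0 now holds t, mirroring the first row of the example below.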
Code Generation Example

Statements   Code Generated   Register Descriptor   Address Descriptor
                              registers empty
t := a - b   MOV a, R0        R0 contains t         t in R0
             SUB b, R0
u := a - c   MOV a, R1        R0 contains t         t in R0
             SUB c, R1        R1 contains u         u in R1
v := t + u   ADD R1, R0       R0 contains v         u in R1
                              R1 contains u         v in R0
d := v + u   ADD R1, R0       R0 contains d         d in R0
             MOV R0, d                              d in R0 and memory
Intro to Global Data Flow Analysis
To apply global optimizations on basic blocks, a
compiler needs to collect data-flow information about
the program by solving systems of data-flow equations.
A typical equation has the form
out[S] = gen[S] ∪ (in[S] - kill[S])
Points and Paths
A point within a basic block is:
a location between two consecutive statements,
a location before the first statement of the basic block, or
a location after the last statement of the basic block.
Path: a path from a point p1 to pn is a sequence of points
p1, p2, ..., pn such that for each i with 1 <= i < n, either
pi is the point immediately preceding a statement and pi+1 is the
point immediately following that statement in the same block,
or
pi is the last point of some block and pi+1 is the first point of a
successor block.
Example: Paths and Points (from the slide's figure)
B1: d1: i := m - 1;  d2: j := n;  d3: a := u1
B2: d4: i := i + 1
B3: d5: j := j - 1
B5: d6: a := u2
A path p1, p2, ..., pn runs from a point p1 inside B1 through the
points p2, ..., p6 of B2 and B3 to a point pn in a later block.
Reaching Definitions
A definition d reaches a point p if there is a path from the point
immediately following d to p along which d is not killed.
Example: Reaching Definitions (from the slide's figure)
The definition d1: i := m - 1 in B1 reaches the point p1 before
d4: i := i + 1 in B2, but is killed by d4, so it does not reach the
point p2 after d4, nor blocks B3, B4, B5 and B6.
Data Flow Analysis of Structured Programs
Data-flow equations for Reaching Definitions

If S is a single assignment
    S -> d: a := b + c
then the data-flow equations for S are:
gen[S] = {d}
kill[S] = Da - {d}
out[S] = gen[S] ∪ (in[S] - kill[S])
where Da is the set of all definitions of a in the region of code.


If S is the sequential composition S1 ; S2, then:
gen[S] = gen[S2] ∪ (gen[S1] - kill[S2])
kill[S] = kill[S2] ∪ (kill[S1] - gen[S2])
in[S1] = in[S]
in[S2] = out[S1]
out[S] = out[S2]
If S is the conditional composition (either S1 or S2 executes), then:
gen[S] = gen[S1] ∪ gen[S2]
kill[S] = kill[S1] ∩ kill[S2]
in[S1] = in[S]
in[S2] = in[S]
out[S] = out[S1] ∪ out[S2]
If S is a loop whose body S1 may repeat, then:
gen[S] = gen[S1]
kill[S] = kill[S1]
in[S1] = in[S] ∪ gen[S1]
out[S] = out[S1]
Example: Reaching Definitions
d1: i := m-1;
d2: j := n;
d3: a := u1;
do
d4: i := i+1;
d5: j := j-1;
if e1 then
d6: a := u2
else
d7: i := u3
while e2

gen and kill are computed bottom-up over the program's structure (from
the slide's figure):
single definitions: d4: gen={d4}, kill={d1,d7};  d5: gen={d5}, kill={d2};
d6: gen={d6}, kill={d3};  d7: gen={d7}, kill={d1,d4}
if-else: gen={d6,d7}, kill={} (the intersection of the branches' kills)
sequence d4; d5: gen={d4,d5}, kill={d1,d2,d7}
loop body: gen={d4,d5,d6,d7}, kill={d1,d2}
sequence d1; d2; d3: gen={d1,d2,d3}, kill={d4,d5,d6,d7}
whole program: gen={d3,d4,d5,d6,d7}, kill={d1,d2}
Representation of Sets
The gen and kill sets can be represented as bit vectors with one bit
per definition: bit i stands for di. For the program above (from the
slide's figure), the whole program has gen = 0011111 and
kill = 1100000, the sequence d1; d2; d3 has gen = 1110000 and
kill = 0001111, the loop body has gen = 0001111 and kill = 1100000,
and so on. Set union becomes bitwise OR, and set difference becomes
AND with the complement.
Accuracy, Safeness, and Conservative Estimations
Conservative: refers to making safe assumptions when
insufficient information is available at compile time, i.e.
the compiler has to guarantee not to change the
meaning of the optimized code.
Safe: for reaching definitions, a superset of the true set of
reaching definitions is safe (some definitions in it may in fact have
been killed).
Accuracy: the larger the superset of reaching
definitions, the less information we have to apply code
optimizations.
Reaching Definitions are a Conservative
(Safe) Estimation
Suppose one branch S2 of the conditional S is never taken.
Estimation:
gen[S] = gen[S1] ∪ gen[S2]
kill[S] = kill[S1] ∩ kill[S2]
Accurate:
gen[S] = gen[S1]
kill[S] = kill[S1]
The estimated gen set is a superset of the accurate one and the
estimated kill set a subset, so the estimate errs on the safe side.
Computations of in and out
Iterative Solution of Data-Flow
Equation
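At the flow-graph level the equations can be solved by iterating to a fixed point. A sketch in Python, on a made-up two-block graph (block names and sets are illustrative):

```python
# Iterative solution of the reaching-definitions equations:
#   in[B]  = union of out[P] over predecessors P of B
#   out[B] = gen[B] | (in[B] - kill[B])

def reaching_definitions(blocks, preds, gen, kill):
    in_ = {b: set() for b in blocks}
    out = {b: set() for b in blocks}
    changed = True
    while changed:                       # iterate until out sets stabilize
        changed = False
        for b in blocks:
            in_[b] = set().union(*(out[p] for p in preds[b])) if preds[b] else set()
            new_out = gen[b] | (in_[b] - kill[b])
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return in_, out

blocks = ["B1", "B2"]
preds = {"B1": [], "B2": ["B1", "B2"]}          # B2 loops back to itself
gen = {"B1": {"d1", "d2"}, "B2": {"d4"}}
kill = {"B1": {"d4"}, "B2": {"d1"}}
in_, out = reaching_definitions(blocks, preds, gen, kill)
```

Here d1 is killed inside the loop block B2, so d1 reaches the entry of B2 but not its exit.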
Definition-Use Chains
Live Variable (Liveness) Analysis
Data-flow equations:

in[B] = use[B] ∪ (out[B] - def[B])
out[B] = ∪ over S in succ(B) of in[S]

1st equation: a variable is live coming into block B if either
it is used before redefinition in B,
or
it is live coming out of B and is not redefined in B.
2nd equation: a variable is live coming out of B iff it
is live coming into one of its successors.
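The two equations above can likewise be solved iteratively, working backwards over the flow graph. A Python sketch on a made-up two-block graph (names and sets are illustrative):

```python
# Iterative liveness analysis:
#   out[B] = union of in[S] over successors S of B
#   in[B]  = use[B] | (out[B] - def[B])

def liveness(blocks, succ, use, defs):
    in_ = {b: set() for b in blocks}
    out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in reversed(blocks):       # backward problem: visit in reverse
            out[b] = set().union(*(in_[s] for s in succ[b])) if succ[b] else set()
            new_in = use[b] | (out[b] - defs[b])
            if new_in != in_[b]:
                in_[b] = new_in
                changed = True
    return in_, out

blocks = ["B1", "B2"]
succ = {"B1": ["B2"], "B2": []}
use = {"B1": {"a"}, "B2": {"b"}}         # B2 uses b, so b is live out of B1
defs = {"B1": {"b"}, "B2": set()}
in_, out = liveness(blocks, succ, use, defs)
```

b is defined in B1 and used in B2, so it is live out of B1 but not live into it.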
Example: Liveness

r1 = r2 + r3      r2, r3, r4, r5 are all live here, as they
r6 = r4 - r5      are consumed later; r6 is dead, as it is
                  redefined before being used.

r4 = 4            r4 is dead, as it is redefined.
r6 = 8            So is r6. r2, r3, r5 are live.

r6 = r2 + r3
r7 = r4 - r5

What does this mean? r6 = r4 - r5 in the first block is useless:
it produces a dead value. Get rid of it!
Extra Topic
Structure Preserving Transformation
1. Common Sub expression elimination
2. Dead code elimination
3. Renaming of temporary variables
4. Interchange of two independent adjacent statements.
Common-Subexpression Elimination
Remove redundant computations:

a := b + c          a := b + c
b := a - d          b := a - d
c := b + c          c := b + c
d := a - d          d := b

t1 := b * c         t1 := b * c
t2 := a - t1        t2 := a - t1
t3 := b * c         t4 := t2 + t1
t4 := t2 + t3
Dead Code Elimination
Remove unused statements

b := a + 1 b := a + 1
a := b + c

Assuming a is dead (not used)
Renaming Temporary Variables
Temporary variables that are dead at the end of a block
can be safely renamed:

t1 := b + c         t1 := b + c
t2 := a - t1        t2 := a - t1
t1 := t1 * d        t3 := t1 * d
d := t2 + t1        d := t2 + t3

The result is a normal-form block, in which each temporary is
defined exactly once.
Interchange of Statements
Independent statements can be reordered:

t1 := b + c         t1 := b + c
t2 := a - t1        t3 := t1 * d
t3 := t1 * d        t2 := a - t1
d := t2 + t3        d := t2 + t3
Algebraic Transformations
Change arithmetic operations computed in a basic block
into algebraically equivalent forms:

t1 := a - a         t1 := 0
t2 := b + t1        t2 := b
t3 := 2 * t2        t3 := t2 << 1
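The three rewrites above can be sketched as a per-statement pass; the (dst, op, a, b) tuple encoding is an assumption made for illustration:

```python
# Apply simple algebraic transformations to one three-address statement.

def algebraic(stmt):
    dst, op, a, b = stmt
    if op == "-" and a == b:             # x - x  ->  0
        return (dst, "=", 0, None)
    if op == "+" and b == 0:             # x + 0  ->  x
        return (dst, "=", a, None)
    if op == "*" and b == 2:             # x * 2  ->  x << 1 (strength reduction)
        return (dst, "<<", a, 1)
    return stmt                          # no rule applies

t1 = algebraic(("t1", "-", "a", "a"))
t3 = algebraic(("t3", "*", "t2", 2))
```

Running such a pass before the DAG construction means fewer distinct nodes and more opportunities for common-subexpression elimination.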
