
MAY/JUNE-'09/CS1352-Answer Key
CS1352 Principles of Compiler Design, University Question Key, May/June 2009

PART-A

1. What are the issues to be considered in the design of a lexical analyzer?
- Simpler design
- Compiler efficiency is improved
- Compiler portability is enhanced

2. Define concrete and abstract syntax with example.
A parse tree is called a concrete syntax tree; it shows how the start symbol of a grammar derives a string in the language. An abstract syntax tree (or simply syntax tree) is a tree in which each node represents an operator and its children represent the operands. Syntax trees differ from parse trees in that superficial distinctions of form, unimportant for translation, do not appear in the syntax tree.

3. Derive the string and construct a syntax tree for the input string ceaedbe using the grammar S->SaA|A, A->AbB|B, B->cSd|e.
Derivation:
S => A          (S->A)
  => AbB        (A->AbB)
  => BbB        (A->B)
  => cSdbB      (B->cSd)
  => cSaAdbB    (S->SaA)
  => cAaAdbB    (S->A)
  => cBaAdbB    (A->B)
  => ceaAdbB    (B->e)
  => ceaBdbB    (A->B)
  => ceaedbB    (B->e)
  => ceaedbe    (B->e)
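The concrete/abstract distinction can be sketched in code (an illustrative Python sketch, not part of the original answer): an abstract syntax tree keeps only operators and operands, while a concrete parse tree would also carry a node for every nonterminal used in the derivation.

```python
# Illustrative sketch: an abstract syntax tree node holds an operator
# and its operand subtrees; grammar bookkeeping (E, T, F nodes) that a
# concrete parse tree would retain is simply not represented.

class Node:
    def __init__(self, op, *children):
        self.op = op              # operator label, or leaf value
        self.children = children

    def __repr__(self):
        if not self.children:
            return str(self.op)
        return f"({self.op} {' '.join(map(repr, self.children))})"

# AST for a + b * c: operator precedence is encoded in the tree shape.
ast = Node('+', Node('a'), Node('*', Node('b'), Node('c')))
print(ast)  # (+ a (* b c))
```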

4. List the factors to be considered for top-down parsing.
Top-down parsing is an attempt to find a leftmost derivation for an input string.
- A left-recursive grammar can cause a top-down parser to go into an infinite loop.
- Backtracking overhead may occur.
- Due to backtracking, the parser may reject some valid sentences.


http://engineerportal.blogspot.in/

- Left factoring
- Ambiguity
- The order in which alternates are tried can affect the language accepted.
- When failure is reported, we have very little idea where the error actually occurred.

5. Why is it necessary to generate intermediate code instead of generating the target program itself?
a. Retargeting is facilitated: a compiler for a different machine can be created by attaching a new back end to the existing front end.
b. A machine-independent code optimizer can be applied to the intermediate code before code generation.

6. Define back patching.
Back patching is the activity of filling in unspecified label information using appropriate semantic actions during code generation. The functions used in the semantic actions are mklist(i), merge_list(p1,p2) and backpatch(p,i).

Source:
if a or b then
  if c then
    x = y+1

Translation:
if a goto L1
if b goto L1
goto L3
L1: if c goto L2
goto L3
L2: x = y+1
L3:

After backpatching:
100: if a goto 103
101: if b goto 103
102: goto 106
103: if c goto 105
104: goto 106
105: x = y+1
106:

7. List the issues in code generation.
- Input to the code generator
- Target programs
- Memory management
- Instruction selection
- Register allocation
- Choice of evaluation order
- Approaches to code generation

8. Write the steps for constructing leaders in basic blocks.
Leaders are the first statements of basic blocks:
- The first statement of the program is a leader.
- Any statement that is the target of a conditional or unconditional goto is a leader.
- Any statement that immediately follows a goto or conditional goto statement is a leader.
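The mklist/merge/backpatch functions named above can be sketched over a list of incomplete instructions (an illustrative Python sketch, not part of the original answer; instruction layout and helper names are assumptions for the example).

```python
# Minimal back patching sketch: jumps are emitted with an unfilled
# target (None) and patched once the real target address is known.

code = []  # each instruction: [opcode, operand, target-or-None]

def emit(op, arg, target=None):
    code.append([op, arg, target])
    return len(code) - 1          # index of the emitted instruction

def makelist(i):
    return [i]                    # a list holding one unfilled jump

def merge(p1, p2):
    return p1 + p2

def backpatch(plist, target):
    for i in plist:
        code[i][2] = target       # fill in the jump target

# Translate "a or b": both conditional jumps start unfilled, then the
# merged true-list is patched when the true target (103) becomes known.
t1 = makelist(emit("if", "a"))
t2 = makelist(emit("if", "b"))
truelist = merge(t1, t2)
backpatch(truelist, 103)

print(code)  # [['if', 'a', 103], ['if', 'b', 103]]
```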

9. What are the issues in static allocation?
Here, names are bound to storage as the program is compiled, so there is no need for a run-time support package. The size of a data object and constraints on its position in memory must be known at compile time.
- Recursive procedures are restricted.
- Data structures cannot be created dynamically.

10. What is meant by copy-restore?
A hybrid between call-by-value and call-by-reference is copy-restore (also known as copy-in copy-out, or value-result).
a. Before control flows to the called procedure, the actual parameters are evaluated. The r-values of the actuals are passed to the called procedure as in call-by-value. In addition, the l-values of those actual parameters having l-values are determined before the call.
b. When control returns, the current r-values of the formal parameters are copied back into the l-values of the actuals, using the l-values computed before the call. Only actuals having l-values are copied.

PART B

11. a. i. Explain the need for dividing the compilation process into various phases and explain its functions. (8)
The process of compilation is very complex, so it is customary, from the logical as well as the implementation point of view, to partition the compilation process into several phases. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation. (2)

The source program is a stream of characters, e.g. pos = init + rate * 60. (4)
- Lexical analysis groups characters into non-separable units called tokens and generates the token stream: id1 = id2 + id3 * const. The information about the identifiers must be stored somewhere (symbol table).
- Syntax analysis checks whether the token stream meets the grammatical specification of the language and generates the syntax tree.
- Semantic analysis checks whether the program has a meaning (e.g. if pos is a record and init and rate are integers, then the assignment does not make sense).
[Figure: syntax trees for the assignment. After syntax analysis: id1 := id2 + id3 * 60. After semantic analysis: the constant 60 is wrapped in an inttoreal conversion node.]

- Intermediate code generation: intermediate code is something that is both close to the final machine code and easy to manipulate (for optimization). One example is three-address code: dst = op1 op op2. The three-address code for the assignment statement:
temp1 = inttoreal(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
- Code optimization produces better, semantically equivalent code:
temp1 = id3 * 60.0
id1 = id2 + temp1
- Code generation generates assembly:
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
- Symbol table creation / maintenance: contains information (storage, type, scope, arguments) on each meaningful token, typically identifiers. The data structure is created and initialized during lexical analysis and utilized and updated during later analysis and synthesis.
- Error handling: detection of the different errors that correspond to all phases. Each phase should know how to deal with errors, so that compilation can proceed and further errors can be detected.
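The lexical-analysis step described above (grouping characters into tokens and recording identifiers in a symbol table) can be sketched as follows. This is an illustrative Python sketch, not part of the original answer; real lexers are typically generated from regular-expression specifications.

```python
# Toy tokenizer: numbers become const tokens, identifiers are interned
# in a symbol table (so pos -> id1, init -> id2, rate -> id3), and any
# other non-space character becomes its own token.
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def tokenize(source, symtab):
    tokens = []
    for num, ident, other in TOKEN_RE.findall(source):
        if num:
            tokens.append(("const", int(num)))
        elif ident:
            if ident not in symtab:
                symtab[ident] = len(symtab) + 1   # id1, id2, ...
            tokens.append(("id", symtab[ident]))
        else:
            tokens.append((other, None))
    return tokens

symtab = {}
print(tokenize("pos = init + rate * 60", symtab))
# [('id', 1), ('=', None), ('id', 2), ('+', None), ('id', 3), ('*', None), ('const', 60)]
```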
[Figure: the phases of a compiler. Source Program -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Intermediate Code Generator -> Code Optimizer -> Code Generator -> Target Program, with the Symbol-table Manager and the Error Handler interacting with every phase.] (2)

ii. Explain how abstract stack machines can be used as translators. (8)
The front end of a compiler constructs an intermediate representation of the source program from which the back end generates the target program. One popular form of intermediate representation is code for an abstract stack machine:
- Arithmetic instructions
- L-values and r-values
- Stack manipulation
- Translation of expressions
- Control flow
- Translation of statements
- Emitting a translation
(OR)
b. What is syntax-directed translation? How is it used for translation of expressions?
A syntax-directed translation scheme is a syntax-directed definition in which the net effect of the semantic actions is to print out a translation of the input in a desired output form. This is accomplished by including emit statements in semantic actions that write out text fragments of the output, as well as string-valued attributes that compute text fragments to be fed into emit statements.
Syntax-directed definition: it specifies the translation of a construct in terms of attributes associated with its syntactic components. It uses a CFG to specify the syntactic structure of the input. With each grammar symbol it associates a set of attributes, and with each production a set of semantic rules for computing the values of the attributes associated with the symbols appearing in that production. Translation is an input-output mapping.
- Annotated parse tree
- Synthesized attributes
- Depth-first traversals
- Translation schemes
- Emitting a translation

12. a. Given the following grammar S->AS|b, A->SA|a, construct an SLR parsing table and parse the string baab.
Given grammar:
1. S->AS
2. S->b
3. A->SA
4. A->a
Augmented grammar:
S'->S
S->AS
S->b
A->SA
A->a


I0:
S'->.S, S->.AS, S->.b, A->.SA, A->.a

I1: goto(I0, S)
S'->S., A->S.A, A->.SA, A->.a, S->.AS, S->.b

I2: goto(I0, A)
S->A.S, S->.AS, S->.b, A->.SA, A->.a

I3: goto(I0, b)
S->b.

I4: goto(I0, a)
A->a.

I5: goto(I1, A)
A->SA., S->A.S, S->.AS, S->.b, A->.SA, A->.a

I6: goto(I1, S)
A->S.A, A->.SA, A->.a, S->.AS, S->.b

I7: goto(I2, S)
S->AS., A->S.A, A->.SA, A->.a, S->.AS, S->.b

Remaining gotos:
goto(I1, a)=I4, goto(I1, b)=I3
goto(I2, A)=I2, goto(I2, b)=I3, goto(I2, a)=I4
goto(I5, A)=I2, goto(I5, S)=I7, goto(I5, a)=I4, goto(I5, b)=I3
goto(I6, A)=I5, goto(I6, S)=I6, goto(I6, a)=I4, goto(I6, b)=I3
goto(I7, A)=I5, goto(I7, S)=I6, goto(I7, a)=I4, goto(I7, b)=I3

First(S) = First(A) = {a, b}
Follow(S) = {$, a, b}
Follow(A) = {a, b}

SLR parsing table (s = shift, r = reduce; states 5 and 7 carry shift-reduce conflicts, shown as double entries; the parse below resolves them in favour of reduction):

State |   a    |   b    |  $  |  S  |  A
  0   |  s4    |  s3    |     |  1  |  2
  1   |  s4    |  s3    | acc |  6  |  5
  2   |  s4    |  s3    |     |  7  |  2
  3   |  r2    |  r2    | r2  |     |
  4   |  r4    |  r4    |     |     |
  5   | s4/r3  | s3/r3  |     |  7  |  2
  6   |  s4    |  s3    |     |  6  |  5
  7   | s4/r1  | s3/r1  | r1  |  6  |  5
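The table can be exercised mechanically with the standard LR driver loop. The sketch below is illustrative Python, not part of the original answer; it encodes the table with the shift-reduce conflicts in states 5 and 7 resolved in favour of reduce, matching the hand trace of baab.

```python
# Conflict-resolved ACTION/GOTO tables for S'->S, 1:S->AS, 2:S->b,
# 3:A->SA, 4:A->a, and a generic shift-reduce driver.
ACTION = {
    (0, 'a'): ('s', 4), (0, 'b'): ('s', 3),
    (1, 'a'): ('s', 4), (1, 'b'): ('s', 3), (1, '$'): ('acc',),
    (2, 'a'): ('s', 4), (2, 'b'): ('s', 3),
    (3, 'a'): ('r', 2), (3, 'b'): ('r', 2), (3, '$'): ('r', 2),
    (4, 'a'): ('r', 4), (4, 'b'): ('r', 4),
    (5, 'a'): ('r', 3), (5, 'b'): ('r', 3),
    (6, 'a'): ('s', 4), (6, 'b'): ('s', 3),
    (7, 'a'): ('r', 1), (7, 'b'): ('r', 1), (7, '$'): ('r', 1),
}
GOTO = {(0, 'S'): 1, (0, 'A'): 2, (1, 'S'): 6, (1, 'A'): 5,
        (2, 'S'): 7, (2, 'A'): 2, (5, 'S'): 7, (5, 'A'): 2,
        (6, 'S'): 6, (6, 'A'): 5, (7, 'S'): 6, (7, 'A'): 5}
PRODS = {1: ('S', 2), 2: ('S', 1), 3: ('A', 2), 4: ('A', 1)}  # lhs, |rhs|

def slr_parse(tokens):
    stack = [0]                       # stack of states
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False              # blank (error) entry
        if act[0] == 'acc':
            return True
        if act[0] == 's':             # shift: push state, advance input
            stack.append(act[1])
            i += 1
        else:                         # reduce by production act[1]
            lhs, n = PRODS[act[1]]
            del stack[-n:]
            stack.append(GOTO[(stack[-1], lhs)])

print(slr_parse(list("baab") + ['$']))  # True
```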


Parsing the string baab:

Stack             | Input  | Action
0                 | baab$  | shift 3
0 b 3             | aab$   | reduce by S->b
0 S 1             | aab$   | shift 4
0 S 1 a 4         | ab$    | reduce by A->a
0 S 1 A 5         | ab$    | reduce by A->SA
0 A 2             | ab$    | shift 4
0 A 2 a 4         | b$     | reduce by A->a
0 A 2 A 2         | b$     | shift 3
0 A 2 A 2 b 3     | $      | reduce by S->b
0 A 2 A 2 S 7     | $      | reduce by S->AS
0 A 2 S 7         | $      | reduce by S->AS
0 S 1             | $      | accept

(OR)
b. Consider the grammar E->E+T | T, T->T*F | F, F->(E) | id. Using predictive parsing, parse the string id+id*id.
Eliminating left recursion: (2)
E  -> T E'
E' -> + T E' | ε
T  -> F T'
T' -> * F T' | ε
F  -> (E) | id
Calculation of First: (2)
First(E) = First(T) = First(F) = {(, id}
First(E') = {+, ε}
First(T') = {*, ε}
Calculation of Follow: (2)
Follow(E) = Follow(E') = {), $}
Follow(T) = Follow(T') = {+, ), $}
Follow(F) = {+, *, ), $}
Predictive parsing table: (5)

Nonterminal | id      | +         | *         | (       | )      | $
E           | E->TE'  |           |           | E->TE'  |        |
E'          |         | E'->+TE'  |           |         | E'->ε  | E'->ε
T           | T->FT'  |           |           | T->FT'  |        |
T'          |         | T'->ε     | T'->*FT'  |         | T'->ε  | T'->ε
F           | F->id   |           |           | F->(E)  |        |


Moves made by the predictive parser on id+id*id: (5)

Stack     | Input      | Output
$E        | id+id*id$  | E->TE'
$E'T      | id+id*id$  | T->FT'
$E'T'F    | id+id*id$  | F->id
$E'T'id   | id+id*id$  | match id
$E'T'     | +id*id$    | T'->ε
$E'       | +id*id$    | E'->+TE'
$E'T+     | +id*id$    | match +
$E'T      | id*id$     | T->FT'
$E'T'F    | id*id$     | F->id
$E'T'id   | id*id$     | match id
$E'T'     | *id$       | T'->*FT'
$E'T'F*   | *id$       | match *
$E'T'F    | id$        | F->id
$E'T'id   | id$        | match id
$E'T'     | $          | T'->ε
$E'       | $          | E'->ε
$         | $          | accept
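The table-driven algorithm behind this trace can be sketched as follows (illustrative Python, not part of the original answer); the parsing table is the one constructed above, with ε-productions stored as empty right-hand sides.

```python
# Table-driven predictive parser for the left-recursion-free grammar
# E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> (E) | id.
TABLE = {
    ('E', 'id'): ['T', "E'"], ('E', '('): ['T', "E'"],
    ("E'", '+'): ['+', 'T', "E'"], ("E'", ')'): [], ("E'", '$'): [],
    ('T', 'id'): ['F', "T'"], ('T', '('): ['F', "T'"],
    ("T'", '+'): [], ("T'", '*'): ['*', 'F', "T'"],
    ("T'", ')'): [], ("T'", '$'): [],
    ('F', 'id'): ['id'], ('F', '('): ['(', 'E', ')'],
}
NONTERMS = {'E', "E'", 'T', "T'", 'F'}

def pred_parse(tokens):
    stack = ['$', 'E']
    i = 0
    while stack:
        top = stack.pop()
        if top == '$':
            return tokens[i] == '$'      # accept only at end of input
        if top in NONTERMS:
            rhs = TABLE.get((top, tokens[i]))
            if rhs is None:
                return False             # blank table entry: error
            stack.extend(reversed(rhs))  # push production right-to-left
        else:
            if top != tokens[i]:
                return False             # terminal mismatch
            i += 1                       # match terminal
    return False

print(pred_parse(['id', '+', 'id', '*', 'id', '$']))  # True
```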

13. a. Explain in detail how three-address codes are generated and implemented.
Three-address code is one of the intermediate representations. It is a sequence of statements of the form x := y op z, where x, y and z are names, constants or compiler-generated temporaries, and op is an operator, which can be arithmetic or logical. E.g. x+y*z is translated as t1 = y*z and t2 = x+t1. (4)
The reason for the term three-address code is that each statement usually contains three addresses: two for the operands and one for the result. (2)
Common three-address statements: (4)
- x := y op z (assignment statements)
- x := op y (assignment statements)
- x := y (copy statements)
- goto L (unconditional jump)
- Conditional jumps like if x relop y goto L
- param x, call p,n and return y for procedure calls
- Indexed assignments x := y[i] and x[i] := y
- Address and pointer assignments x := &y, x := *y and *x := y
Implementation: (6)
- Quadruples: records with four fields: op, arg1, arg2 and result.
- Triples: records with three fields: op, arg1 and arg2, used to avoid entering temporary names into the symbol table. Here, a temporary value is referred to by the position of the statement that computes it.


- Indirect triples: lists of pointers to triples rather than the triples themselves.

For a := b * -c + b * -c:

Quadruples:
     | op     | arg1 | arg2 | result
(0)  | uminus | c    |      | t1
(1)  | *      | b    | t1   | t2
(2)  | uminus | c    |      | t3
(3)  | *      | b    | t3   | t4
(4)  | +      | t2   | t4   | t5
(5)  | :=     | t5   |      | a

Triples:
     | op     | arg1 | arg2
(0)  | uminus | c    |
(1)  | *      | b    | (0)
(2)  | uminus | c    |
(3)  | *      | b    | (2)
(4)  | +      | (1)  | (3)
(5)  | assign | a    | (4)

Indirect triples:
Statement list: (0)->(14), (1)->(15), (2)->(16), (3)->(17), (4)->(18), (5)->(19)
     | op     | arg1 | arg2
(14) | uminus | c    |
(15) | *      | b    | (14)
(16) | uminus | c    |
(17) | *      | b    | (16)
(18) | +      | (15) | (17)
(19) | assign | a    | (18)
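Generating the quadruples above can be sketched as follows (illustrative Python, not part of the original answer): each emit allocates a fresh temporary and records one four-field tuple.

```python
# Quadruple generation for a := b * -c + b * -c.
quads = []          # list of (op, arg1, arg2, result) records
temp_count = 0

def newtemp():
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

def emit(op, arg1, arg2=None):
    result = newtemp()
    quads.append((op, arg1, arg2, result))
    return result

t1 = emit('uminus', 'c')
t2 = emit('*', 'b', t1)
t3 = emit('uminus', 'c')
t4 = emit('*', 'b', t3)
t5 = emit('+', t2, t4)
quads.append((':=', t5, None, 'a'))   # final copy into a

for q in quads:
    print(q)
```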

(OR)
b. Explain the role of declaration statements in intermediate code generation.
When a sequence of declarations in a procedure or block is examined, we lay out the storage for the names local to the procedure.
Dealing with declarations in procedures: (2)
P -> procedure id ; block ;
Semantic rule:
begin = newlabel;
Enter into the symbol table, in the entry of the procedure name, the begin label.
P.code = gen(begin ':') || block.code || gen(pop return_address) || gen(goto return_address)
S -> call id
Semantic rule:
Look up the symbol table to find the procedure name and its begin label, proc_begin.
return = newlabel;
S.code = gen(push return) || gen(goto proc_begin) || gen(return ':')
Using a global variable offset:



Computing the types and relative addresses of declared names: (4)
P -> M D                  {}
M -> ε                    {offset := 0}
D -> id : T               {enter(id.name, T.type, offset); offset := offset + T.width}
T -> real                 {T.type := real; T.width := 8}
T -> integer              {T.type := integer; T.width := 4}
T -> array [num] of T1    {T.type := array(1..num.val, T1.type); T.width := num.val * T1.width}
T -> ^T1                  {T.type := pointer(T1.type); T.width := 4}

Keeping track of scope information: (4)
For nested procedure declarations, we create a symbol table for each procedure.
- mktable(previous): creates a new symbol table whose parent is previous and returns a pointer to the new table.
- enter(symtable, name, type, offset): creates a new entry for a variable in the given symbol table.
- enterproc(symtable, name, newsymbtable): creates a new entry for a procedure in the symbol table of its parent.
- addwidth(symtable, width): puts the total width of all entries in the symbol table into the header of that table.
We maintain two stacks:
- tblptr holds pointers to the symbol tables of the enclosing procedures.
- offset holds the current offsets in the symbol tables in the tblptr stack. The top element is the next available relative address for a local of the current procedure.
Processing declarations in nested procedures: (4)
P -> M D                  {addwidth(top(tblptr), top(offset)); pop(tblptr); pop(offset)}
M -> ε                    {t := mktable(null); push(t, tblptr); push(0, offset)}
D -> D1 ; D2 | ...
D -> proc id ; N D ; S    {t := top(tblptr); addwidth(t, top(offset)); pop(tblptr); pop(offset); enterproc(top(tblptr), id.name, t)}
N -> ε                    {t := mktable(top(tblptr)); push(t, tblptr); push(0, offset)}
D -> id : T               {enter(top(tblptr), id.name, T.type, top(offset)); top(offset) := top(offset) + T.width}
Field names in records: (2)
T -> record L D end       {T.type := record(top(tblptr)); T.width := top(offset); pop(tblptr); pop(offset)}
L -> ε                    {t := mktable(nil); push(t, tblptr); push(0, offset)}
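The mktable/enter/addwidth machinery and the two stacks can be sketched as follows (illustrative Python with hypothetical class and function names, not part of the original answer): each declaration is entered at the current offset, and the offset advances by the type's width.

```python
class SymTable:
    """One symbol table per procedure; `previous` is the enclosing scope."""
    def __init__(self, previous=None):
        self.previous = previous
        self.entries = {}      # name -> (type, relative address)
        self.width = 0         # total width, filled in by addwidth

def mktable(previous):
    return SymTable(previous)

def enter(table, name, typ, off):
    table.entries[name] = (typ, off)

def addwidth(table, width):
    table.width = width

# The two stacks described above.
tblptr = [mktable(None)]   # symbol tables of the enclosing procedures
offset = [0]               # next available relative address per table

def declare(name, typ, width):
    enter(tblptr[-1], name, typ, offset[-1])
    offset[-1] += width

declare('i', 'integer', 4)   # i gets offset 0
declare('x', 'real', 8)      # x gets offset 4
addwidth(tblptr[-1], offset[-1])

print(tblptr[-1].entries, tblptr[-1].width)
# {'i': ('integer', 0), 'x': ('real', 4)} 12
```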


14. a. Design a simple code generator and explain with example.
A simple code generator generates target code for a sequence of three-address statements. (2)
Assumptions:
- For each operator in a three-address statement, there is a corresponding target-language operator.
- Computed results can be left in registers as long as possible.
E.g. for a = b + c: (4)
- Add Rj, Ri, where Ri holds b and Rj holds c, leaving the result in Ri. Cost = 1.
- Add c, Ri, where Ri holds b, leaving the result in Ri. Cost = 2.
- Mov c, Rj; Add Rj, Ri. Cost = 3.
Register descriptor: keeps track of what is currently in each register.
Address descriptor: keeps track of the locations where the current value of a name can be found at run time. (2)
Code generation algorithm for x = y op z: (6)
- Invoke the function getreg to determine the location L where the result of y op z should be stored (a register or a memory location).
- Consult the address descriptor for y to determine y', the current location of y; if the value of y is not already in L, generate Mov y', L.
- Generate the instruction op z', L, where z' is the current location of z.
- If the current values of y and/or z have no next uses, update the register descriptors accordingly.
Getreg: (2)
- If y is in a register that holds the value of no other name and y is not live after the statement, return the register of y for L.
- Failing that, return an empty register if one exists.
- Failing that, if x has a next use, find an occupied register and empty it.
- If x is not used in the block, or no suitable register can be found, select the memory location of x as L.
(OR)
b. Write short notes on:
i. Peephole optimization
Peephole optimization is a simple and effective technique for locally improving target code. The technique improves the performance of the target program by examining short sequences of target instructions and replacing them by shorter or faster sequences whenever possible. The peephole is a small, moving window on the target program.
Characteristics of peephole optimization:
- Local in nature
- Pattern driven
- Limited by the size of the window
Typical transformations include redundant instruction elimination, flow-of-control optimization, algebraic simplification and use of machine idioms.
Constant folding:
x := 32
x := x + 32
becomes
x := 64


Unreachable code:
An unlabeled instruction immediately following an unconditional jump can be removed.
goto L2
x := x + 1    (unreachable, removed)
Flow-of-control optimizations:
Unnecessary jumps are eliminated.
goto L1
L1: goto L2
becomes
goto L2
Algebraic simplification:
x := x + 0    (unneeded, removed)
Dead-code elimination:
x := 32       (x not used after this statement)
y := x + y
becomes
y := y + 32
Reduction in strength:
Replace expensive operations by equivalent cheaper ones.
x := x * 2    becomes    x := x + x

ii. Issues in code generation
- Input to the code generator: an intermediate representation of the source program, such as linear representations (postfix notation), three-address representations (quadruples), virtual machine representations (stack machine code) or graphical representations (syntax trees and DAGs).
- Target programs: the output, such as absolute machine language, relocatable machine language or assembly language.
- Memory management: mapping names in the source program to addresses of data objects in run-time memory is done by the front end and the code generator.
- Instruction selection: the nature of the instruction set of the target machine determines the difficulty of instruction selection.
- Register allocation: instructions involving registers are shorter and faster. The use of registers is divided into two subproblems:
  o During register allocation, we select the set of variables that will reside in registers at a point in the program.
  o During a subsequent register assignment phase, we pick the specific register in which each variable will reside.
- Choice of evaluation order: the order in which computations are performed affects the efficiency of the target code.
- Approaches to code generation
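A few of the patterns above can be sketched over three-address tuples (illustrative Python, not part of the original answer; a real peephole pass works on target instructions, and the tuple layout here is an assumption for the example).

```python
def peephole(code):
    """Apply algebraic simplification, strength reduction and constant
    folding to (dest, op, arg1, arg2) three-address tuples."""
    out = []
    for dest, op, a, b in code:
        if op == '+' and b == 0:
            out.append((dest, 'copy', a, None))       # x := a + 0 -> x := a
        elif op == '*' and b == 2:
            out.append((dest, '+', a, a))             # x := a * 2 -> x := a + a
        elif op == '+' and isinstance(a, int) and isinstance(b, int):
            out.append((dest, 'const', a + b, None))  # fold constants
        else:
            out.append((dest, op, a, b))
    return out

print(peephole([('x', '+', 'y', 0), ('x', '*', 'x', 2), ('x', '+', 32, 32)]))
# [('x', 'copy', 'y', None), ('x', '+', 'x', 'x'), ('x', 'const', 64, None)]
```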


15. a. Explain with an example how basic blocks are optimized.
Code-improving transformations:
- Structure-preserving transformations
  o Common subexpression elimination
  o Dead-code elimination
- Algebraic transformations, like reduction in strength
Structure-preserving transformations: (8)
These are implemented by constructing a DAG for the basic block. A common subexpression can be detected by noticing, as a new node m is about to be added, whether there is an existing node n with the same children, in the same order, and with the same operator. If so, n computes the same value as m and may be used in its place. E.g. the DAG for the basic block
d := b*c
e := a+b
b := b*c
a := e-d
[Figure: DAG for the basic block]
For dead-code elimination, delete from the DAG any root (a node with no ancestors) that has no live variables attached. Repeated application of this transformation removes all nodes of the DAG that correspond to dead code.
Use of algebraic identities: (8)
x+0 = 0+x = x
x-0 = x
x*1 = 1*x = x
x/1 = x
Reduction in strength: replace an expensive operator by a cheaper one, e.g. x ** 2 = x * x.
Constant folding: evaluate constant expressions at compile time and replace them by their values.
We can also use commutative and associative laws. E.g. for
a = b+c
e = c+d+b
the intermediate code is
a = b+c
t = c+d
e = t+b

If t is not needed outside the block, change this to
a = b+c
e = a+d
using both the associativity and commutativity of +.
(OR)
b. Explain the storage allocation strategies used in run time environments.
- Static allocation lays out storage for all data objects at compile time.
- Stack allocation manages the run-time storage as a stack.
- Heap allocation allocates and deallocates storage as needed at run time from a heap area.
Static allocation: (4)
- Names are bound to storage at compile time.
- There is no need for a run-time support package.
- When a procedure is activated, its names are bound to the same storage locations.
- The compiler must decide where activation records should go.
- Limitations: sizes must be known at compile time, recursive procedures are restricted, and data structures can't be created dynamically.
Stack allocation: (6)
- Activation records are pushed and popped as activations begin and end.
- Locals are bound to fresh storage in each activation and deleted when the activation ends.
- Call sequence and return sequence; caller and callee.
- Dangling references.
Heap allocation: (6)
Stack allocation cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.
- Allocate pieces of memory for activation records, which can be deallocated in any order.
- Maintain a linked list of free blocks.
- Fill a request for size s with a block of size s', where s' is the smallest size greater than or equal to s.
- Use a heap manager, which takes care of defragmentation and garbage collection.
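The best-fit request rule ("the smallest s' >= s") can be sketched with a free list (illustrative Python, not part of the original answer; coalescing of freed blocks is omitted and left to the heap manager).

```python
free = [(0, 8), (8, 100), (108, 16)]   # free blocks as (start, size)

def alloc(s):
    """Best fit: choose the smallest free block of size s' >= s."""
    best = None
    for i, (start, size) in enumerate(free):
        if size >= s and (best is None or size < free[best][1]):
            best = i
    if best is None:
        return None                    # no block large enough
    start, size = free[best]
    if size == s:
        free.pop(best)                 # exact fit: remove the block
    else:
        free[best] = (start + s, size - s)   # trim the front of the block
    return start

def dealloc(start, s):
    free.append((start, s))            # defragmentation not handled here

print(alloc(10))   # 108: the 16-byte block is the smallest that fits
print(free)        # [(0, 8), (8, 100), (118, 6)]
```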

