
MAY/JUNE-'07/CS1352-Answer Key

CS1352 Principles of Compiler Design
University Question Key May/June 2007

PART-A

1. Define a preprocessor.

A preprocessor produces the input to a compiler.
Functions: macro processing, file inclusion, rational preprocessors and language extensions.

2. What are the issues in lexical analysis?
Lexical analysis is kept separate from parsing for these reasons:
Simpler design.
Compiler efficiency is improved.
Compiler portability is enhanced.

3. Eliminate the left recursion from the following grammar
A -> Ac | Aad | bd | c
The rule to eliminate left recursion is: A -> Aα | β can be converted to A -> βA' and A' -> αA' | ε.
So, the grammar after eliminating left recursion is
A -> bdA' | cA'
A' -> cA' | adA' | ε

4. What are the disadvantages of operator precedence parsing?
An operator such as minus, which is both unary and binary, has two different precedences. Hence it is hard to handle tokens like the minus sign.
This kind of parsing is applicable only to a small class of grammars.

5. Write the properties of intermediate language.
Intermediate codes are machine-independent codes, but they are close to machine instructions.
The given program in a source language is converted to an equivalent program in an intermediate language by the intermediate code generator.
The intermediate language can be one of many different languages; the designer of the compiler decides which one to use.
Syntax trees can be used as an intermediate language.
Postfix notation can be used as an intermediate language.
Three-address code (quadruples, triples and indirect triples) can be used as an intermediate language.

6. What is back patching?
Back patching is the activity of filling in unspecified label information, using appropriate semantic actions, during the code generation process. The functions used in the semantic actions are mklist(i), merge_list(p1,p2) and backpatch(p,i).
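The rule used in question 3 can be sketched in a few lines of Python. This is a minimal illustration that handles only immediate left recursion for a single nonterminal; the function name and the list-of-symbols representation are assumptions made for this example.

```python
def eliminate_left_recursion(nonterminal, productions):
    """Remove immediate left recursion from one nonterminal.

    productions is a list of right-hand sides, each a list of symbols.
    Returns (rules for A, rules for A'), with "ε" for the empty string.
    """
    new_nt = nonterminal + "'"
    # alpha parts come from A -> A alpha, beta parts from A -> beta
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not alphas:
        return productions, []          # nothing to eliminate
    a_rules = [beta + [new_nt] for beta in betas]              # A  -> beta A'
    a_prime_rules = [alpha + [new_nt] for alpha in alphas]     # A' -> alpha A'
    a_prime_rules.append(["ε"])                                # A' -> ε
    return a_rules, a_prime_rules


# A -> Ac | Aad | bd | c
a_rules, a_prime_rules = eliminate_left_recursion(
    "A", [["A", "c"], ["A", "a", "d"], ["b", "d"], ["c"]]
)
for rhs in a_rules:
    print("A  ->", " ".join(rhs))
for rhs in a_prime_rules:
    print("A' ->", " ".join(rhs))
```

The printed rules match the grammar given in the answer to question 3.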


http://engineerportal.blogspot.in/

7. What are the applications of DAG?
Determining the common sub-expressions.
Determining which identifiers have their values used in the block.
Determining which statements of the block compute values that could be used outside the block.

8. Give the primary structure preserving transformations on Basic Blocks.
Common sub-expression elimination
Dead-code elimination
Renaming of temporary variables
Interchange of two independent adjacent statements

9. What do you mean by code motion?
Code motion decreases the amount of code in a loop. It takes an expression that yields the same result independent of the number of times the loop is executed (a loop-invariant computation) and places it before the loop.

10. Draw the diagram of the general activation record and give the purpose of any two fields.
Returned value
Actual parameters
Optional control link
Optional access link
Saved machine status
Local data
Temporaries
Temporaries are used to hold values that arise in the evaluation of expressions.
The returned value field is used by the called procedure to return a value to the calling procedure.

PART B

11. a. i. Write about the phases of compiler and by assuming an input and show the output of various phases. (10)
The process of compilation is very complex. So it is customary, from both the logical and the implementation point of view, to partition the compilation process into several phases. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation. (2)
The source program is a stream of characters, e.g. pos = init + rate * 60 (6)
Lexical analysis: groups characters into non-separable units, called tokens, and generates the token stream: id1 = id2 + id3 * const. The information about the identifiers must be stored somewhere (the symbol table).
Syntax analysis: checks whether the token stream meets the grammatical specification of the language and generates the syntax tree.


Semantic analysis: checks whether the program has a meaning (e.g. if pos is a record and init and rate are integers, then the assignment does not make sense).

Syntax analysis:                 Semantic analysis:

      :=                               :=
     /  \                             /  \
   id1   +                          id1   +
        / \                              / \
      id2  *                           id2  *
          / \                              / \
        id3  60                          id3  inttoreal
                                                 |
                                                 60

Intermediate code generation: intermediate code is something that is both close to the final machine code and easy to manipulate (for optimization). One example is three-address code: dst = op1 op op2
The three-address code for the assignment statement:
    temp1 = inttoreal(60)
    temp2 = id3 * temp1
    temp3 = id2 + temp2
    id1 = temp3
Code optimization: produces better, semantically equivalent code.
    temp1 = id3 * 60.0
    id1 = id2 + temp1
Code generation: generates assembly.
    MOVF id3, R2
    MULF #60.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1
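The intermediate code generation step above can be sketched as a post-order walk over the tree produced by semantic analysis. This is a minimal illustration; the tuple-based tree encoding and the name gen_tac are assumptions made for this example, not part of the answer key.

```python
import itertools


def gen_tac(node, code, counter):
    """Return the name holding node's value, appending statements to code."""
    if isinstance(node, str):          # leaf: identifier or constant
        return node
    op, *args = node
    # Operands are translated first, so their code precedes the operator's.
    places = [gen_tac(arg, code, counter) for arg in args]
    temp = f"temp{next(counter)}"
    if len(places) == 1:               # unary operator such as inttoreal
        code.append(f"{temp} = {op}({places[0]})")
    else:
        code.append(f"{temp} = {places[0]} {op} {places[1]}")
    return temp


# Tree for id2 + id3 * inttoreal(60), the right-hand side of the assignment.
tree = ("+", "id2", ("*", "id3", ("inttoreal", "60")))
code = []
result = gen_tac(tree, code, itertools.count(1))
code.append(f"id1 = {result}")
print("\n".join(code))
```

The output reproduces the four unoptimized three-address statements listed above.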

Symbol table creation / maintenance
Contains information (storage, type, scope, arguments) on each meaningful token, typically identifiers.
The data structure is created and initialized during lexical analysis.
It is used and updated during the later analysis and synthesis phases.

Error handling
Different errors are detected in all phases.
Each phase must know how to deal with an error so that compilation can proceed and further errors can be detected.


Phases of a compiler: (2)

    Source Program
          |
    Lexical Analyzer
          |
    Syntax Analyzer
          |
    Semantic Analyzer
          |
    Intermediate Code Generator
          |
    Code Optimizer
          |
    Code Generator
          |
    Target Program

The Symbol-table Manager and the Error Handler interact with all of the phases.

ii. Explain briefly about compiler construction tools. (6)
Parser generators: produce syntax analyzers
Scanner generators: produce lexical analyzers
Syntax-directed translation engines: generate intermediate code
Automatic code generators: generate actual code
Data-flow engines: support optimization

(OR)
b. i. Construct the NFA from (a|b)*a(a|b) using Thompson's construction algorithm. (10)
The algorithm is syntax directed in that it uses the syntactic structure of the regular expression to guide the construction process. First, parse the regular expression r into its constituent sub-expressions. Then, using the construction rules, build NFAs for each of the basic symbols in r and combine them according to the operators (union, concatenation and closure) that join the sub-expressions.
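The construction can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names (symbol, union, concat, star, accepts) are invented for this example, the regular expression is assembled by hand rather than parsed, and concatenation joins the two fragments with an ε-edge instead of merging states, which is a common variant that accepts the same language.

```python
import itertools

EPS = None                      # label used for ε-transitions
_ids = itertools.count()


def symbol(c, trans):
    """NFA for a single symbol: start --c--> final."""
    s, f = next(_ids), next(_ids)
    trans.append((s, c, f))
    return s, f


def union(n1, n2, trans):
    """NFA for n1 | n2: new start and final states joined by ε-edges."""
    s, f = next(_ids), next(_ids)
    trans += [(s, EPS, n1[0]), (s, EPS, n2[0]), (n1[1], EPS, f), (n2[1], EPS, f)]
    return s, f


def concat(n1, n2, trans):
    """NFA for n1 n2 (ε-edge variant of the textbook state merge)."""
    trans.append((n1[1], EPS, n2[0]))
    return n1[0], n2[1]


def star(n, trans):
    """NFA for n*: allows skipping n or repeating it."""
    s, f = next(_ids), next(_ids)
    trans += [(s, EPS, n[0]), (s, EPS, f), (n[1], EPS, n[0]), (n[1], EPS, f)]
    return s, f


def eps_closure(states, trans):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, label, r in trans:
            if p == q and label is EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def accepts(nfa, trans, text):
    """Simulate the NFA on text by tracking the set of reachable states."""
    start, final = nfa
    current = eps_closure({start}, trans)
    for ch in text:
        moved = {r for p, label, r in trans if p in current and label == ch}
        current = eps_closure(moved, trans)
    return final in current


# (a|b)* a (a|b)
trans = []
ab1 = union(symbol("a", trans), symbol("b", trans), trans)
ab2 = union(symbol("a", trans), symbol("b", trans), trans)
nfa = concat(concat(star(ab1, trans), symbol("a", trans), trans), ab2, trans)
print(accepts(nfa, trans, "bab"), accepts(nfa, trans, "ba"))
```

The resulting NFA accepts exactly the strings over {a, b} whose second-to-last symbol is a.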


ii. Explain about Input buffering technique. (6)
Determining the next lexeme often requires reading the input beyond the end of that lexeme.
Buffer pairs: (2)
Buffer pairs address efficiency and are used with lookahead on the input.
It is a specialized buffering technique used to reduce the overhead required to process an input character.
The buffer is divided into two N-character halves.
It is used when the lexical analyzer needs to look ahead several characters beyond the lexeme for a pattern before a match is announced.
Two pointers are used. The lexeme-beginning pointer marks the start of the current lexeme, and the forward pointer scans ahead until a pattern is matched. Initially both point to the first character of the next lexeme to be found.
The string of characters between the two pointers forms the lexeme.


Increment procedure for forward pointer: (2)
    if forward at end of first half then
        reload second half
        forward += 1
    else if forward at end of second half then
        reload the first half
        move forward to beginning of first half
    else
        forward += 1

Sentinels: (2)
A sentinel is a special character that cannot be part of the source program, e.g. eof. It is used to reduce the two tests per character into one.
Increment procedure for forward pointer using sentinels:
    forward += 1
    if forward = eof then
        if forward at end of first half then
            reload second half
            forward += 1
        else if forward at end of second half then
            reload the first half
            move forward to beginning of first half
        else
            terminate lexical analysis

12. a. i. Construct predictive parsing table for the grammar (10)
S -> (L) | a
L -> L,S | S

After the elimination of left recursion: (2)
S -> (L) | a
L -> SL'
L' -> ,SL' | ε

Calculation of First: (2)
First(S) = { (, a }
First(L) = { (, a }
First(L') = { ",", ε }

Calculation of Follow: (2)
Follow(S) = { $, ",", ) }
Follow(L) = { ) }
Follow(L') = { ) }

Predictive parsing table: (4)

Non-terminal    Input symbol
                a           (           )           ,
S               S -> a      S -> (L)
L               L -> SL'    L -> SL'
L'                                      L' -> ε     L' -> ,SL'
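The First, Follow and table computations above can be checked mechanically. The sketch below is a minimal fixed-point implementation for this one grammar; the dictionary encoding and the function names are assumptions made for this example.

```python
EPS = "ε"
grammar = {
    "S": [["(", "L", ")"], ["a"]],
    "L": [["S", "L'"]],
    "L'": [[",", "S", "L'"], [EPS]],
}


def first_of(seq, first):
    """FIRST of a sequence of grammar symbols."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in grammar else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)                       # every symbol could derive ε
    return out


def compute_first_follow(start):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:                     # iterate until nothing new is added
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                first[head] |= first_of(body, first)
                changed |= len(first[head]) != before
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue
                    rest = first_of(body[i + 1:], first)
                    before = len(follow[sym])
                    follow[sym] |= rest - {EPS}
                    if EPS in rest:
                        follow[sym] |= follow[head]
                    changed |= len(follow[sym]) != before
    return first, follow


def build_table(first, follow):
    """M[A, a] = production to apply for nonterminal A on input a."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            body_first = first_of(body, first)
            targets = body_first - {EPS}
            if EPS in body_first:
                targets |= follow[head]
            for terminal in targets:
                table[(head, terminal)] = body
    return table


first, follow = compute_first_follow("S")
table = build_table(first, follow)
for (head, terminal), body in sorted(table.items()):
    print(f"M[{head}, {terminal}] = {head} -> {' '.join(body)}")
```

Each table cell receives at most one production, which is what makes the transformed grammar suitable for predictive parsing.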


ii. What are the different strategies that a parser can employ to recover from syntax errors? (6)
Panic mode recovery
On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found.
Phrase level recovery
On discovering an error, the parser may perform local correction on the remaining input, e.g. replace a prefix of the remaining input by some string that allows the parser to continue.
Error productions
Augment the grammar for the language with productions that generate the erroneous constructs. If such a production is used by the parser, generate appropriate error diagnostics to indicate the erroneous construct that has been recognized in the input.
Global correction
Make the minimal changes to the incorrect input string needed to obtain a globally least-cost correction.

(OR)
b. i. Construct the CLR parsing table from S->AA, A->Aa | b (10)

Augmented grammar:
(0) S' -> S
(1) S -> AA
(2) A -> Aa
(3) A -> b

Canonical collection of LR(1) items:
I0:
    S' -> .S, $
    S -> .AA, $
    A -> .Aa, a/b
    A -> .b, a/b
I1: goto(I0, S)
    S' -> S., $
I2: goto(I0, A)
    S -> A.A, $
    A -> A.a, a/b
    A -> .Aa, a/$
    A -> .b, a/$
I3: goto(I0, b)
    A -> b., a/b
I4: goto(I2, A)
    S -> AA., $
    A -> A.a, a/$
I5: goto(I2, a)
    A -> Aa., a/b
I6: goto(I2, b)
    A -> b., a/$
I7: goto(I4, a)
    A -> Aa., a/$

Parsing table:

State   Action                  Goto
        a       b       $       S       A
0               s3              1       2
1                       acc
2       s5      s6                      4
3       r3      r3
4       s7              r1
5       r2      r2
6       r3              r3
7       r2              r2


ii. Write Operator-precedence parsing algorithm. (6)
    set ip to point to the first symbol of w$;
    repeat
        if $ is on top of the stack and ip points to $ then
            return
        else
            let a be the topmost terminal symbol on the stack
            and let b be the symbol pointed to by ip;
            if a <. b or a = b then
                push b onto the stack;
                advance ip to the next input symbol;
            else if a .> b then
                repeat
                    pop the stack
                until the top stack terminal is related by <. to the terminal most recently popped
            else error()
        end
    end

13. a. i. Write about implementation of three addressing statements. (8)
Three-address code is one of the intermediate representations. It is a sequence of statements of the form x := y op z, where x, y and z are names, constants or compiler-generated temporaries, and op is an operator, which can be an arithmetic or a logical operator. E.g. x+y*z is translated as t1 = y*z and t2 = x+t1. (4)
The reason for the term three-address code is that each statement usually contains three addresses, two for the operands and one for the result. (2)
Implementation:
Quadruples
A record with four fields: op, arg1, arg2 and result.
Triples
A record with three fields: op, arg1 and arg2. This avoids entering temporary names into the symbol table; a temporary value is referred to by the position of the statement that computes it.
Indirect triples
List pointers to triples rather than listing the triples themselves.

For a := b * -c + b * -c

Quadruples
        op        arg1    arg2    result
(0)     uminus    c               t1
(1)     *         b       t1      t2
(2)     uminus    c               t3
(3)     *         b       t3      t4
(4)     +         t2      t4      t5
(5)     :=        t5              a


Triples
        op        arg1    arg2
(0)     uminus    c
(1)     *         b       (0)
(2)     uminus    c
(3)     *         b       (2)
(4)     +         (1)     (3)
(5)     assign    a       (4)

Indirect Triples
        statement               op        arg1    arg2
(0)     (14)            (14)    uminus    c
(1)     (15)            (15)    *         b       (14)
(2)     (16)            (16)    uminus    c
(3)     (17)            (17)    *         b       (16)
(4)     (18)            (18)    +         (15)    (17)
(5)     (19)            (19)    assign    a       (18)
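The relationship between the quadruple and triple tables can be sketched as a small conversion: each temporary is replaced by the position of the statement that computes it. The tuple layout and the function name quads_to_triples are assumptions made for this example.

```python
quads = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]


def quads_to_triples(quads):
    """Replace each temporary by the index of the triple that computes it."""
    position = {}                      # temporary name -> defining triple index
    triples = []
    for op, arg1, arg2, result in quads:
        def ref(x):
            return position.get(x, x)  # names that are not temporaries pass through
        if op == ":=":
            # The tables above write assignment as (assign, target, value).
            triples.append(("assign", result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples


triples = quads_to_triples(quads)
for i, triple in enumerate(triples):
    print(i, triple)
```

Integer arguments in the output are statement positions, corresponding to the parenthesized numbers in the triples table. For indirect triples, a separate list of such positions would give the execution order.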

ii. Give the syntax-directed definition for flow of control statements. (8)
Flow of control statements:
S -> if E then S1 | if E then S1 else S2 | while E do S1

If-statement: (4)
Semantic rules for if E then S1:
    E.true := newlabel;
    E.false := S.next;
    S1.next := S.next;
    S.code := E.code || gen(E.true ':') || S1.code

Semantic rules for if E then S1 else S2:
    E.true := newlabel;
    E.false := newlabel;
    S1.next := S.next;
    S2.next := S.next;
    S.code := E.code || gen(E.true ':') || S1.code || gen('goto' S.next) || gen(E.false ':') || S2.code

Example:
Statement: a and b and c
If a is false, then we need not evaluate the rest of the expression. So, we insert the labels E.true and E.false in the appropriate places.
    if a goto E.true
    goto E.false
    E.true: if b goto E1.true
    goto E.false
    E1.true: if c goto E2.true
    goto E.false
    E2.true: exp = 1
    E.false: exp = 0


Semantic rules for while E do S1: (4)
    S.begin := newlabel;
    E.true := newlabel;
    E.false := S.next;
    S1.next := S.begin;
    S.code := gen(S.begin ':') || E.code || gen(E.true ':') || S1.code || gen('goto' S.begin)

Example:
while a<b do if c<d then x=y+z else x=y-z
The 3AC generated is
    L1: if a<b goto L2
        goto Lnext
    L2: if c<d goto L3
        goto L4
    L3: t1 := y+z
        x := t1
        goto L1
    L4: t2 := y-z
        x := t2
        goto L1
    Lnext:

(OR)
b. i. How back patching can be used to generate code for Boolean expressions and flow of control statements. (10)
Back patching is the activity of filling in unspecified label information, using appropriate semantic actions, during the code generation process. The functions used in the semantic actions are mklist(i), merge_list(p1,p2) and backpatch(p,i). (2)

Boolean expressions: (4)
Consider the following grammar:
    E -> E1 or M E2
    E -> E1 and M E2
    E -> not E1
    E -> (E1)
    E -> id1 relop id2
    E -> false
    E -> true
    M -> ε
Here, the synthesized attributes truelist and falselist of nonterminal E are used to generate jumping code for Boolean expressions.
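The three list functions named above can be sketched in Python and applied to a small Boolean expression. This is a minimal illustration: the quadruple representation, the numbering from 100 and the helper names relop, bool_and and bool_or are assumptions made for this example, and the expression a<b or c<d and e<f is built by hand, with and binding tighter than or.

```python
code = []                              # each quadruple is [text, jump target]
BASE = 100                             # first quadruple number (an assumption)


def nextquad():
    return BASE + len(code)


def emit(text):
    code.append([text, None])          # target left blank until backpatched


def makelist(i):
    return [i]


def merge(p1, p2):
    return p1 + p2


def backpatch(p, i):
    """Fill target i into every quadruple on list p."""
    for quad in p:
        code[quad - BASE][1] = i


def relop(expr):
    """E -> id1 relop id2: returns (truelist, falselist)."""
    truelist = makelist(nextquad())
    falselist = makelist(nextquad() + 1)
    emit(f"if {expr} goto")
    emit("goto")
    return truelist, falselist


def bool_and(e1, m_quad, e2):
    backpatch(e1[0], m_quad)           # E1 true: go on to evaluate E2
    return e2[0], merge(e1[1], e2[1])


def bool_or(e1, m_quad, e2):
    backpatch(e1[1], m_quad)           # E1 false: go on to evaluate E2
    return merge(e1[0], e2[0]), e2[1]


e1 = relop("a<b")
m1 = nextquad()                        # M.quad before c<d
e2 = relop("c<d")
m2 = nextquad()                        # M.quad before e<f
e3 = relop("e<f")
truelist, falselist = bool_or(e1, m1, bool_and(e2, m2, e3))

for n, (text, target) in enumerate(code, start=BASE):
    print(f"{n}: {text} {target if target is not None else '____'}")
```

After the run, only the jumps on truelist and falselist remain blank; they are filled in later, once the enclosing statement knows where control should go.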


The corresponding semantic rules are given by:
E -> E1 or M E2
    { backpatch(E1.falselist, M.quad);
      E.truelist = merge(E1.truelist, E2.truelist);
      E.falselist = E2.falselist; }
E -> E1 and M E2
    { backpatch(E1.truelist, M.quad);
      E.falselist = merge(E1.falselist, E2.falselist);
      E.truelist = E2.truelist; }
E -> not E1
    { E.truelist = E1.falselist;
      E.falselist = E1.truelist; }
E -> (E1)
    { E.truelist = E1.truelist;
      E.falselist = E1.falselist; }
E -> id1 relop id2
    { E.truelist = makelist(nextquad);
      E.falselist = makelist(nextquad + 1);
      emit('if' id1.place relop.op id2.place 'goto ____');
      emit('goto ____'); }
E -> false
    { E.falselist = makelist(nextquad);
      emit('goto ____'); }
E -> true
    { E.truelist = makelist(nextquad);
      emit('goto ____'); }
M -> ε
    { M.quad = nextquad }

Example:
Consider the string: a<b or c<d and e<f
(Assuming that the grammar is left associative)
The corresponding intermediate code is:
    100: if ( a<b ) goto ____
    101: goto ____
    102: L1: if ( c<d ) goto ____
    103: goto ____
    104: L2: if ( e<f ) goto ____
    105: goto ____
The code after backpatching becomes:
    100: if ( a<b ) goto ____
    101: goto 102
    102: if ( c<d ) goto 104
    103: goto ____
    104: if ( e<f ) goto ____
    105: goto ____

Flow-of-control statements: (4)
Consider the following grammar:
    S -> if E then S | if E then S else S | while E do S | begin L end | A
    L -> L;S | S
Semantic rules:
S -> if E then M1 S1 N else M2 S2
    { backpatch(E.truelist, M1.quad);
      backpatch(E.falselist, M2.quad);
      S.nextlist := merge(S1.nextlist, merge(N.nextlist, S2.nextlist)) }
N -> ε
    { N.nextlist := makelist(nextquad);
      emit('goto ____') }
M -> ε
    { M.quad := nextquad }
S -> if E then M S1
    { backpatch(E.truelist, M.quad);
      S.nextlist := merge(E.falselist, S1.nextlist) }
S -> while M1 E do M2 S1
    { backpatch(S1.nextlist, M1.quad);
      backpatch(E.truelist, M2.quad);
      S.nextlist := E.falselist;
      emit('goto' M1.quad) }


S -> begin L end
    { S.nextlist := L.nextlist }
S -> A
    { S.nextlist := nil }
L -> L1 ; M S
    { backpatch(L1.nextlist, M.quad);
      L.nextlist := S.nextlist }
L -> S
    { L.nextlist := S.nextlist }
Here, the jumps out of statements are filled in when their targets are found. Not only do Boolean expressions need two lists of jumps, for when the expression is true and when it is false, but statements also need a list of jumps (given by the attribute nextlist) to the code that follows them in the execution sequence.

ii. Write short notes on procedure calls. (6)
The procedure is such an important and frequently used programming construct that it is imperative for a compiler to generate good code for procedure calls and returns. (2)
Consider the following grammar for a simple procedure call statement:
    S -> call id (Elist)
    Elist -> Elist, E
    Elist -> E
Calling sequences: (2)
The translation for a call includes a calling sequence, a sequence of actions taken on entry to and exit from each procedure.
Example: (2)
Syntax directed translation:
S -> call id (Elist)
    { for each item p on queue do emit('param' p);
      emit('call' id.place) }
Elist -> Elist, E
    { append E.place to the end of the queue }
Elist -> E
    { initialize queue to contain only E.place }
E.g. for call p1(a, b), where a and b are integers:
    param a
    param b
    call p1

14. a. i. Write in detail about the issues in the design of a code generator. (10)
Input to the code generator
The intermediate representation of the source program: linear representations such as postfix notation, three-address representations such as quadruples, virtual machine representations such as stack machine code, and graphical representations such as syntax trees and DAGs.
Target programs
The output, such as absolute machine language, relocatable machine language or assembly language.
Memory management
Mapping of names in the source program to addresses of data objects in run-time memory is done cooperatively by the front end and the code generator.


Instruction selection
The nature of the instruction set of the target machine determines the difficulty of instruction selection.
Register allocation
Instructions involving registers are shorter and faster. The use of registers is divided into two sub-problems:
o During register allocation, we select the set of variables that will reside in registers at a point in the program.
o During a subsequent register assignment phase, we pick the specific register that a variable will reside in.
Choice of evaluation order
The order in which computations are performed affects the efficiency of the target code.
Approaches to code generation

ii. What are steps needed to compute the next use information? (6)
If the name in a register is no longer needed, then the register can be assigned to some other name. This idea of keeping a name in storage only if it will be used subsequently can be applied in a number of contexts.
Computing next uses: (2)
The use of a name in a three-address statement is defined as follows: suppose three-address statement i assigns a value to x. If statement j has x as an operand, and control can flow from statement i to j along a path that has no intervening assignments to x, then we say statement j uses the value of x computed at i.
Example:
    i: x := ...
    j: z := x op y    // j uses the value of x
Algorithm to determine next use: (2)
The algorithm to determine next uses makes a backward pass over each basic block, recording for each name x whether x has a next use in the block and, if not, whether it is live on exit from the block (using data flow analysis).
Suppose we reach three-address statement i: x := y op z in our backward scan. Then do the following:
1. Attach to statement i the information currently found in the symbol table regarding the next use and the liveness of x, y and z.
2. In the symbol table, set x to not live and no next use.
3. In the symbol table, set y and z to live and the next uses of y and z to i.

(OR)
b. i. Discuss briefly about DAG representation of basic blocks. (10)
A DAG for a basic block is a directed acyclic graph in which: (2)
Leaves are labeled by unique identifiers, either variable names or constants.
Interior nodes are labeled by operators.
Nodes are also optionally given a sequence of identifiers as labels, to store the computed values.
It is useful for implementing transformations on basic blocks, and it shows how values computed by a statement are used in subsequent statements.


E.g. for the statements (2)
    t1 := 4*i
    t2 := a[t1]
the DAG is

        [] t2
       /  \
      a    * t1
          / \
         4   i

Algorithm for the construction of DAG: (4)
Input: A basic block.
Output: A DAG for that basic block, having
a label for each node, where leaves are labeled by identifiers and interior nodes by operator symbols;
for each node, a list of identifiers to hold the computed values.
The three-address statements take one of the forms:
    1) x = y op z
    2) x = op y
    3) x = y
Step 1: If node(y) is undefined, create a leaf labeled y and let node(y) be this node. In case 1), if node(z) is undefined, create a leaf labeled z and let that leaf be node(z).
Step 2: For 1), after checking for a common sub-expression, create a node op with left child node(y) and right child node(z) if no such node exists. For 2), check for a node op with the single child node(y); if there is none, create such a node. For 3), let n be node(y).
Step 3: Delete x from the list of identifiers for node(x). Append x to the list of attached identifiers for the node n found in step 2 and set node(x) to n.

Applications of DAG: (2)
Determining the common sub-expressions.
Determining which identifiers have their values used in the block.
Determining which statements compute values that could be used outside the block.
Simplifying the list of quadruples by eliminating the common sub-expressions and not performing assignments of the form x := y unless they are necessary.

ii. Explain the characteristics of peephole optimization (6)
Peephole optimization is a simple and effective technique for locally improving target code. It improves the performance of the target program by examining a short sequence of target instructions and replacing these instructions by a shorter or faster sequence whenever possible.
The peephole is a small, moving window on the target program.
Local in nature
Pattern driven
Limited by the size of the window
Characteristics of peephole optimization:
Redundant instruction elimination
Flow of control optimization
Algebraic simplification
Use of machine idioms
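A peephole pass of the kind described above can be sketched as a single scan over a list of instructions. This is a minimal illustration under stated assumptions: instructions are plain strings in an invented syntax ('goto L', 'L: ...', 'x := y + z'), the function name peephole is made up for this example, and only three patterns are handled (jumps to jumps, unlabeled code after an unconditional jump, and x := x + 0).

```python
def peephole(instrs):
    """Apply three simple peephole rules to a list of instruction strings."""
    # Labels whose instruction is itself an unconditional jump.
    jump_targets = {}
    for ins in instrs:
        if ": goto " in ins:
            label, target = ins.split(": goto ")
            jump_targets[label] = target

    out = []
    skipping = False                   # true right after an unconditional jump
    for ins in instrs:
        parts = ins.split()
        labeled = parts[0].endswith(":")
        if skipping and not labeled:
            continue                   # unreachable: unlabeled code after a jump
        skipping = False
        if (len(parts) == 5 and parts[1] == ":=" and parts[0] == parts[2]
                and parts[3] == "+" and parts[4] == "0"):
            continue                   # algebraic simplification: x := x + 0
        if ins.startswith("goto "):
            target = ins[len("goto "):]
            # Flow-of-control optimization: jump straight to the final target
            # (only one level of indirection is resolved in this sketch).
            ins = "goto " + jump_targets.get(target, target)
            skipping = True
        out.append(ins)
    return out


program = ["goto L1", "x := x + 1", "L1: goto L2", "y := y + 0", "L2: z := 1"]
optimized = peephole(program)
print(optimized)
```

A real peephole optimizer would repeat the pass until no rule applies, since one replacement can expose another.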


Constant folding
    x := 32
    x := x + 32
becomes
    x := 64
Unreachable code
An unlabeled instruction immediately following an unconditional jump is removed.
    goto L2
    x := x + 1        (unneeded)
Flow of control optimizations
Unnecessary jumps are eliminated.
    goto L1
    ...
    L1: goto L2
becomes
    goto L2
Algebraic simplification
    x := x + 0        (unneeded)
Dead code elimination
    x := 32           (where x is not used after the next statement)
    y := x + y
becomes
    y := y + 32
Reduction in strength
Replace expensive operations by equivalent cheaper ones.
    x := x * 2
becomes
    x := x + x

15. a. i. Describe the principal sources of optimization. (8)
Code optimization is needed to make the code run faster or take less space or both.
Function preserving transformations:
Common sub-expression elimination
Copy propagation
Dead-code elimination
Constant folding
Common sub-expression elimination: (2)
E is called a common sub-expression if E was previously computed and the values of the variables in E have not changed since the previous computation.
Copy propagation: (2)
Assignments of the form f := g are called copy statements, or copies for short. The idea is to use g for f wherever possible after the copy statement.
Dead code elimination: (2)
A variable is live at a point in the program if its value can be used subsequently; otherwise it is dead. Statements that compute values that are never used can be removed.
Constant folding: deducing at compile time that the value of an expression is a constant and using the constant instead.
Loop optimization: (2)
Code motion: moving code outside the loop. It takes an expression that yields the same result independent of the number of times a loop is executed (a loop-invariant computation) and places the expression before the loop.
Induction variable elimination
Reduction in strength: replacing an expensive operation by a cheaper one.


ii. Write about Data flow analysis of structured programs. (8)
Flow graphs for control-flow constructs such as do-while statements have a useful property: there is a single beginning point at which control enters and a single end point from which control leaves when execution of the statement is over.
Some structured control constructs:

Define a portion of a flow graph called a region to be a set of nodes N that includes a header, which dominates all other nodes in the region. All edges between nodes in N are in the region, except for some that enter the header.
The portion of a flow graph corresponding to a statement S is a region that obeys the further restriction that control can flow to just one outside block when it leaves the region.
gen[S] is the set of definitions generated by S.
kill[S] is the set of definitions that never reach the end of S, even if they reach the beginning.
Both are synthesized attributes; they are computed bottom-up, from the smallest statements to the largest.
Data-flow equations for reaching definitions:
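A common textbook statement of these equations is given below as an assumption about the intended answer, not as a reproduction of the original figure. Here D_a denotes the set of all definitions of the variable a in the program, and d labels a single assignment.

```latex
% Single assignment  S:  d: a := b + c
\begin{align*}
gen[S]  &= \{d\}, \qquad kill[S] = D_a - \{d\}, \\
out[S]  &= gen[S] \cup (in[S] - kill[S])
\end{align*}

% Sequence  S \to S_1 ; S_2
\begin{align*}
gen[S]  &= gen[S_2] \cup (gen[S_1] - kill[S_2]), \\
kill[S] &= kill[S_2] \cup (kill[S_1] - gen[S_2]), \\
in[S_1] &= in[S], \qquad in[S_2] = out[S_1], \qquad out[S] = out[S_2]
\end{align*}

% Conditional  S \to \textbf{if } E \textbf{ then } S_1 \textbf{ else } S_2
\begin{align*}
gen[S]  &= gen[S_1] \cup gen[S_2], \qquad kill[S] = kill[S_1] \cap kill[S_2], \\
in[S_1] &= in[S], \qquad in[S_2] = in[S], \qquad out[S] = out[S_1] \cup out[S_2]
\end{align*}

% Loop  S \to \textbf{do } S_1 \textbf{ while } E
\begin{align*}
gen[S]  &= gen[S_1], \qquad kill[S] = kill[S_1], \\
in[S_1] &= in[S] \cup gen[S_1], \qquad out[S] = out[S_1]
\end{align*}
```

gen and kill are computed bottom-up as synthesized attributes, while in is passed down from a statement to its sub-statements.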


(OR)
b. i. What are the different storage allocation strategies? Explain. (10)
Strategies: (2)
Static allocation lays out storage for all data objects at compile time.
Stack allocation manages the run-time storage as a stack.
Heap allocation allocates and deallocates storage as needed at run time from the heap area.

Static allocation: (2)
Names are bound to storage at compile time.
No run-time support package is needed.
Each time a procedure is activated, its names are bound to the same storage locations.
The compiler must decide where the activation records should go.
Limitations:
The size of a data object must be known at compile time.
Recursive procedures are restricted.
Data structures can't be created dynamically.

Stack allocation: (3)
Activation records are pushed and popped as activations begin and end.
Locals are bound to fresh storage in each activation and are deleted when the activation ends.
Call sequence and return sequence, divided between the caller and the callee
Dangling references

Heap allocation: (3)
Stack allocation cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.
Allocate pieces of memory for activation records, which can be deallocated in any order.
Maintain a linked list of free blocks.
Fill a request for size s with a block of size s', where s' is the smallest size greater than or equal to s.
Use a heap manager, which takes care of defragmentation and garbage collection.

ii. Write short notes on parameter passing. (6)
Call by value
A formal parameter is treated just like a local name. Its storage is in the activation record of the called procedure.
The caller evaluates the actual parameters and places their r-values in the storage for the formals.


Call by reference
If an actual parameter is a name or an expression having an l-value, then that l-value itself is passed.
However, if it is an expression that has no l-value (e.g. a+b or 2), then the expression is evaluated in a new location and the address of that location is passed.
Copy-restore
A hybrid between call by value and call by reference (copy in, copy out).
The actual parameters are evaluated and their r-values are passed; the l-values of the actuals are also determined.
When the called procedure is done, the r-values of the formals are copied back to the l-values of the actuals.
Call by name
Inline expansion (the procedure is treated like a macro).
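The difference between the first three mechanisms shows up when the same variable is passed twice. The sketch below simulates a hypothetical procedure p(x, y) with body x := x + 1; y := y + x, called as p(a, a) with a = 1. The environment dictionary and all function names are assumptions made for this example, and the copy-out order (x, then y) is one possible choice.

```python
def body(get, set_):
    """x := x + 1; y := y + x, written against abstract accessors."""
    set_("x", get("x") + 1)
    set_("y", get("y") + get("x"))


def call_by_value(env):
    frame = {"x": env["a"], "y": env["a"]}     # r-values copied into the formals
    body(frame.__getitem__, frame.__setitem__)
    # nothing is copied back, so the actual is untouched


def call_by_reference(env):
    alias = {"x": "a", "y": "a"}               # formals hold the l-value of a
    body(lambda f: env[alias[f]],
         lambda f, v: env.__setitem__(alias[f], v))


def call_by_copy_restore(env):
    frame = {"x": env["a"], "y": env["a"]}     # copy in
    body(frame.__getitem__, frame.__setitem__)
    env["a"] = frame["x"]                      # copy out, in parameter order
    env["a"] = frame["y"]


def run(strategy):
    env = {"a": 1}
    strategy(env)
    return env["a"]


results = {s.__name__: run(s)
           for s in (call_by_value, call_by_reference, call_by_copy_restore)}
print(results)
```

Under call by reference both formals alias a, so each update is visible through the other; under copy-restore the formals are independent until the final copy back.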
