
Eliminating left recursion (informally)

Direct left rec.

For each A → Aα1 | ... | Aαn | β1 | ... | βm
Rewrite: A → β1A' | ... | βmA'
Introduce: A' → α1A' | ... | αnA' | ε

Indirect left rec.

A → B and B → Ax | Ay
Substitute B, cover all combinations: A → Ax | Ay
Apply direct left rec.

Most importantly:

Convince yourself that this does not change the language, only the sequence of productions applied in a derivation
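To see why the rewrite matters, here is a minimal sketch of a recursive-descent recognizer in C. The grammar is an assumption made up for illustration (it is not in the slides): E → E '-' D | D, with D a single digit. A function for the left-recursive E would call itself before consuming any input and never return, so the sketch uses the rewritten form E → D E', E' → '-' D E' | ε. All names (cursor, parse_digit, parse_tail, parse_expr) are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical cursor into the input string, just for this sketch. */
static const char *cursor;

/* D: a single decimal digit. Stores its value, returns 1 on success. */
static int parse_digit(int *value)
{
    if (!isdigit((unsigned char)*cursor))
        return 0;
    *value = *cursor++ - '0';
    return 1;
}

/* E' -> '-' D E' | epsilon
 * 'left' carries the value computed so far, so subtraction still
 * groups to the left: (9-3)-2, not 9-(3-2). */
static int parse_tail(int left, int *result)
{
    int right;
    if (*cursor != '-') {           /* epsilon alternative: consume nothing */
        *result = left;
        return 1;
    }
    cursor++;                       /* consume '-' */
    if (!parse_digit(&right))
        return 0;
    return parse_tail(left - right, result);
}

/* E -> D E'
 * Returns 1 and stores the value if the whole input is an expression. */
int parse_expr(const char *input, int *result)
{
    int first;
    assert(input && result);
    cursor = input;
    if (!parse_digit(&first))
        return 0;
    if (!parse_tail(first, result))
        return 0;
    return *cursor == '\0';         /* reject trailing garbage */
}
```

Calling parse_expr("9-3-2", &v) accepts the same strings as the left-recursive grammar would; only the order in which productions are applied has changed, which is the point of the slide.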

Introducing Lex and Yacc

Lex and Yacc are languages with many implementations; we'll use the 'flex' and 'bison' ones
They are tied to each other, as well as having a somewhat hackish interface to C: both compile into C, and large sections of a Lex or Yacc specification will be written in C, directly included in the resulting scanner/parser
Specifications (*.l and *.y files) are written in 3 sections, separated by a line containing only '%%'
  Initialization
  Rules
  Function implementations

The initialization section

The first section sets the context for the rules: make sure all functions used in the rule set have been prototyped, and declare any variables
Anything between '%{' and '%}' will be included verbatim (#include, global state vars, prototypes)
There is a small host of specific commands for both Lex and Yacc; necessities will be covered here
The rest are covered in this book:
The book is not fantastic, but it can be a useful reference

Lex: Rules

Rules in a Lex specification are transformed to an automaton in a function called yylex(), which scans an input stream until it accepts, and returns a token value to indicate what it accepted
A rule is a regular expression, optionally tied to a small block of C code; the typical task here is to return the appropriate token value for the matched reg. exp.
Yacc specs can generate a header file full of named token values; this will be called y.tab.h by default, and can be #included by a Lex spec so you don't have to make up your own token values
Character classes are made with [], e.g.
  [A-Z]+ (one or more capital letters)
  [0-9]* (zero or more digits)
  [A-Za-z0-9] (one alphanumeric character)
  Etc. etc.
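As a rough picture of what a scanner function does for you, here is a hand-written sketch in C that behaves like two rules, [0-9]+ and [A-Z]+. It is not what flex generates (flex builds a table-driven automaton), and the function name next_token and the token values are assumptions for illustration; with Yacc the values would come from y.tab.h.

```c
#include <assert.h>

/* Hypothetical token values, made up for this sketch. */
enum { TOK_END = 0, TOK_NUMBER = 258, TOK_CAPS = 259, TOK_UNKNOWN = 260 };

/* Scan one token starting at *input, advance *input past it,
 * and return a token value saying what was accepted. */
int next_token(const char **input)
{
    const char *p;
    int token;

    assert(input && *input);
    p = *input;

    while (*p == ' ' || *p == '\t' || *p == '\n')   /* skip whitespace */
        p++;

    if (*p == '\0') {
        token = TOK_END;                            /* nothing left to scan */
    } else if (*p >= '0' && *p <= '9') {            /* [0-9]+ */
        while (*p >= '0' && *p <= '9')
            p++;
        token = TOK_NUMBER;
    } else if (*p >= 'A' && *p <= 'Z') {            /* [A-Z]+ */
        while (*p >= 'A' && *p <= 'Z')
            p++;
        token = TOK_CAPS;
    } else {
        p++;                                        /* any other single character */
        token = TOK_UNKNOWN;
    }

    *input = p;
    return token;
}
```

Each call consumes the longest run that fits one character class and reports it as a token value, which is the contract the parser relies on when it calls yylex().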

Lex: Internal state

Sometimes a token value is not enough information:
  ...so you matched an INTEGER. What's its value?
  ...so you matched a STRING. What does it say?
  ...etc.
The characters are shoved into a buffer (char *) called 'yytext' as they are matched; when a rule completes, this buffer will contain the matching text
Shortly thereafter, it will contain the next match instead. Copy what you need while you can.
There is also a variable called 'yylval' which can be used for a spot of communication with the parser.
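The "copy what you need while you can" advice can be illustrated without a real scanner. In the sketch below, match_buffer and fake_match are stand-ins invented for this example (they are not the real yytext machinery); they only imitate a buffer that is overwritten by every new match. copy_match is likewise a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the scanner's match buffer; NOT the real yytext. */
char match_buffer[64];

/* Pretend the scanner just matched 'text': overwrite the shared buffer
 * and return a pointer to it, the way a scanner exposes its match. */
const char *fake_match(const char *text)
{
    assert(text);
    strncpy(match_buffer, text, sizeof match_buffer - 1);
    match_buffer[sizeof match_buffer - 1] = '\0';
    return match_buffer;
}

/* Copy the current match into freshly allocated memory.
 * The caller owns the copy and must free() it. */
char *copy_match(const char *text)
{
    size_t len = strlen(text);
    char *copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, text, len + 1);    /* includes the terminating '\0' */
    return copy;
}
```

If a rule only stores the pointer it was given, it ends up looking at whatever was matched next; a rule that calls something like copy_match keeps the text it actually saw.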

Lex: Initialization

Typing up regular expressions can get messy. Common parts can be given names in the initialization section, such as
  DIGIT [0-9]
  WHITESPACE [\ \t\n]
These can be referred to in the rules as {DIGIT} and {WHITESPACE} to make things a little more readable
By default there is a prototyped function 'yywrap' which you are supposed to implement in order to handle transitions between multiple input streams (when one runs out of characters).
We won't need that; '%option noyywrap' will stop flex from nagging you about defining it.
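For completeness, this is roughly what a yywrap that does handle several inputs could look like. It is only a sketch: the list of file names and the variable current_input are hypothetical, and a real implementation would also have to point the scanner at the newly opened stream. By the usual Lex convention, returning 0 means "keep scanning, there is more input" and returning 1 means "all done".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue of inputs still waiting to be scanned. */
#define PENDING_COUNT 2
static const char *pending_inputs[PENDING_COUNT] = { "second.txt", "third.txt" };
static size_t next_input = 0;

/* Name of the input currently being scanned (illustration only). */
const char *current_input = "first.txt";

/* Called when the current input runs out of characters. */
int yywrap(void)
{
    assert(next_input <= PENDING_COUNT);
    if (next_input < PENDING_COUNT) {
        current_input = pending_inputs[next_input++];
        return 0;   /* switched to another input: continue scanning */
    }
    return 1;       /* nothing left: the scanner should stop */
}
```

With a single input stream the whole function collapses to "return 1", which is why switching it off with '%option noyywrap' loses nothing here.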

Yacc: Rules

Yacc rules are grammar productions with slightly different typography: A → B | C reads
  A : B { /* some code */ }
    | C { /* other code */ }
    ;
(Whitespace is immaterial, but I mostly write like this)

Parser constructs rightmost derivation in reverse (shift/reduce parsing = tracing the syntax tree)
Code for a production is called when the production is matched
If the right hand side of the production is just a token from the scanner, associated values can be taken from yylval

Yacc: Variables

Consider the production
  if_stmt: IF expr THEN stmt ELSE stmt ENDIF { /* code */ }
Since we want the /* code */ to do something with the values which triggered the production, we need a mechanism to refer to them
Yacc provides its own abstract variables:
  $$ is the left hand side of the production (typically the target of an assignment)
  $1 refers to IF (most likely a token, here)
  $2 refers to expr (which is probably either a value or some kind of data structure)
  $3 refers to THEN (a token again)
  $4 refers to the first stmt (...and so on and so forth...)
What are the types of all these?
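One way to picture $$ and $1, $2, ... is as slots on a stack of values that the parser maintains next to its parse stack. The sketch below is a simplified model in C, not the code bison emits; the rule sum : NUMBER '+' NUMBER { $$ = $1 + $3; } and the names value_stack, push_value and reduce_sum are assumptions for illustration.

```c
#include <assert.h>

#define STACK_MAX 32

/* Hypothetical value stack holding one value per symbol seen so far. */
static int value_stack[STACK_MAX];
static int stack_top = 0;           /* number of values on the stack */

/* A shift pushes the value that came with the token. */
void push_value(int v)
{
    assert(stack_top < STACK_MAX);
    value_stack[stack_top++] = v;
}

/* Reduce by   sum : NUMBER '+' NUMBER   { $$ = $1 + $3; }
 * The three topmost values belong to the right hand side. */
int reduce_sum(void)
{
    int *rhs;
    int result;

    assert(stack_top >= 3);
    rhs = &value_stack[stack_top - 3];
    result = rhs[0] + rhs[2];       /* $$ = $1 + $3; rhs[1] is $2, the '+' */

    stack_top -= 3;                 /* pop the right hand side ...       */
    push_value(result);             /* ... and push $$ in its place      */
    return result;
}
```

After the reduction the stack holds a single value for sum, which becomes some $n of whichever production uses sum next.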

The types of grammar entities

All terminals/nonterminals are by default made of type YYSTYPE, which can be #defined by the programmer
If more than one type is needed in a grammar, it can be defined as a union
%union { uint8_t ui; char *str; } in the init. section will make it possible to refer to 'yylval.ui' and 'yylval.str' when passing values from the scanner
Inside the parser, types are given to symbols with a directive of their own: in this context %type <ui> expr will make expr symbols in the grammar be treated as 8-bit unsigned ints (when they are referred to as $x)
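In plain C terms, the %union above amounts to something like the following. This is a sketch of the idea, not the exact text Yacc generates: in a real build the generated parser defines YYSTYPE and yylval for you, and the token values and the helpers emit_number and emit_string are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Roughly what  %union { uint8_t ui; char *str; }  turns into. */
typedef union {
    uint8_t ui;
    char *str;
} YYSTYPE;

/* Defined here only so the sketch stands alone. */
YYSTYPE yylval;

/* Hypothetical token values; normally taken from y.tab.h. */
enum { NUMBER = 258, STRING = 259 };

/* What a scanner action for a number might do. */
int emit_number(unsigned value)
{
    assert(value <= UINT8_MAX);     /* must fit the 8-bit member */
    yylval.ui = (uint8_t)value;
    return NUMBER;
}

/* What a scanner action for a string might do.
 * 'text' must outlive the scanner's match buffer. */
int emit_string(char *text)
{
    yylval.str = text;
    return STRING;
}
```

Because it is a union, only one member holds a meaningful value at a time; the token value returned alongside it is what tells the parser which member to read.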

Tokens

The tokens which are sent to the header file (included by the scanner) can be defined in the init. section; the following defines tokens for strings, numbers, and keywords if/else
  %token STRING NUMBER IF ELSE
Tokens can be %typed just like other symbols

yyerror

int yyerror(char *) is called with an error string parameter whenever parsing fails because the text is grammatically incorrect
Yacc needs an implementation of this
There is an uninformative one in the provided code; it could easily be improved with more helpful messages, line # where the error occurred, etc., but we'll pass on that for the moment
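As a sketch of the kind of improvement hinted at, here is a yyerror that reports a line number. The counter current_line and the helper format_error are hypothetical: the scanner would have to increment the counter on every newline itself (flex can also keep such a count for you with '%option yylineno'). This is not the implementation from the provided code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical line counter, to be bumped by the scanner on each '\n'. */
int current_line = 1;

/* Build the message text; returns the length snprintf reports. */
int format_error(char *out, size_t size, int line, const char *message)
{
    return snprintf(out, size, "line %d: %s", line, message);
}

/* Called by the parser when the input does not fit the grammar. */
int yyerror(char *message)
{
    char text[256];

    assert(message != NULL);
    format_error(text, sizeof text, current_line, message);
    fprintf(stderr, "%s\n", text);
    return 0;
}
```

Keeping the formatting in its own function makes it easy to extend the message later, for instance with the offending text.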

What to put where?

It's possible (but tricky) to make a compiler without separating lexical, syntactical and semantic properties
Lexical analysis can be done with grammars, and both scanners and parsers can do work related to semantics
The result very easily becomes a complicated mess
Recognizing these as distinct things is a simplified model of languages, not a law of nature. It does not capture every truth about a language, but it helps designers to think about one thing at a time
How to apply this model is a decision you make, but the theory is most helpful when you stick to isolating the three types of analysis from each other
