Direct left rec.
Indirect left rec.
Most importantly:
Introducing Lex and Yacc
Lex and Yacc are languages with many implementations; we'll use the 'flex' and 'bison' ones.
They are tied to each other, as well as having a somewhat hackish interface to C: both compile into C, and large sections of a Lex or Yacc specification will be written in C, directly included in the resulting scanner/parser.
Specifications (*.l and *.y files) are written in 3 sections, separated by a line containing only '%%':
  Initialization
  Rules
  Function implementations
The initialization section
The first section sets the context for the rules: make sure all functions used in the rule set have been prototyped, and declare any variables.
Anything between '%{' and '%}' will be included verbatim (#include, global state vars, prototypes).
There is a small host of specific commands for both Lex and Yacc; the necessities will be covered here.
The rest are covered in this book:
The book is not fantastic, but it can be a useful reference.
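As a rough sketch of the kind of C that ends up in a specification, the fragment below shows what might sit between '%{' and '%}', together with the matching implementation that would go in the third section. The names line_count and count_line are invented for this illustration and are not part of Lex or Yacc.

```c
/* Would sit between %{ and %} in the first section. */
#include <stdio.h>

static int line_count = 0;   /* global state shared by the rule actions */
void count_line(void);       /* prototype, so that rule actions may call it */

/* Would sit in the third section, after the second %% line. */
void count_line(void)
{
    line_count++;
}
```

Because all of this is copied verbatim into the generated C file, the usual C rule applies: whatever the rule actions use has to be declared before the rules appear.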
Lex: Rules
Rules in a Lex specification are transformed to an automaton in a function called yylex(), which scans an input stream until it accepts, and returns a token value to indicate what it accepted.
A rule is a regular expression, optionally tied to a small block of C code; the typical task here is to return the appropriate token value for the matched reg. exp.
Yacc specs can generate a header file full of named token values; this will be called y.tab.h by default, and can be #included by a Lex spec so you don't have to make up your own token values.
Character classes are made with [], e.g.
  [A-Z]+ (one or more capital letters)
  [0-9]* (zero or more digits)
  [A-Za-z0-9] (one alphanumeric character)
  Etc. etc.
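To make the three character classes above concrete, here is a minimal hand-written C sketch of what matching them amounts to. The function names are invented for this illustration; a generated yylex() uses a table-driven automaton rather than loops like these.

```c
#include <stddef.h>

/* Hypothetical helper: length of the longest prefix of s matching [A-Z]+.
 * Returns 0 when there is no match, since '+' needs at least one character. */
size_t match_upper_plus(const char *s)
{
    size_t n = 0;
    while (s[n] >= 'A' && s[n] <= 'Z')
        n++;
    return n;
}

/* Hypothetical helper: length of the longest prefix of s matching [0-9]*.
 * A result of 0 is still a successful (empty) match, since '*' allows zero. */
size_t match_digit_star(const char *s)
{
    size_t n = 0;
    while (s[n] >= '0' && s[n] <= '9')
        n++;
    return n;
}

/* Hypothetical helper: 1 if c is in [A-Za-z0-9], 0 otherwise. */
int match_alnum(char c)
{
    return (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9');
}
```

Both prefix matchers consume as many characters as they can, which mirrors the way a Lex scanner prefers the longest match.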
Lex: Internal state
Sometimes a token value is not enough information:
  ...so you matched an INTEGER. What's its value?
  ...so you matched a STRING. What does it say?
  ...etc.
The characters are shoved into a buffer (char *) called 'yytext' as they are matched; when a rule completes, this buffer will contain the matching text.
Shortly thereafter, it will contain the next match instead. Copy what you need while you can.
There is also a variable called 'yylval' which can be used for a spot of communication with the parser.
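The "copy what you need while you can" advice can be sketched with a toy scanner in plain C. Everything here is a stand-in invented for the illustration: toy_text plays the role of yytext, toy_lex() the role of yylex(), and tokens are simply runs of non-space characters.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the scanner's match buffer (yytext in a real scanner). */
static char toy_text[64];

/* Hypothetical toy scanner: copies the next run of non-space characters
 * from *cursor into toy_text and advances *cursor.
 * Returns 1 on a match, 0 at end of input. */
int toy_lex(const char **cursor)
{
    const char *p = *cursor;
    size_t n = 0;

    while (*p == ' ')
        p++;                      /* skip blanks between tokens */
    if (*p == '\0') {
        *cursor = p;
        return 0;
    }
    while (*p != '\0' && *p != ' ' && n < sizeof toy_text - 1)
        toy_text[n++] = *p++;     /* overwrites the previous match */
    toy_text[n] = '\0';
    *cursor = p;
    return 1;
}

/* Copy the current match so that it survives the next call to toy_lex().
 * The caller owns the result and must free() it. */
char *save_match(void)
{
    size_t len = strlen(toy_text) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, toy_text, len);
    return copy;
}
```

In a real rule action the same idea is usually a single line that duplicates yytext and stores the copy where the parser can find it.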
Lex: Initialization
Typing up regular expressions can get messy. Common parts can be given names in the initialization section, such as
  DIGIT [0-9]
  WHITESPACE [\ \t\n]
These can be referred to in the rules as {DIGIT} and {WHITESPACE} to make things a little more readable.
By default there is a prototyped function 'yywrap' which you are supposed to implement in order to handle transitions between multiple input streams (when one runs out of characters).
We won't need that; '%option noyywrap' will stop flex from nagging you about defining it.
Yacc: Rules
(Whitespace is immaterial, but I mostly write like this)
    | C    { /* other code */ }
    ;
Yacc: Variables
Consider the production
  if_stmt: IF expr THEN stmt ELSE stmt ENDIF { /* code */ }
Since we want the /* code */ to do something with the values which triggered the production, we need a mechanism to refer to them.
Yacc provides its own abstract variables:
  $$ is the left hand side of the production (typically the target of an assignment)
  $1 refers to IF (most likely a token, here)
  $2 refers to expr (which is probably either a value or some kind of data structure)
  $3 refers to THEN (a token again)
  $4 refers to the first stmt (...and so on and so forth...)
What are the types of all these?
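One way to picture $$ and $1, $2, ... is as slots on a stack of semantic values that the parser maintains next to its parse stack. The sketch below imitates a single reduction by hand in plain C. It assumes a simpler production than the one above, expr: expr '+' expr { $$ = $1 + $3; }, and every name in it is invented for the illustration.

```c
#define STACK_MAX 32

/* Toy stack of semantic values, one per symbol seen so far. */
static int values[STACK_MAX];
static int top = 0;               /* number of values on the stack */

void push_value(int v)
{
    if (top < STACK_MAX)
        values[top++] = v;
}

/* Imitates reducing by   expr: expr '+' expr { $$ = $1 + $3; }
 * The three right-hand-side values are the top three stack slots;
 * they are popped and replaced by the single value $$. */
void reduce_add(void)
{
    int *rhs;
    int result;

    if (top < 3)
        return;                   /* not enough symbols to reduce */
    rhs = &values[top - 3];       /* rhs[0] is $1, rhs[1] is $2, rhs[2] is $3 */
    result = rhs[0] + rhs[2];     /* $$ = $1 + $3 */
    top -= 3;
    values[top++] = result;
}

/* Value of the topmost symbol, or 0 if the stack is empty. */
int top_value(void)
{
    return top > 0 ? values[top - 1] : 0;
}
```

Pushing 2, '+' and 40 and then calling reduce_add() leaves one value on the stack, the sum. Here every slot is a plain int, which is why the question of types comes up as soon as a grammar needs more than one kind of value.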
The types of grammar entities
All terminals/nonterminals are by default made of type YYSTYPE, which can be #defined by the programmer.
If more than one type is needed in a grammar, it can be defined as a union.
%union { uint8_t ui; char *str; } in the init. section will make it possible to refer to 'yylval.ui' and 'yylval.str' when passing values from the scanner.
Inside the parser, types are given to symbols with their own directive: in this context, %type <ui> expr will make expr symbols in the grammar be treated as 8-bit unsigned ints (when they are referred to as $x).
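In plain C terms, the %union directive above amounts to an ordinary union type plus one shared variable of that type. The sketch below writes that out by hand under invented names: toy_value stands in for the generated YYSTYPE, toy_lval for yylval, and the two token values are hypothetical placeholders for what y.tab.h would define.

```c
#include <stdint.h>

/* Hand-written stand-in for the type generated from
 *     %union { uint8_t ui; char *str; }                */
typedef union {
    uint8_t ui;
    char *str;
} toy_value;

/* Stand-in for yylval, which the scanner fills in for the parser. */
static toy_value toy_lval;

/* Hypothetical token values; real ones come from the generated header. */
enum { TOY_INTEGER = 258, TOY_STRING = 259 };

/* What a scanner action for an integer might do: store the value in
 * the matching union member, then return the token. */
int accept_integer(uint8_t n)
{
    toy_lval.ui = n;
    return TOY_INTEGER;
}

/* The same for a string. Only one union member is valid at a time,
 * so this overwrites whatever accept_integer() stored. */
int accept_string(char *s)
{
    toy_lval.str = s;
    return TOY_STRING;
}
```

The token value tells the parser which member to read, which is the job the %type and token type declarations do inside a real grammar.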
Tokens
yyerror
What to put where?
It's possible (but tricky) to make a compiler without separating lexical, syntactical and semantic properties.
Lexical analysis can be done with grammars, and both scanners and parsers can do work related to semantics.
The result very easily becomes a complicated mess.
Recognizing these as distinct things is a simplified model of languages, not a law of nature. It does not capture every truth about a language, but it helps designers to think about one thing at a time.
How to apply this model is a decision you make, but the theory is most helpful when you stick to isolating the three types of analysis from each other.