CS152
Chris Pollett
Feb. 23, 2009
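There are two approaches to parsing: top-down (for example, recursive descent) and bottom-up (for example, shift/reduce).
Both methods can be automated. Yacc/Bison, for instance, generates a shift/reduce parser.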
Consider:
sentence → nounPhrase verbPhrase
nounPhrase → article noun
article → a | the
This might yield procedures such as:
void sentence(void) { nounPhrase(); verbPhrase(); }
void nounPhrase(void) { article(); noun(); }
void article(void) {
    if (token == "a") match("a");
    else if (token == "the") match("the");
    else error();
}
void S() { A(); B(); if(parseError()) {rewind(); C(); D(); } }
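The procedure S above reads like a backtracking parser for a rule of the form S → A B | C D: try the first alternative and, if it fails, rewind the input and try the second. A minimal sketch of that idea follows. The rules A → a, B → b, C → a, D → d, the helper eat, and the entry point accepts are all hypothetical and chosen only so that the input "ad" forces a rewind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scanner state: each input character is one token. */
static const char *text_pos;

/* Consume one expected character; leave the position unchanged on failure. */
static bool eat(char expected) {
    if (*text_pos != expected) return false;
    text_pos++;
    return true;
}

/* Assumed rules: A -> a, B -> b, C -> a, D -> d */
static bool A(void) { return eat('a'); }
static bool B(void) { return eat('b'); }
static bool C(void) { return eat('a'); }
static bool D(void) { return eat('d'); }

/* S -> A B | C D, handled by backtracking */
static bool S(void) {
    const char *saved = text_pos;   /* remember where S started */
    if (A() && B()) return true;    /* first alternative */
    text_pos = saved;               /* rewind whatever A B consumed */
    return C() && D();              /* second alternative */
}

/* True when the whole string can be derived from S. */
bool accepts(const char *text) {
    assert(text != NULL);
    text_pos = text;
    return S() && *text_pos == '\0';
}
```

With these assumed rules, accepts("ab") succeeds on the first alternative, while accepts("ad") succeeds only after the rewind. Saving and restoring a position is cheap here, but in general backtracking can reparse the same input many times, which is one reason to prefer grammars that a single token of lookahead can decide.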
Which of the following grammars is ambiguous:
void expr(void) {term(); if(token == "+"){match("+"); expr();}}
void expr() {term(); while(token == "+"){match("+"); term();}}
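To make the loop version concrete, here is a small self-contained recognizer for expr → term { + term }. It is only a sketch under assumptions not in the notes: every input character is treated as one token, term is taken to be a single digit, and the names input, failed, and parse_expr are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scanner state: *input is the lookahead token. */
static const char *input;
static bool failed;   /* set once any match fails */

/* Consume the lookahead if it equals expected; otherwise record an error. */
static void match(char expected) {
    if (*input == expected) input++;
    else failed = true;
}

/* term -> digit   (assumed; the notes leave term unspecified) */
static void term(void) {
    if (*input >= '0' && *input <= '9') match(*input);
    else failed = true;
}

/* expr -> term { + term } : the EBNF braces become a while loop */
static void expr(void) {
    term();
    while (!failed && *input == '+') {
        match('+');
        term();
    }
}

/* True when the whole string is derived from expr. */
bool parse_expr(const char *text) {
    assert(text != NULL);
    input = text;
    failed = false;
    expr();
    return !failed && *input == '\0';
}
```

For example, parse_expr("1+2+3") returns true, while parse_expr("1+") and parse_expr("12") return false. The loop consumes operands left to right, which is also the natural place to build a left-associative result if the parser were extended to compute values.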
A right recursive rule like:
<expr> → <term> @ <expr> | <term>
Can also be rewritten in EBNF as:
<expr> → <term> [@ <expr> ]
This is called left factoring.
Consider:
<if-statement> → if(<expr>) <statement> | if(<expr>) <statement> else <statement>
This cannot be directly translated into code as both rules begin with the same prefix, but we can "factor out" the prefix:
<if-statement> → if(<expr>) <statement> [else <statement>]
This can be coded by viewing the [ ] as an if clause:
void ifStatement() {
    match("if"); match("("); expression(); match(")");
    statement();
    if (token == "else") { match("else"); statement(); }
}
<factor> ::= (<expr>) | <number>
<number> ::= <digit> {<digit>}
<digit> ::= 0|1|2|3|4|5|6|7|8|9
We need First((<expr>)) and First(<number>) to be disjoint.
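The disjointness requirement is what lets factor pick an alternative by looking at one character: First of the parenthesized alternative is { ( } and First of number is the set of digits. The sketch below illustrates this. The rule expr ::= factor { + factor } is an assumption added only to make the example complete, and the names cursor, bad, parse_sum, and evaluate are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

static const char *cursor;   /* *cursor is the lookahead character */
static bool bad;             /* set on the first syntax error */

static int parse_sum(void);  /* factor and expr are mutually recursive */

/* number ::= digit { digit }   (no overflow checking in this sketch) */
static int number(void) {
    int value = 0;
    if (!isdigit((unsigned char)*cursor)) { bad = true; return 0; }
    while (isdigit((unsigned char)*cursor)) {
        value = value * 10 + (*cursor - '0');
        cursor++;
    }
    return value;
}

/* factor ::= ( expr ) | number
   The two First sets are disjoint, so the lookahead decides the branch. */
static int factor(void) {
    if (*cursor == '(') {                      /* lookahead in First("( expr )") */
        cursor++;
        int value = parse_sum();
        if (*cursor == ')') cursor++;
        else bad = true;
        return value;
    }
    if (isdigit((unsigned char)*cursor))       /* lookahead in First(number) */
        return number();
    bad = true;                                /* lookahead in neither set */
    return 0;
}

/* expr ::= factor { + factor }   (assumed rule, not from the notes) */
static int parse_sum(void) {
    int total = factor();
    while (!bad && *cursor == '+') {
        cursor++;
        total += factor();
    }
    return total;
}

/* Parses all of text; stores the value in *result and returns true on success. */
bool evaluate(const char *text, int *result) {
    assert(text != NULL && result != NULL);
    cursor = text;
    bad = false;
    int value = parse_sum();
    if (bad || *cursor != '\0') return false;
    *result = value;
    return true;
}
```

If the two First sets overlapped, the test at the top of factor could not choose a branch without trying both, which is the situation that forces backtracking or a grammar rewrite.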
A → B [C] D. C → aE | bF. D → cG.