CS152
Chris Pollett
Feb. 23, 2009
There are two approaches to parsing: top-down (for example, recursive descent) and bottom-up (for example, shift-reduce).
Both methods can be automated. Yacc/Bison is a parser generator that produces shift-reduce parsers.
Consider:
sentence → nounPhrase verbPhrase
nounPhrase → article noun
article → a | the
This might yield procedures such as:
void sentence(void) { nounPhrase(); verbPhrase(); }
void nounPhrase(void) { article(); noun(); }
void article(void) {
    if (token == "a") match("a");
    else if (token == "the") match("the");
    else error();
}
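The procedures above can be turned into a compilable sketch along the following lines. Several pieces are assumptions made only for illustration: the grammar shown does not define noun or verbPhrase, so the rules noun → dog | cat and verbPhrase → sees nounPhrase are hypothetical, as are the token array, the `failed` flag, and the entry point `parse_sentence`. Real C also needs `strcmp` to compare strings, so `token == "a"` is written that way here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parser state (hypothetical layout): an array of word tokens and a cursor. */
static const char **toks;
static size_t ntoks, pos;
static int failed;

/* Current token, or "" once the input is exhausted. */
static const char *token(void) { return pos < ntoks ? toks[pos] : ""; }
static void error(void) { failed = 1; }

/* Consume the current token if it equals `expected`, otherwise flag an error. */
static void match(const char *expected) {
    if (strcmp(token(), expected) == 0) pos++;
    else error();
}

/* article -> a | the */
static void article(void) {
    if (strcmp(token(), "a") == 0) match("a");
    else if (strcmp(token(), "the") == 0) match("the");
    else error();
}

/* noun -> dog | cat   (assumed; not defined in the grammar above) */
static void noun(void) {
    if (strcmp(token(), "dog") == 0) match("dog");
    else if (strcmp(token(), "cat") == 0) match("cat");
    else error();
}

/* nounPhrase -> article noun */
static void nounPhrase(void) { article(); noun(); }

/* verbPhrase -> sees nounPhrase   (assumed; not defined in the grammar above) */
static void verbPhrase(void) { match("sees"); nounPhrase(); }

/* sentence -> nounPhrase verbPhrase */
static void sentence(void) { nounPhrase(); verbPhrase(); }

/* Returns 1 if the whole token array forms a sentence, 0 otherwise. */
int parse_sentence(const char **words, size_t n) {
    assert(words != NULL);
    toks = words; ntoks = n; pos = 0; failed = 0;
    sentence();
    return !failed && pos == ntoks;
}
```

Each nonterminal becomes one procedure, and the body of each procedure mirrors the right-hand side of its rule. Calling `parse_sentence` on the five words "the dog sees a cat" is accepted under these assumed rules, while input that starts with a noun is rejected by article().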
void S()
{
A(); B();
if(parseError()) {rewind(); C(); D(); }
}
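One way the rewind step might be realized is to save the input position before trying the first alternative and restore it on failure. The sketch below assumes a rule S → A B | C D with the hypothetical terminals A → x, B → y, C → x, D → z, one character per token, and a hypothetical entry point `parse_S`; none of these come from the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Parser state (hypothetical): the input string, a cursor, an error flag. */
static const char *input;
static size_t pos;
static int err;

/* Consume character c if it is next and no error is pending. */
static void match(char c) {
    if (!err && input[pos] == c) pos++;
    else err = 1;
}

/* Assumed terminal rules, chosen so both alternatives share a first token. */
static void A(void) { match('x'); }
static void B(void) { match('y'); }
static void C(void) { match('x'); }
static void D(void) { match('z'); }

static int parseError(void) { return err; }

/* S -> A B | C D, trying the first alternative and backtracking on failure. */
static void S(void) {
    size_t saved = pos;        /* remember where this alternative started */
    A(); B();
    if (parseError()) {
        pos = saved;           /* rewind: restore the input position ... */
        err = 0;               /* ... and clear the pending error */
        C(); D();
    }
}

/* Returns 1 if the whole string is derived from S, 0 otherwise. */
int parse_S(const char *text) {
    assert(text != NULL);
    input = text; pos = 0; err = 0;
    S();
    return !err && input[pos] == '\0';
}
```

With these assumed rules, "xz" is accepted only because the parser rewinds after B() fails and then succeeds with C D. Backtracking of this kind can re-read the same input several times, which is one reason predictive parsers try to avoid it.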
Which of the following grammars is ambiguous:
void expr(void) {term(); if(token == "+"){match("+"); expr();}}
void expr() {term(); while(token == "+"){match("+"); term();}}
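The loop-based version of expr() can be sketched as a small evaluator. This is only an illustration under stated assumptions: term is taken to be a single decimal digit, the input is a string with one character per token, and `eval_sum` with its -1 error result is a hypothetical interface.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parser state (hypothetical): the input string, a cursor, an error flag. */
static const char *src;
static size_t pos;
static int err;

/* term -> digit   (assumed for this sketch) */
static int term(void) {
    if (isdigit((unsigned char)src[pos])) return src[pos++] - '0';
    err = 1;
    return 0;
}

/* expr -> term { + term }, with the { } repetition coded as a while loop */
static int expr(void) {
    int value = term();
    while (src[pos] == '+') {
        pos++;                 /* plays the role of match("+") */
        value += term();
    }
    return value;
}

/* Returns the sum, or -1 if the text is not a well-formed expression. */
int eval_sum(const char *text) {
    int value;
    assert(text != NULL);
    src = text; pos = 0; err = 0;
    value = expr();
    return (!err && src[pos] == '\0') ? value : -1;
}
```

For example, `eval_sum("1+2+3")` returns 6, and `eval_sum("1+")` returns -1 because the term after the final + is missing.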
A right-recursive rule like:
<expr> → <term> @ <expr> | <term>
can also be rewritten in EBNF as:
<expr> → <term> [ @ <expr> ]
This is called left factoring.
Consider:
<if-statement> → if(<expr>) <statement> | if(<expr>) <statement> else <statement>
This cannot be translated directly into code because both alternatives begin with the same prefix, so the parser cannot choose between them by looking at the next token. We can, however, "factor out" the prefix:
<if-statement> → if(<expr>) <statement> [else <statement>]
This can be coded by viewing the [ ] as an if statement:
void ifStatement() {
match("if"); match("("); expression(); match(")"); statement();
if(token=="else"){match("else"); statement();}
}
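A compilable variant of ifStatement() might look like the sketch below. To keep it self-contained it makes assumptions that are not part of the grammar above: expression is the single placeholder token x, a simple statement is the placeholder token s, tokens arrive as an array of strings, and `parse_if` is a hypothetical entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parser state (hypothetical): an array of string tokens and a cursor. */
static const char **toks;
static size_t ntoks, pos;
static int err;

/* True if the current token exists and equals s. */
static int at(const char *s) { return pos < ntoks && strcmp(toks[pos], s) == 0; }
static void match(const char *s) { if (at(s)) pos++; else err = 1; }

static void statement(void);

/* expression -> x   (placeholder, assumed) */
static void expression(void) { match("x"); }

/* if-statement -> if ( expression ) statement [ else statement ] */
static void ifStatement(void) {
    match("if"); match("("); expression(); match(")"); statement();
    if (at("else")) { match("else"); statement(); }   /* the optional [ ] part */
}

/* statement -> if-statement | s   (s is a placeholder, assumed) */
static void statement(void) {
    if (at("if")) ifStatement();
    else match("s");
}

/* Returns 1 if the whole token array is one if-statement, 0 otherwise. */
int parse_if(const char **words, size_t n) {
    assert(words != NULL);
    toks = words; ntoks = n; pos = 0; err = 0;
    ifStatement();
    return !err && pos == ntoks;
}
```

Because the optional else is checked immediately after the inner statement() returns, a nested if claims the nearest else in this sketch.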
<factor> ::= (<expr>)|<number>
<number> ::= <digit> {<digit>}
<digit> ::= 0|1|2|3|4|5|6|7|8|9
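The rules for <factor> and <number> can be sketched in code by letting the next character pick the alternative: an opening parenthesis selects (<expr>), and a digit selects <number>. The sketch assumes <expr> is a sum of factors (expr → factor { + factor }), since <expr> is not defined here, and uses a hypothetical entry point `parse_factor` that returns -1 on a syntax error.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parser state (hypothetical): the input string, a cursor, an error flag. */
static const char *src;
static size_t pos;
static int err;

static int expr(void);

/* <number> ::= <digit> {<digit>} : at least one digit, then a loop */
static int number(void) {
    int value = 0;
    if (!isdigit((unsigned char)src[pos])) { err = 1; return 0; }
    while (isdigit((unsigned char)src[pos]))
        value = value * 10 + (src[pos++] - '0');
    return value;
}

/* <factor> ::= (<expr>) | <number> : the next character selects the branch */
static int factor(void) {
    if (src[pos] == '(') {
        int value;
        pos++;                                  /* consume "(" */
        value = expr();
        if (src[pos] == ')') pos++; else err = 1;
        return value;
    }
    return number();       /* number() flags the error if no digit follows */
}

/* expr -> factor { + factor }   (assumed for this sketch) */
static int expr(void) {
    int value = factor();
    while (src[pos] == '+') { pos++; value += factor(); }
    return value;
}

/* Returns the value of a single factor, or -1 on a syntax error. */
int parse_factor(const char *text) {
    int value;
    assert(text != NULL);
    src = text; pos = 0; err = 0;
    value = factor();
    return (!err && src[pos] == '\0') ? value : -1;
}
```

For instance, `parse_factor("(12+30)")` returns 42 and `parse_factor("(1+2")` returns -1 because the closing parenthesis is missing. The single-character test in factor() works only because the two alternatives start with different tokens.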
We need First((<expr>)) and First(<number>) to be disjoint. Here First((<expr>)) = { ( } and First(<number>) = { 0, 1, ..., 9 }, so they are.
Consider:
A → B [C] D
C → aE | bF
D → cG