Ambiguity

A grammar G is ambiguous if for some w in L(G) there are two different parse trees for w.

For example, let G be the grammar:

EXP ::= EXP~(+|*)~EXP | NUMBER

While elegant, grammar G gives us too much freedom to construct a derivation.

For example, let w = 3 + 4 * 5. We begin our derivation with EXP, but then which operator should we derive first, + or *?

Let's derive + first. Here's the entire derivation:

EXP =>
EXP + EXP =>
EXP + EXP * EXP =>
EXP + EXP * NUMBER =>
EXP + EXP * 5 =>
EXP + NUMBER * 5 =>
EXP + 4 * 5 =>
NUMBER + 4 * 5 =>
2 + 4 * 5

Here's the corresponding parse tree:

Notice that the + node is closer to the root than the * node.

Now let's derive w again, but this time deriving * first. Here's the derivation:

EXP =>
EXP * EXP =>
EXP + EXP * EXP =>
NUMBER + EXP * EXP =>
3 + EXP * EXP =>
3 + NUMBER * EXP =>
3 + 4 * EXP =>
3 + 4 * NUMBER =>
3 + 4 * 5

And the corresponding parse tree:

In this case the * node is closer to the root than the + node.

So what? Does it make a difference? Imagine each of these trees being passed to an interpreter for execution. The interpreter would use recursion to evaluate the left and right children of the root, then use the operator represented by the middle child to combine the roots:

double eval(Tree t) {
   double arg1 = eval(t.left);
   double arg2 = eval(t.right);
   char op = t.middle;
   if (op == "+") return arg1 + arg2;
   if (op == "*") return arg1 * arg2;
   // etc.
}

In the first case the multiplication will be performed first. In other words:

3 + 4 * 5 = 3 + (4 * 5) = 3 + 20 = 23

In the second case the addition will be performed first:

3 + 4 * 5 = (3 + 4) * 5 = 7 * 5 = 35

Different answers! That's bad.

Disambiguation

To fix this problem we must try to find an unambiguous grammar G' such that L(G) = L(G'). We must try to eliminate choices.

Consider the following grammar G' that forces multiplications to be performed before additions:

EXP::= PRODUCT | PRODUCT + EXP
PRODUCT::= NUMBER | NUMBER * PRODUCT

Here's a G' derivation of w:

EXP =>
PRODUCT + EXP =>
PRODUCT + PRODUCT
3 + PRODUCT =>
3 + NUMBER * PRODUCT =>
3 + 4 * PRODUCT =>
3 + 4 * NUMBER =>
3 + 4 * 5

(Notice that we have choices within the derivation as to which variable to replace first: left-most, right-most, etc.) While it is beneficial to follow an algorithm, all of these choices will yield the same parse tree.)

Here's the parse tree:

Notice that there were no choices to be made between which operator should come first: + or *.

It can be proved that L(G) = L(G') and G' is unambiguous, however finding an equivalent unambiguous grammar for an arbitrary CFG is famously unsolvable:

Theorem There is no algorithm that implements the function:

// returns unambiguous G' with L(G) = L(G')
CFG disambiguate(CFG G) { ??? }

This means that disambiguating a CFG is an art rather than a technique.

Inherent Ambiguity

To make matters worse, there are CFLs for which there is no unambiguous grammar. Such languages are called inherently ambiguous.

Theorem: L = {anbncmdm | 1 <= n && 1 <= m} U {anbmcmdn | 1 <= n && 1 <= m} is inherently ambiguous.

Proof:

The idea is to show that strings of the form anbncndn must have two distinct left-most derivations.