Chomsky Normal Form




CS154

Chris Pollett

Mar 11, 2013

Outline

Chomsky Normal Form

Conversion to Chomsky Normal Form (Chomsky 1959)

Theorem. Any CFL `L` can be generated by a CFG in Chomsky Normal Form.

Proof. Let `G` be a CFG for `L`. First we add a new start variable `S_0` and the rule `S_0 -> S`, where `S` is the original start variable. This guarantees that the start variable does not occur on the RHS of any rule.

Next we remove every `epsilon`-rule `A -> epsilon` where `A` is not the start variable. After deleting such a rule, for each occurrence of `A` on the RHS of a rule, say `R -> uAv`, we add the rule `R -> uv`. We do this for every combination of occurrences of `A`, so for `R -> uAvAw` we would add the rules `R -> uvAw`, `R -> uAvw`, and `R -> uvw`. If we have the rule `R -> A`, we add the rule `R -> epsilon` unless we previously removed the rule `R -> epsilon`. Any `R -> epsilon` added this way is itself removed when we perform these steps for the variable `R`. We cycle over the variables, repeating these steps until all `epsilon`-rules not involving the start variable have been eliminated.

Then we handle unit rules `A -> B`. We delete the rule, and for each rule of the form `B -> u`, where `u` is a string of variables and terminals, we add the rule `A -> u`, unless `A -> u` is a unit rule that was previously removed. We repeat until all unit rules have been eliminated.

Finally, we convert all the remaining rules to the proper form. For any rule `A -> u_1u_2 ldots u_k` where `k geq 3` and where each `u_i` is a variable or a terminal symbol, we replace the rule with `A -> u_1A_1`, `A_1 -> u_2A_2`, `ldots`, `A_(k-2) -> u_(k-1)u_k`, where the `A_i` are new variables. In any rule whose RHS has length 2 (including the rules just created), we replace each terminal `u_i` with a new variable `U_i` and add the rule `U_i -> u_i`.
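The four steps of the proof can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the lecture: the grammar is represented as a dict from each variable to a set of RHS tuples, `()` stands for `epsilon`, every variable is a key of the dict, and the generated names (`S_0`, `U_a`, `A_1`, and so on) are assumed not to clash with existing symbols. The function name `to_cnf` and the sample grammar are hypothetical. One variable `U_x` is shared by all occurrences of a terminal `x`, a small simplification of the per-occurrence replacement in the proof.

```python
from itertools import combinations

def to_cnf(rules, start):
    """Return (cnf_rules, new_start) for the CFG given by rules and start."""
    g = {a: set(rhss) for a, rhss in rules.items()}

    # Step 1: new start variable S_0 with the rule S_0 -> S.
    s0 = start + "_0"  # assumed unused
    g[s0] = {(start,)}

    # Step 2: remove A -> epsilon for every A other than the start variable.
    removed_eps = set()
    while True:
        a = next((v for v in g if v != s0 and () in g[v]), None)
        if a is None:
            break
        g[a].discard(())
        removed_eps.add(a)
        for r in g:
            for rhs in list(g[r]):
                spots = [i for i, s in enumerate(rhs) if s == a]
                # Drop every non-empty subset of the occurrences of a.
                for n in range(1, len(spots) + 1):
                    for drop in combinations(spots, n):
                        new = tuple(s for i, s in enumerate(rhs) if i not in drop)
                        if new == () and r in removed_eps:
                            continue  # R -> epsilon was removed earlier
                        g[r].add(new)

    # Step 3: remove unit rules A -> B.
    removed_units = set()
    while True:
        unit = next(((a, rhs[0]) for a in g for rhs in g[a]
                     if len(rhs) == 1 and rhs[0] in g), None)
        if unit is None:
            break
        a, b = unit
        g[a].discard((b,))
        removed_units.add((a, b))
        for rhs in list(g[b]):
            if len(rhs) == 1 and (a, rhs[0]) in removed_units:
                continue  # do not re-add a unit rule already removed
            g[a].add(rhs)

    # Step 4: split long rules and replace terminals in rules of length >= 2.
    out = {a: set() for a in g}
    term_var = {}
    fresh = 0

    def var_for(sym):
        if sym in g:
            return sym  # already a variable
        if sym not in term_var:
            term_var[sym] = "U_" + sym  # assumed unused
            out[term_var[sym]] = {(sym,)}
        return term_var[sym]

    for a in g:
        for rhs in g[a]:
            if len(rhs) <= 1:
                out[a].add(rhs)  # A -> terminal, or S_0 -> epsilon
                continue
            syms = [var_for(s) for s in rhs]
            head = a
            while len(syms) > 2:
                fresh += 1
                nxt = f"{a}_{fresh}"  # new chain variable, assumed unused
                out[nxt] = set()
                out[head].add((syms[0], nxt))
                head, syms = nxt, syms[1:]
            out[head].add(tuple(syms))
    return out, s0

# Illustrative grammar: S -> ASA | aB, A -> B | S, B -> b | epsilon
example = {
    "S": {("A", "S", "A"), ("a", "B")},
    "A": {("B",), ("S",)},
    "B": {("b",), ()},
}
cnf, s0 = to_cnf(example, "S")
for variable in sorted(cnf):
    print(variable, "->", sorted(cnf[variable]))
```

The numbering of the chain variables depends on set iteration order, so the exact names in the printed grammar can differ between runs; the set of rules is the same up to that renaming.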

Example

Quiz (Sec 1)

Which of the following is true?

  1. Given a context-free grammar there is an algorithm which eliminates rules of the form `A -> B` yielding a grammar with the same language.
  2. Given a grammar `G` for some language `L - \{epsilon\}`, the smallest context-free grammar for `L` requires quadratically more rules than `G`.
  3. It is impossible for a string `w` to have more than one parse tree with respect to a context-free grammar `G`.

Quiz (Sec 3)

Which of the following is true?

  1. When coming up with a context-free grammar for a programming language, we want the grammar to be inherently ambiguous.
  2. We can check if a string is generated by an s-grammar in linear time.
  3. A context-free grammar cannot have a nullable rule.

Introduction to the Cocke-Younger-Kasami (CYK) Algorithm (1960)

More Remarks on CYK

The CYK algorithm

Example

Greibach Normal Form

Example Converting to Greibach Normal Form