Compression, CFL Closure Properties, and CFL Algorithms




CS154

Chris Pollett

Mar 23, 2011

Outline

Grammar-Based Compression Algorithms

SEQUITUR

Example of Compressing with SEQUITUR

Closure Properties of CFLs

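CFLs are closed under union
Proof idea: Let `G` and `H`, with start symbols `S` and `T` respectively, be CFGs for the two CFLs whose union we want. Rename variables so that `G` and `H` share none, then make a new grammar over the same alphabet whose productions are the union of the two grammars' productions together with the new rules `S'->S|T`, where `S'` is a new start variable. Every derivation from `S'` first picks `S` or `T` and then stays entirely inside one of the original grammars, so the new grammar generates exactly `L(G) cup L(H)`.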
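The union construction can be sketched in Python. This is a minimal illustration, not the course's code: it assumes a hypothetical representation of a grammar as a triple `(variables, start, productions)`, where `productions` maps each variable to a list of right-hand sides (tuples of symbols, with `()` standing for `epsilon`). The helper names `rename` and `union_grammar` are likewise hypothetical.

```python
def rename(grammar, tag):
    """Rename every variable V to V_tag so that two grammars share no variables.
    Terminals (symbols not in the variable set) are left unchanged."""
    variables, start, prods = grammar

    def r(sym):
        return f"{sym}_{tag}" if sym in variables else sym

    new_prods = {r(v): [tuple(r(s) for s in rhs) for rhs in rhss]
                 for v, rhss in prods.items()}
    return ({r(v) for v in variables}, r(start), new_prods)


def union_grammar(g, h, new_start="S'"):
    """Grammar for L(g) ∪ L(h): rename apart, merge productions, add S' -> S | T.
    Assumes new_start does not clash with any terminal symbol."""
    gv, gs, gp = rename(g, "1")
    hv, hs, hp = rename(h, "2")
    prods = {**gp, **hp}                   # variable sets are now disjoint
    prods[new_start] = [(gs,), (hs,)]      # the two new rules S' -> S | T
    return (gv | hv | {new_start}, new_start, prods)


# Usage: both input grammars use the variable name S; renaming keeps them apart.
g = ({"S"}, "S", {"S": [("a", "S", "b"), ()]})   # a^n b^n
h = ({"S"}, "S", {"S": [("c", "S"), ()]})        # c^*
u = union_grammar(g, h)
```

Renaming first is what makes the construction correct: without it, a production of `G` could be applied to a variable that came from `H`.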
CFLs are closed under intersection by regular languages
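Proof idea: use the Cartesian product construction on a PDA for the CFL together with a DFA for the regular language. The product machine has states `(p,q)`, with `p` a PDA state and `q` a DFA state. It uses the PDA's stack (the DFA needs none), advances both components on each input symbol, advances only the PDA component on an `epsilon`-move, and accepts in `(p,q)` iff `p` and `q` are both accepting.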
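The product construction can be sketched in Python, assuming hypothetical encodings: a PDA is `(transitions, start, stack_start, finals)` with `transitions[(state, symbol_or_None, top)]` a set of `(next_state, push)` pairs (`None` marks an `epsilon`-move, and `push[0]` becomes the new top), accepting by final state; a DFA is `(delta, start, finals, states)`. The bounded search in `accepts` is only a test harness for the small example, not a general PDA decision procedure.

```python
from collections import deque


def product(pda, dfa):
    """Product PDA for L(pda) ∩ L(dfa): states are pairs (p, q)."""
    trans, p_start, stack_start, p_finals = pda
    delta, d_start, d_finals, d_states = dfa
    new_trans = {}
    for (p, sym, top), moves in trans.items():
        for q in d_states:
            if sym is None:
                q2 = q                      # epsilon-move: DFA stays put
            elif (q, sym) in delta:
                q2 = delta[(q, sym)]        # input symbol: both machines move
            else:
                continue
            new_trans.setdefault(((p, q), sym, top), set()).update(
                ((p2, q2), push) for p2, push in moves)
    finals = {(p, q) for p in p_finals for q in d_finals}
    return new_trans, (p_start, d_start), stack_start, finals


def accepts(pda, w):
    """Breadth-first search over configurations (state, position, stack).
    Stack height is capped at len(w) + 2, which suffices for the example PDA."""
    trans, start, stack_start, finals = pda
    first = (start, 0, (stack_start,))
    seen, queue = {first}, deque([first])
    while queue:
        state, i, stack = queue.popleft()
        if i == len(w) and state in finals:
            return True
        if not stack:
            continue
        top, rest = stack[0], stack[1:]
        options = [(None, i)] + ([(w[i], i + 1)] if i < len(w) else [])
        for sym, j in options:
            for nxt, push in trans.get((state, sym, top), ()):
                cfg = (nxt, j, tuple(push) + rest)
                if cfg not in seen and len(cfg[2]) <= len(w) + 2:
                    seen.add(cfg)
                    queue.append(cfg)
    return False


# Example PDA for {a^n b^n | n >= 0} and DFA for "even number of a's".
anbn = ({("q0", "a", "Z"): {("q0", ("A", "Z"))},
         ("q0", "a", "A"): {("q0", ("A", "A"))},
         ("q0", "b", "A"): {("q1", ())},
         ("q1", "b", "A"): {("q1", ())},
         ("q0", None, "Z"): {("qf", ("Z",))},
         ("q1", None, "Z"): {("qf", ("Z",))}},
        "q0", "Z", {"qf"})
even_a = ({("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"},
          "e", {"e"}, {"e", "o"})
both = product(anbn, even_a)   # recognizes a^n b^n with n even
```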
CFLs are not closed under intersection or complementation
Proof: The languages `{a^nb^nc^m | n, m ge 0}` and `{a^mb^nc^n | n, m ge 0}` are both context-free. Their intersection is `{a^nb^nc^n | n ge 0}`, which is not context-free by the pumping lemma for CFLs. CFLs are also not closed under complementation: by De Morgan's laws, `L_1 cap L_2 = bar(bar(L_1) cup bar(L_2))`, so closure under complementation together with closure under union would give closure under intersection.
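As a sanity check on the set identity in the proof (not a substitute for the pumping-lemma argument), the following Python sketch enumerates all strings over `{a,b,c}` up to a length bound and keeps those lying in both languages. The function names are hypothetical.

```python
import re
from itertools import product

BLOCKS = re.compile(r"(a*)(b*)(c*)")  # strings of the form a...ab...bc...c


def in_first(w):
    """Membership in {a^n b^n c^m | n, m >= 0}."""
    m = BLOCKS.fullmatch(w)
    return m is not None and len(m.group(1)) == len(m.group(2))


def in_second(w):
    """Membership in {a^m b^n c^n | n, m >= 0}."""
    m = BLOCKS.fullmatch(w)
    return m is not None and len(m.group(2)) == len(m.group(3))


def intersection_up_to(max_len):
    """All strings of length <= max_len lying in both languages."""
    found = []
    for length in range(max_len + 1):
        for letters in product("abc", repeat=length):
            w = "".join(letters)
            if in_first(w) and in_second(w):
                found.append(w)
    return found


print(intersection_up_to(9))  # ['', 'abc', 'aabbcc', 'aaabbbccc']
```

Only strings of the form `a^nb^nc^n` survive, as the proof claims.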

Algorithms for CFLs

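There is an algorithm which, given a grammar `G` written down formally as a 4-tuple, can decide whether or not `L(G)` is empty.
To do this, the algorithm checks whether the start variable is useless. Since the start variable is always reachable, it is useless exactly when it derives no string of terminals; this can be computed by repeatedly marking a variable as generating whenever it has a production whose right-hand side consists only of terminals and already-marked variables. If the start variable is useless, the language is empty; otherwise it is not.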
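The marking procedure can be sketched in Python, assuming the same hypothetical grammar encoding as a triple `(variables, start, productions)`, with right-hand sides as tuples of symbols and `()` for `epsilon`. The names `generating` and `is_empty` are illustrative only.

```python
def generating(variables, prods):
    """Fixpoint: the set of variables that derive some string of terminals."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for v in variables:
            if v in gen:
                continue
            for rhs in prods.get(v, []):
                # every symbol is a terminal or an already-marked variable
                if all(s in gen or s not in variables for s in rhs):
                    gen.add(v)
                    changed = True
                    break
    return gen


def is_empty(grammar):
    """L(G) is empty iff the start variable is not generating."""
    variables, start, prods = grammar
    return start not in generating(variables, prods)


# Usage: A can never stop rewriting itself, so S -> aA derives nothing.
g_empty = ({"S", "A"}, "S", {"S": [("a", "A")], "A": [("b", "A")]})
g_anbn = ({"S"}, "S", {"S": [("a", "S", "b"), ()]})
```

The loop terminates because each pass either marks a new variable or stops, so it runs at most once per variable plus one final pass.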
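There is an algorithm which, given `G`, checks whether or not `L(G)` is infinite.
To do this, we first eliminate `epsilon`-rules, unit-productions, and useless symbols from `G`, obtaining a grammar `G'` with `L(G') = L(G) - {epsilon}`; removing at most one string does not affect whether the language is infinite. We then construct a directed graph on the variables of `G'`, where `(A,B)` is an edge iff `A->xBy` is a production of `G'` for some strings `x`, `y`. If there is a cycle in this graph, then `C=>^+ uCv` for some variable `C`, and because `G'` has no `epsilon`-rules or unit-productions, `uv ne epsilon`. As there are no useless symbols in `G'`, `C` is both reachable and generating: `S=>^star sCt` and `C=>^star z` for some strings `s`, `t`, `z` of terminals. Hence `S=>^star sCt =>^star szt`, `S=>^star sCt =>^star suCvt =>^star suzvt`, `S=>^star su^2zv^2t`, etc., which are infinitely many distinct strings. Conversely, if the graph is acyclic, no path in a parse tree can repeat a variable, so parse trees have height at most the number of variables plus one, there are only finitely many of them, and `L(G')` is finite. Thus `L(G)` is infinite iff the graph has a cycle.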
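The graph step can be sketched in Python. This assumes the input grammar has already been cleaned of `epsilon`-rules, unit-productions, and useless symbols (that preprocessing is not shown), and uses the same hypothetical `(variables, start, productions)` encoding as above; `is_infinite` is an illustrative name. Cycle detection uses a standard three-color depth-first search.

```python
def is_infinite(grammar):
    """For a cleaned grammar G', decide whether L(G') is infinite by
    looking for a cycle in the graph with edges A -> B iff A -> xBy."""
    variables, start, prods = grammar
    edges = {v: {s for rhs in prods.get(v, []) for s in rhs if s in variables}
             for v in variables}

    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on DFS stack, finished
    color = {v: WHITE for v in variables}

    def has_cycle(v):
        color[v] = GRAY
        for u in edges[v]:
            if color[u] == GRAY:          # back edge: found a cycle
                return True
            if color[u] == WHITE and has_cycle(u):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and has_cycle(v) for v in variables)


# Usage: S -> aSb | ab has the cycle S -> S; S -> AA, A -> a | b has none.
g_inf = ({"S"}, "S", {"S": [("a", "S", "b"), ("a", "b")]})
g_fin = ({"S", "A"}, "S", {"S": [("A", "A")], "A": [("a",), ("b",)]})
```

Since every variable of a cleaned grammar is reachable from `S`, a cycle anywhere in the graph is enough; there is no need to start the search only at `S`.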