Minimization, Closure Proofs, Regular Expressions




CS154

Chris Pollett

Feb. 16, 2011

Outline

State Minimization

Procedure for Equivalence

  1. Remove all inaccessible states. This can be done by checking for each state if there is a simple path from the start state to it.
  2. Consider all pairs `(p,q)` of states. If `p in F` but `q !in F` or vice versa, then mark the pair `(p,q)` distinguishable.
  3. Repeat until no previously unmarked pairs are marked:
    1. For all pairs `(p,q)` and all `a in Sigma`, compute `delta(p,a) = p_a` and `delta(q,a) = q_a`. If the pair `(p_a,q_a)` is marked as distinguishable, mark `(p,q)` as distinguishable. Idea: if reading the same letter from `p` and from `q` takes you to distinguishable states, then `p` and `q` must themselves be distinguishable.
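As a rough illustration, the marking procedure might be sketched in Python as follows. The dictionary encoding of `delta`, the function names, and the example DFA (strings containing `11`, with a redundant state `q3` and an inaccessible state `q4`) are assumptions made for this sketch and do not come from the slides.

```python
from itertools import combinations

def reachable_states(start, alphabet, delta):
    """Step 1: collect the states reachable from the start state."""
    seen, stack = {start}, [start]
    while stack:
        p = stack.pop()
        for a in alphabet:
            q = delta[(p, a)]
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def distinguishable_pairs(states, alphabet, delta, finals):
    """Steps 2 and 3: return the set of marked (distinguishable) pairs."""
    order = sorted(states)
    # Step 2: mark a pair if exactly one of its two states is final.
    marked = {frozenset((p, q)) for p, q in combinations(order, 2)
              if (p in finals) != (q in finals)}
    # Step 3: repeat until a full pass marks nothing new.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(order, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                # len(succ) == 1 means both states move to the same state.
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked

# Hypothetical example DFA over {0, 1}.
alphabet = ["0", "1"]
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q2",
    ("q2", "0"): "q3", ("q2", "1"): "q2",
    ("q3", "0"): "q3", ("q3", "1"): "q2",
    ("q4", "0"): "q0", ("q4", "1"): "q4",  # q4 is inaccessible
}
finals = {"q2", "q3"}

reachable = reachable_states("q0", alphabet, delta)
marked = distinguishable_pairs(reachable, alphabet, delta, finals)
```

In this example the pair `(q0,q1)` is only marked during step 3, because reading `1` leads to the already marked pair `(q1,q2)`; the pair `(q2,q3)` is never marked, so those two states are equivalent.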

Procedure to Build Minimal Automaton

  1. Use the procedure of the last slide to generate the state equivalence classes for the original automaton.
  2. For each equivalence class `[p] = {q | p~_I q}` create a new state.
  3. For each transition rule `delta(r,a)=s` of the original machine, add a transition `delta([r],a)=[s]`.
  4. The initial state of the new machine is `[q_0]`, where `q_0` was the start state of the machine we are trying to minimize.
  5. The final states of the new machine are the set `{[f] | f in F}`.
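The construction can be sketched in Python roughly as below. The function name `build_minimal`, the dictionary encoding of `delta`, and the example DFA are assumptions for illustration; the set `marked` of distinguishable pairs is written out by hand here, as the marking procedure would produce it for this particular hypothetical machine.

```python
def build_minimal(states, alphabet, delta, start, finals, marked):
    """Build the quotient DFA from the accessible states and marked pairs."""
    # Step 2: one new state per equivalence class [p], stored as a frozenset.
    block_of = {}
    for p in sorted(states):
        if p in block_of:
            continue
        block = frozenset(q for q in states
                          if q == p or frozenset((p, q)) not in marked)
        for q in block:
            block_of[q] = block
    # Step 3: delta([r], a) = [s] whenever delta(r, a) = s.
    new_delta = {(block_of[r], a): block_of[delta[(r, a)]]
                 for r in states for a in alphabet}
    # Steps 4 and 5: the new start state and the new final states.
    new_start = block_of[start]
    new_finals = {block_of[f] for f in finals if f in states}
    return set(block_of.values()), new_delta, new_start, new_finals

# Hypothetical DFA for "contains 11"; q2 and q3 are equivalent.
states = {"q0", "q1", "q2", "q3"}
alphabet = ["0", "1"]
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q2",
    ("q2", "0"): "q3", ("q2", "1"): "q2",
    ("q3", "0"): "q3", ("q3", "1"): "q2",
}
finals = {"q2", "q3"}
# Distinguishable pairs for this DFA, listed by hand.
marked = {frozenset(pair) for pair in [
    ("q0", "q1"), ("q0", "q2"), ("q0", "q3"), ("q1", "q2"), ("q1", "q3"),
]}

new_states, new_delta, new_start, new_finals = build_minimal(
    states, alphabet, delta, "q0", finals, marked)
```

The four original states collapse to three: `[q0]`, `[q1]`, and `[q2] = {q2, q3}`. Representing each class as a `frozenset` is just one convenient choice; any naming of the classes would do.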

The first procedure for minimizing finite automata was given in Huffman 1954 (J. Franklin Institute. Vol 257. Iss. 3-4).

Our procedure above probably runs in quadratic time; the best known algorithm is `O(n log n)`, due to Hopcroft 1971.

There is an important related result known as the Myhill-Nerode theorem, which is due to Nerode 1958, building on a paper of Myhill 1957. This result can be used to show that the automaton one gets from our construction is unique up to a renaming of states.

Example

Regular Expressions

Formal Definition of a Regular Expression

We say that `R` is a regular expression over some alphabet `Sigma` if `R` is:

  1. `a` for some symbol `a` in the alphabet `Sigma`.
  2. `epsilon`.
  3. `emptyset`.
  4. `(R_1 cup R_2)` where `R_1` and `R_2` are regular expressions. JFLAP writes this as `R_1 + R_2`; most programming languages use `(R1 | R2)`.
  5. `(R_1R_2)` where `R_1` and `R_2` are regular expressions.
  6. `(R_1)^star` where `R_1` is a regular expression.

We write `R^+` as a shorthand for `R\ R^star`. Notice also that we tend to be lazy about parentheses, even though, to be fully well-formed, everything has to be completely parenthesized.

We write `L(R)` for the language given by the regular expression.
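To make the six cases and `L(R)` concrete, here is a small Python sketch. The class names (`Sym`, `Eps`, `Empty`, `Union`, `Concat`, `Star`) and the helper `language`, which lists only the strings of `L(R)` up to a given length, are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:            # case 1: a single symbol a
    a: str

@dataclass(frozen=True)
class Eps:            # case 2: epsilon
    pass

@dataclass(frozen=True)
class Empty:          # case 3: emptyset
    pass

@dataclass(frozen=True)
class Union:          # case 4: (R_1 cup R_2)
    r1: object
    r2: object

@dataclass(frozen=True)
class Concat:         # case 5: (R_1 R_2)
    r1: object
    r2: object

@dataclass(frozen=True)
class Star:           # case 6: (R_1)^star
    r1: object

def language(r, max_len):
    """Return the strings of L(r) that have length at most max_len."""
    if isinstance(r, Sym):
        return {r.a} if len(r.a) <= max_len else set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Union):
        return language(r.r1, max_len) | language(r.r2, max_len)
    if isinstance(r, Concat):
        left, right = language(r.r1, max_len), language(r.r2, max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if isinstance(r, Star):
        # Dropping "" from the base guarantees every step makes strings longer.
        base = language(r.r1, max_len) - {""}
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")

def plus(r):
    """The shorthand R^+ = R R^star."""
    return Concat(r, Star(r))

# (a cup b)^star b: strings over {a, b} that end in b.
ends_in_b = Concat(Star(Union(Sym("a"), Sym("b"))), Sym("b"))
short_strings = language(ends_in_b, 2)
```

Since `L(R)` is usually infinite, the sketch bounds the string length; with bound 2 the example yields the strings `b`, `ab`, and `bb`.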

Regular expressions were first considered in Kleene (1956).

Examples of the Definition

In a programming language like, say, PHP or Perl, you might use an expression like: "\.|\,|\:|\;|\"|\'|\`|\[|\]|\{|\}|\(|\)|\!|\||\&" to match, for instance, the punctuation symbols you want.
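The same idea can be tried in Python's standard `re` module; the sketch below is only an illustration in a different language from the PHP/Perl mentioned above. The names `PUNCTUATION` and `punctuation_in` are made up for this example, and the pattern is assembled with `re.escape` instead of hand-written backslashes.

```python
import re

# The punctuation symbols from the expression above, one character each.
PUNCTUATION = ".,:;\"'`[]{}()!|&"

# Build the alternation c1|c2|..., letting re.escape add the backslashes.
pattern = re.compile("|".join(re.escape(ch) for ch in PUNCTUATION))

def punctuation_in(text):
    """Return every punctuation symbol found in text, in order."""
    return pattern.findall(text)

found = punctuation_in("Wait, what? (ok!)")   # '?' is not in the list above
stripped = pattern.sub("", "a[1] & b;")       # delete the matched symbols
```

A character class such as `[.,:;]` would express the same union more compactly, but the alternation form mirrors case 4 of the formal definition directly.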

If you want to see regular expressions gone wild check out the Perl solution to the 99 bottles of beer song.

Equivalence with Finite Automata

Proof that regular expression implies regular

Proof cont'd

Assume now the result holds for regular expressions in which the total number of uses of union, `*`, or concatenation is at most `n`. Consider a regular expression `R` of complexity `n+1`. There are three cases to consider:

  1. `R` is of the form `(R_1 cup R_2)` where `R_1` and `R_2` are regular expressions of complexity `leq n`. By induction let `N_1` and `N_2` be the machines for `R_1` and `R_2`. Define `N` for `R` as:
    NFA for the union of two NFAs
  2. `R` is of the form `(R_1R_2)` where `R_1` and `R_2` are regular expressions of complexity `leq n`. By induction let `N_1` and `N_2` be the machines for `R_1` and `R_2`. Define `N` for `R` as:
    NFA for the concatenation of two NFAs
  3. `R` is of the form `(R_1)^star` where `R_1` is a regular expression of complexity `leq n`. By induction let `N_1` be the machine for `R_1`. Define `N` for `R` as:
    NFA for the Kleene star of an NFA
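The three constructions can also be sketched in code. The Python below follows the common textbook approach of gluing machines together with epsilon transitions; the exact machines in the figures may differ in detail. The `NFA` class, the helper names, and the use of `""` to label epsilon moves are assumptions for this sketch.

```python
from itertools import count

EPS = ""          # label used for epsilon transitions (an assumption)
_fresh = count()  # supplies new, never-reused state names

class NFA:
    def __init__(self, start, accepts, trans):
        # trans maps (state, symbol) to a set of next states.
        self.start, self.accepts, self.trans = start, set(accepts), trans

def _merge(*tables):
    """Combine transition tables without modifying the originals."""
    out = {}
    for table in tables:
        for key, targets in table.items():
            out.setdefault(key, set()).update(targets)
    return out

def symbol(a):
    """Base case: a two-state machine for the single symbol a."""
    s, f = next(_fresh), next(_fresh)
    return NFA(s, {f}, {(s, a): {f}})

def union(n1, n2):
    """Case 1: a new start state with epsilon moves to both old starts."""
    s = next(_fresh)
    trans = _merge(n1.trans, n2.trans, {(s, EPS): {n1.start, n2.start}})
    return NFA(s, n1.accepts | n2.accepts, trans)

def concat(n1, n2):
    """Case 2: epsilon moves from N_1's accept states to N_2's start."""
    links = {(f, EPS): {n2.start} for f in n1.accepts}
    return NFA(n1.start, n2.accepts, _merge(n1.trans, n2.trans, links))

def star(n1):
    """Case 3: a new accepting start state, plus loops back to the old start."""
    s = next(_fresh)
    links = {(f, EPS): {n1.start} for f in n1.accepts}
    trans = _merge(n1.trans, links, {(s, EPS): {n1.start}})
    return NFA(s, n1.accepts | {s}, trans)

def _closure(nfa, states):
    """All states reachable from `states` using only epsilon moves."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in nfa.trans.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfa, w):
    """Simulate the NFA on the string w."""
    current = _closure(nfa, {nfa.start})
    for a in w:
        step = set()
        for q in current:
            step |= nfa.trans.get((q, a), set())
        current = _closure(nfa, step)
    return bool(current & nfa.accepts)

# N for (a cup b)^star b; each sub-machine is built fresh and used once,
# so the state sets being combined are always disjoint.
n = concat(star(union(symbol("a"), symbol("b"))), symbol("b"))
```

For instance, `accepts(n, "aab")` holds while `accepts(n, "ba")` does not. The sketch relies on every sub-machine having its own states, which the shared counter `_fresh` ensures as long as no `NFA` object is reused inside a larger expression.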