Minimization, Closure Proofs, Regular Expressions




CS154

Chris Pollett

Feb. 13, 2013

Outline

State Minimization

Procedure for Equivalence

  1. Remove all inaccessible states. This can be done by checking for each state if there is a simple path from the start state to it.
  2. Consider all pairs `(p,q)` of states. If `p in F` but `q !in F` or vice versa, then mark the pair `(p,q)` distinguishable.
  3. Repeat until no previously unmarked pairs are marked:
    1. For all pairs `(p,q)` and all `a in Sigma`, compute `delta(p,a) = p_a` and `delta(q,a) = q_a`. If the pair `(p_a,q_a)` is marked as distinguishable, mark `(p,q)` as distinguishable. Idea: if from `p` and `q` you transition on the same letter to distinguishable states, then `p` and `q` must themselves be distinguishable.
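
The marking procedure above can be sketched in Python as follows. This is a minimal illustration, not an optimized implementation: the function name `distinguishable_pairs`, the dictionary encoding of `delta`, and the four-state example DFA are all assumptions made for the example.

```python
from itertools import combinations

def distinguishable_pairs(alphabet, delta, start, finals):
    """Return (reachable states, marked pairs) for a complete DFA.

    delta maps (state, letter) -> state; finals is a set of states.
    """
    # Step 1: keep only states reachable from the start state.
    reachable, frontier = {start}, [start]
    while frontier:
        p = frontier.pop()
        for a in alphabet:
            q = delta[(p, a)]
            if q not in reachable:
                reachable.add(q)
                frontier.append(q)

    # Step 2: mark pairs in which exactly one state is accepting.
    pairs = [frozenset(pq) for pq in combinations(sorted(reachable), 2)]
    marked = {pq for pq in pairs if len(pq & finals) == 1}

    # Step 3: repeat until a full pass marks nothing new.
    changed = True
    while changed:
        changed = False
        for pq in pairs:
            if pq in marked:
                continue
            p, q = tuple(pq)
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if succ in marked:
                    marked.add(pq)
                    changed = True
                    break
    return reachable, marked

# Hypothetical example: strings over {0,1} ending in 1; D is inaccessible.
delta = {("A", "0"): "B", ("A", "1"): "C",
         ("B", "0"): "B", ("B", "1"): "C",
         ("C", "0"): "B", ("C", "1"): "C",
         ("D", "0"): "A", ("D", "1"): "C"}
reachable, marked = distinguishable_pairs("01", delta, "A", {"C"})
```

In this example `A` and `B` are never marked, so they are indistinguishable and can be merged.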

Procedure to Build Minimal Automaton

  1. Use the procedure of the last slide to generate the state equivalence classes for the original automaton.
  2. For each equivalence class `[p] = {q | p~_I q}` create a new state.
  3. For each transition rule `delta(r,a)=s` of the original machine, add a transition `delta([r],a)=[s]`.
  4. The initial state of the new machine is `[q_0]` where `q_0` was the start state of the machine we are trying to minimize.
  5. The final states of the new machine are the set `{[f] | f in F}`.
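
As a rough sketch of steps 2 to 5, the following Python builds the quotient machine from a given partition into equivalence classes. The name `quotient_dfa`, the representation of each class `[p]` as a `frozenset`, and the small example are assumptions for illustration; the classes are supplied by hand here rather than computed.

```python
def quotient_dfa(alphabet, delta, start, finals, classes):
    """Build the minimized DFA from a partition of the accessible states."""
    # Step 2: one new state per equivalence class [p].
    block_of = {p: frozenset(c) for c in classes for p in c}
    new_states = set(block_of.values())
    # Step 3: delta([r], a) = [s] whenever delta(r, a) = s. This is well
    # defined because equivalent states move to equivalent states.
    new_delta = {(block_of[r], a): block_of[delta[(r, a)]]
                 for r in block_of for a in alphabet}
    # Step 4: the new start state is [q_0].
    new_start = block_of[start]
    # Step 5: the new final states are {[f] | f in F}.
    new_finals = {block_of[f] for f in finals if f in block_of}
    return new_states, new_delta, new_start, new_finals

# Hypothetical example: A and B are equivalent, C is the accepting state.
delta = {("A", "0"): "B", ("A", "1"): "C",
         ("B", "0"): "B", ("B", "1"): "C",
         ("C", "0"): "B", ("C", "1"): "C"}
new_states, new_delta, new_start, new_finals = quotient_dfa(
    "01", delta, "A", {"C"}, [{"A", "B"}, {"C"}])
```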

The first procedure for minimizing finite automata was given in Huffman 1954 (J. Franklin Institute. Vol 257. Iss. 3-4).

With suitable bookkeeping, our procedure above runs in time quadratic in the number of states (for a fixed alphabet); the best known algorithm is `O(n log n)`, due to Hopcroft 1971.

Example

Indistinguishability and the Myhill-Nerode Theorem

Homework Problems (Sec1 and Sec2 - same problems)

Problem 3. Apply the Cartesian product construction to (i) and (j) of exercise 1.6 to obtain an automaton recognizing the union of their languages.

Answer. The following is an automaton which recognizes strings in which every odd position is a 1 (solving i):

1.6i solution

And the following is an automaton which recognizes those strings which have at least two 0's and at most one 1 (solving j):

1.6j solution

The machine coming from the Cartesian product construction where we consider only those states which are reachable from the start state is:

Solution to problem 3
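
The product construction restricted to reachable states can be sketched in Python as below. The two DFAs are my own encodings of languages (i) and (j) and need not match the pictured machines state for state; the names `union_product` and `accepts` are likewise assumptions.

```python
def union_product(alphabet, d1, s1, f1, d2, s2, f2):
    """Cartesian product of two complete DFAs, exploring only reachable pairs."""
    start = (s1, s2)
    delta, seen, frontier = {}, {start}, [start]
    while frontier:
        p, q = frontier.pop()
        for a in alphabet:
            nxt = (d1[(p, a)], d2[(q, a)])
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    # For the union, a pair accepts if either component accepts.
    finals = {(p, q) for (p, q) in seen if p in f1 or q in f2}
    return seen, delta, start, finals

def accepts(delta, start, finals, w):
    state = start
    for a in w:
        state = delta[(state, a)]
    return state in finals

# (i) every odd position is a 1: "e"/"o" = even/odd number of symbols read.
d1 = {("e", "1"): "o", ("e", "0"): "dead", ("o", "0"): "e", ("o", "1"): "e",
      ("dead", "0"): "dead", ("dead", "1"): "dead"}
f1 = {"e", "o"}

# (j) at least two 0's and at most one 1: state (zeros capped at 2, ones).
d2 = {("dead", "0"): "dead", ("dead", "1"): "dead"}
for z in range(3):
    for o in range(2):
        d2[((z, o), "0")] = (min(z + 1, 2), o)
        d2[((z, o), "1")] = (z, 1) if o == 0 else "dead"
f2 = {(2, 0), (2, 1)}

states, delta, start, finals = union_product("01", d1, "e", f1, d2, (0, 0), f2)
```

For instance, `001` is accepted through (j) alone, `101` through (i) alone, and `0110` is rejected by both components.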

Problem 4. Consider the variant of Exercise 1.38 where, rather than requiring that every possible state M could be in after reading input x be accepting for x to be in the language, we instead only require that more than half of these states be accepting. Prove that the resulting class of machines also recognizes exactly the regular languages.

Answer. Let PReg (probabilistic regular) denote the class of languages recognized by machines of the above kind. The word "exactly" requires us to prove both directions: Reg `subseteq` PReg and PReg `subseteq` Reg.

To see Reg `subseteq` PReg, let `L` be a regular language. By definition, `L=L(M)` for some DFA `M = (Q, Sigma, delta, q_0, F)`. Consider the machine `N = (Q \cup {q'}, Sigma, delta', q_0, F)`, where `q'` is a new state not in `Q` and `delta'` is defined as follows. For any `q in Q` and `a in Sigma` define `delta'(q, a) = {delta(q,a)}`; this is a well-defined mapping from `Q times Sigma -> P(Q \cup {q'})`. For `q in Q` we also define `delta'(q, epsilon) = {q'}`. Finally, define `delta'(q', x) = {q'}` for `x in Sigma cup {epsilon}`. Observe by induction on the length of `w` that, for `q in Q`, `delta'^\star(q, w) = E(delta^\star(q,w)) ={delta^\star(q,w), q'}`, and so this set contains at most one accepting state, since `q' !in F`. Now notice `w in L` iff `delta^\star(q_0,w) = f` for some `f in F` iff `delta'^\star(q_0,w) = {f,q'}` for some `f in F`. If less than half the states in `delta'^\star(q_0, v) = {s,q'}` are accepting, then none of them is accepting. Hence `s !in F`, so `v !in L`. Similarly, if at least half of the states in `delta'^\star(q_0, v)` are accepting, then `s in F`, so `v in L`. Therefore `N` shows `L` is in PReg.

On the other hand, suppose `L` is a language in PReg via some machine `N`. Apply the power set construction to `N = (Q, Sigma, delta, q_0, F)` to get a machine `M = (P(Q), Sigma, delta', {q_0}, F')`. Rather than define `F'` as in the original construction, define
`F' = {X | X in P(Q) mbox( and at least half of the elements of X are accepting)}`. The resulting machine is a DFA recognizing the same language as `N`.
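
To make the two directions concrete, here is a small Python sketch that tracks, on the fly, the subset the power set machine would be in and accepts when at least half of that subset is accepting. The names `eps_closure` and `majority_accepts`, the use of `""` as the label for `epsilon`, the state name `trap` (standing in for `q'`), and the choice to reject on an empty subset are all assumptions of this sketch.

```python
def eps_closure(states, delta):
    """All states reachable from `states` using epsilon ("") moves only."""
    stack, closure = list(states), set(states)
    while stack:
        p = stack.pop()
        for q in delta.get((p, ""), ()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return frozenset(closure)

def majority_accepts(delta, start, finals, w):
    """Accept iff at least half of the states reachable on w are accepting."""
    current = eps_closure({start}, delta)
    for a in w:
        moved = {q for p in current for q in delta.get((p, a), ())}
        current = eps_closure(moved, delta)
    # Assumption: an empty set of reachable states is treated as rejecting.
    return len(current) > 0 and 2 * len(current & finals) >= len(current)

# Hypothetical DFA M for strings ending in 1, turned into N as in the proof:
# every original state gets an epsilon move to the new non-accepting "trap".
dfa = {("a", "0"): "a", ("a", "1"): "b", ("b", "0"): "a", ("b", "1"): "b"}
nfa = {key: {target} for key, target in dfa.items()}
for q in ("a", "b", "trap"):
    nfa[(q, "")] = {"trap"}
for x in "01":
    nfa[("trap", x)] = {"trap"}

results = {w: majority_accepts(nfa, "a", {"b"}, w)
           for w in ["", "1", "10", "0011"]}
```

After any input the reachable set has the form `{s, trap}`, so the majority test reduces to checking whether `s` is accepting in `M`.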

Regular Expressions

Formal Definition of a Regular Expression

We say that `R` is a regular expression over some alphabet `Sigma` if `R` is:

  1. `a` for some symbol `a` in the alphabet `Sigma`.
  2. `epsilon`.
  3. `emptyset`.
  4. `(R_1 cup R_2)` where `R_1` and `R_2` are regular expressions. JFLAP writes this as `R_1 + R_2`; most programming languages use `(R1 | R2)`.
  5. `(R_1R_2)` where `R_1` and `R_2` are regular expressions.
  6. `(R_1)^star` where `R_1` is a regular expression.

We write `R^+` as a shorthand for `R\ R^star`. Notice also that we tend to be lazy about parentheses, even though to be fully well-formed everything has to be completely parenthesized.

We write `L(R)` for the language given by the regular expression.

Regular expressions were first considered in Kleene (1956).

In older books, you sometimes see regular expressions called rational expressions.

Examples of the Definition

In a programming language like say PHP or Perl you might use things like: "\.|\,|\:|\;|\"|\'|\`|\]|\[|\{|\}|\(|\)|\!|\||\&" to match against, for instance, the punctuation symbols you want.
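
The same pattern can be tried in Python's `re` module, as a rough illustration. The sample string and variable names are made up for the example, and the character-class version is my own equivalent rewriting rather than anything from the slide.

```python
import re

# The alternation from above, written as a Python raw string.
ALTERNATION = re.compile(r"""\.|\,|\:|\;|\"|\'|\`|\]|\[|\{|\}|\(|\)|\!|\||\&""")

# A union of single symbols can also be written as a character class.
CHAR_CLASS = re.compile(r"""[.,:;"'`\]\[{}()!|&]""")

sample = "Hello, world! (a|b) & [c]"
found = ALTERNATION.findall(sample)
same = found == CHAR_CLASS.findall(sample)
```

Both patterns denote the same finite language of one-symbol strings, so they find the same matches in any input.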

If you want to see regular expressions gone wild check out the Perl solution to the 99 bottles of beer song.

Some Regular Expression Identities

The following identities (`equiv` here meaning have the same language) are not too hard to verify:

Viewing `emptyset` as `0`, the empty string as `1`, concatenation as multiplication, and union as plus, the above show that the regular expressions form a so-called semi-ring, which perhaps motivates why they are sometimes called rational expressions. They do not form a ring because, given `R` with `L(R)` nonempty, there is no regular expression `R'` such that `R cup R' equiv emptyset`.
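
As a quick sanity check of the semi-ring reading, the Python below tests the usual semi-ring laws on a few small finite languages represented as sets of strings. The particular laws tested are the standard ones and are an assumption about which identities are meant; the languages `A`, `B`, `C` are arbitrary examples, and passing on them is evidence, not a proof.

```python
def concat(X, Y):
    """Concatenation of two finite languages."""
    return {x + y for x in X for y in Y}

EMPTY = set()   # plays the role of 0
EPS = {""}      # plays the role of 1

A, B, C = {"a", "ab"}, {"b", ""}, {"c", "bc"}

checks = {
    "A + 0 = A": (A | EMPTY) == A,
    "A + B = B + A": (A | B) == (B | A),
    "(A + B) + C = A + (B + C)": ((A | B) | C) == (A | (B | C)),
    "A * 1 = A = 1 * A": concat(A, EPS) == A == concat(EPS, A),
    "A * 0 = 0 = 0 * A": concat(A, EMPTY) == EMPTY == concat(EMPTY, A),
    "(A * B) * C = A * (B * C)":
        concat(concat(A, B), C) == concat(A, concat(B, C)),
    "A * (B + C) = A * B + A * C":
        concat(A, B | C) == (concat(A, B) | concat(A, C)),
    "(A + B) * C = A * C + B * C":
        concat(A | B, C) == (concat(A, C) | concat(B, C)),
}
all_hold = all(checks.values())
```

Note that union can only add strings, which is why a nonempty language has no additive inverse.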

Semi-rings don't typically have a star operation (although there is something called a star semi-ring). To reduce to the situation where one can get rid of star, one can look at languages which have the finite power property, that is, languages for which `L^\star = epsilon cup L cup ... cup L^(n-1)` for some `n ge 1`. Algorithms for checking this property have been given by Hashiguchi and Simon.

Equivalence with Finite Automata

Proof that regular expression implies regular

Proof cont'd

Assume now the result holds for regular expressions for which the total number of uses of union, `*`, or concatenation is at most `n`. Consider a regular expression `R` of complexity `n+1`. There are three cases to consider:

  1. `R` is of the form `(R_1 cup R_2)` where `R_1` and `R_2` are regular expressions of complexity `leq n`. By induction let `N_1` and `N_2` be the machines for `R_1` and `R_2`. Define `N` for `R` as:
    NFA for the union of two NFAs
  2. `R` is of the form `(R_1R_2)` where `R_1` and `R_2` are regular expressions of complexity `leq n`. By induction let `N_1` and `N_2` be the machines for `R_1` and `R_2`. Define `N` for `R` as:
    NFA for the concatenation of two NFAs
  3. `R` is of the form `(R_1)^star` where `R_1` is a regular expression of complexity `leq n`. By induction let `N_1` be the machine for `R_1`. Define `N` for `R` as:
    NFA for the Kleene star of an NFA
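
The inductive construction can be sketched in Python as a recursive function over the six cases of the definition. This sketch uses Thompson-style fragments with a single accept state, which may differ in detail from the pictured machines; the tuple encoding of regular expressions, the use of `""` for `epsilon`, and the names `build` and `matches` are assumptions for illustration.

```python
from itertools import count

_fresh = count()  # supplies new state names 0, 1, 2, ...

def build(r):
    """Return (start, accept, transitions) of an epsilon-NFA for regex r.

    transitions is a list of (state, label, state); label "" means epsilon.
    """
    kind = r[0]
    if kind == "sym":                       # base case: a single symbol
        s, f = next(_fresh), next(_fresh)
        return s, f, [(s, r[1], f)]
    if kind == "eps":                       # base case: epsilon
        s, f = next(_fresh), next(_fresh)
        return s, f, [(s, "", f)]
    if kind == "empty":                     # base case: emptyset
        return next(_fresh), next(_fresh), []
    if kind == "union":
        s, f = next(_fresh), next(_fresh)
        s1, f1, t1 = build(r[1])
        s2, f2, t2 = build(r[2])
        return s, f, t1 + t2 + [(s, "", s1), (s, "", s2),
                                (f1, "", f), (f2, "", f)]
    if kind == "cat":
        s1, f1, t1 = build(r[1])
        s2, f2, t2 = build(r[2])
        return s1, f2, t1 + t2 + [(f1, "", s2)]
    if kind == "star":
        s, f = next(_fresh), next(_fresh)
        s1, f1, t1 = build(r[1])
        return s, f, t1 + [(s, "", s1), (s, "", f),
                           (f1, "", s1), (f1, "", f)]
    raise ValueError(f"unknown regular expression node: {kind!r}")

def matches(r, w):
    """Simulate the epsilon-NFA built for r on the string w."""
    start, accept, trans = build(r)

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            p = stack.pop()
            for (a, label, b) in trans:
                if a == p and label == "" and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen

    current = closure({start})
    for ch in w:
        current = closure({b for (a, label, b) in trans
                           if a in current and label == ch})
    return accept in current

# (0 cup 1)^star 1: binary strings ending in 1.
ends_in_one = ("cat",
               ("star", ("union", ("sym", "0"), ("sym", "1"))),
               ("sym", "1"))
```

Each recursive call corresponds to one use of the induction hypothesis, and each returned fragment adds only a constant number of states and `epsilon` transitions to the fragments it combines.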