Star-Height, Homomorphism, Quotients, Pumping Lemma




CS154

Chris Pollett

Mar 2, 2020

Outline

Introduction

Classifying Regular Languages

Facts about Star-height

Quiz

Which of the following is true?

  1. The construction from class of an NFA from a regular expression of length `m` produces an NFA with `O(m)` states.
  2. `((ab)^{\star}\cap(a\cup b)^{\star})` is a regular expression.
  3. Our proof that the language of any DFA is the language of a regular expression was a proof by contradiction.

Introduction to Homomorphisms

Definition of Homomorphism

Closure under Homomorphism

Theorem. Let `L` be a regular language over `Sigma` and let `h:Sigma ->(Sigma')^star` be a homomorphism. Then `h(L)` is a regular language over `Sigma'`.

Proof. We have shown that every regular language can be represented by a regular expression. Let `R` be a regular expression for `L`. We prove by induction on the complexity of `R` that `h(L)` is regular. In the base case, `R` is either a symbol `a` of `Sigma`, the empty string, or the empty set. In the latter two cases `h(L(R)) = L(R)`, so we are done. In the first case, we note that `h(a)` is a string over `Sigma'` and so is itself a regular expression over the alphabet `Sigma'`. For the induction step, `R` is of the form `R = (R_1R_2)`, `R = (R_1 cup R_2)`, or `R = (R_1)^star`. In each of these cases, the induction hypothesis gives regular expressions `R_1'` and `R_2'` for the homomorphic images of the languages of the subexpressions. Since `h` distributes over concatenation, union, and star of languages, a regular expression for the homomorphic image of the language of `R` is, respectively, `R' = (R_1'R_2')`, `R' = (R_1' cup R_2')`, or `R' = (R_1')^star`.
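The induction in this proof can be read as a recursive procedure on the syntax tree of `R`. The following is a minimal Python sketch of that reading, not code from the course; the class names (`Empty`, `Eps`, `Sym`, `Cat`, `Alt`, `Star`) and the helpers `word` and `apply_hom` are hypothetical names chosen for illustration, and the homomorphism is assumed to be given as a dictionary from symbols of `Sigma` to strings over `Sigma'`.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Empty:          # the empty set
    pass


@dataclass(frozen=True)
class Eps:            # the empty string
    pass


@dataclass(frozen=True)
class Sym:            # a single alphabet symbol
    ch: str


@dataclass(frozen=True)
class Cat:            # (R_1 R_2)
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:            # (R_1 cup R_2)
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:           # (R_1)^star
    inner: "Regex"


Regex = Union[Empty, Eps, Sym, Cat, Alt, Star]


def word(s: str) -> Regex:
    """Regex for the single fixed string s (Eps when s is empty)."""
    r: Regex = Eps()
    for c in s:
        r = Sym(c) if isinstance(r, Eps) else Cat(r, Sym(c))
    return r


def apply_hom(r: Regex, h: Dict[str, str]) -> Regex:
    """Build R' with L(R') = h(L(R)), following the cases of the proof."""
    if isinstance(r, (Empty, Eps)):      # base cases: image is unchanged
        return r
    if isinstance(r, Sym):               # base case: h(a) is a string over Sigma'
        return word(h[r.ch])
    if isinstance(r, Cat):               # induction step: concatenation
        return Cat(apply_hom(r.left, h), apply_hom(r.right, h))
    if isinstance(r, Alt):               # induction step: union
        return Alt(apply_hom(r.left, h), apply_hom(r.right, h))
    return Star(apply_hom(r.inner, h))   # induction step: star


# Example: h(a) = 01, h(b) = empty string, applied to (ab)^star.
example = apply_hom(Star(Cat(Sym("a"), Sym("b"))), {"a": "01", "b": ""})
```

Here `example` is the syntax tree for `((01)ε)^star`, writing `ε` for the empty string, so the image language is `(01)^star`.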

A generalization of closure under homomorphism, closure under substitutions, was first proven in Bar-Hillel, Perles, and Shamir 1961.

Quotients

Closure under Quotients

Theorem. If `A` and `B` are regular languages, then `A/B` is also a regular language.

Proof. Let `M=(Q, Sigma, delta, q_0, F)` be a DFA for `A` and let `M'` be a DFA for `B`. A string `v` is in `A/B = (L(M))/(L(M'))` if and only if `delta^star(q_0,v)=q_i` for some `i` and there is a string `w` with `delta^star(q_i,w) in F` and `w in L(M')`. Let `M_i = (Q, Sigma, delta, q_i, F)`; then `L(M_i) cap L(M')` is regular by the Cartesian product construction. Notice `L(M_i) cap L(M')` is nonempty iff the two conditions `delta^star(q_i, w) in F` and `w in L(M')` hold for some `w`. Further, we can check whether `L(M_i) cap L(M')` is nonempty by seeing if some accepting state of the product machine is reachable from its start state. Hence, we can make a machine for `A/B` as `(Q, Sigma, delta, q_0, F')` where `F'` consists of those states `q_i` in `Q` such that `L(M_i) cap L(M')` is nonempty.
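As a rough illustration of how `F'` might be computed, here is a small Python sketch. It assumes both DFAs are total and are stored as dictionaries mapping `(state, symbol)` pairs to states; the function names `quotient_accepting` and `accepts`, and the example machines, are hypothetical and only meant to mirror the proof.

```python
from collections import deque


def quotient_accepting(states, alphabet, delta_a, finals_a,
                       delta_b, start_b, finals_b):
    """Return F' = { q in Q : L(M_q) cap L(M') is nonempty }."""
    new_finals = set()
    for q in states:
        # Breadth-first search of the product machine started at (q, start_b).
        seen = {(q, start_b)}
        queue = deque(seen)
        while queue:
            p, r = queue.popleft()
            if p in finals_a and r in finals_b:
                new_finals.add(q)        # some w works for this q
                break
            for c in alphabet:
                nxt = (delta_a[(p, c)], delta_b[(r, c)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return new_finals


def accepts(delta, start, finals, w):
    """Run a total DFA on the string w."""
    state = start
    for c in w:
        state = delta[(state, c)]
    return state in finals


# Example: A = a^star b and B = {b}, so A/B should be a^star.
ALPHABET = "ab"
STATES_A = {0, 1, 2}
DELTA_A = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2,
           (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
FINALS_A = {1}
DELTA_B = {("s0", "a"): "s2", ("s0", "b"): "s1", ("s1", "a"): "s2",
           ("s1", "b"): "s2", ("s2", "a"): "s2", ("s2", "b"): "s2"}
FINALS_B = {"s1"}

QUOTIENT_FINALS = quotient_accepting(STATES_A, ALPHABET, DELTA_A, FINALS_A,
                                     DELTA_B, "s0", FINALS_B)
```

The quotient machine reuses `DELTA_A` and the start state `0` unchanged; only the accepting set is replaced by `QUOTIENT_FINALS`.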

This result is due to Ginsburg and Spanier 1963.

Membership, Emptiness and Finiteness Checking

Theorem. Given a regular language `L` in standard representation and a string `w`, there are algorithms which can check: (a) if `w` is in `L`, (b) if `L` is empty, and (c) if `L` is finite.

Proof. The first step of each algorithm is to obtain a DFA for `L`. For (a), we can just use the Java program (modified with the correct transition table) we wrote a few lectures back to simulate a DFA on an input string. For (b), we view the transition function of the finite automaton for `L` as specifying a labeled graph, and we use the algorithm we gave earlier for reachability in a graph to check if an accept state is reachable from the start state. If it is, following the edge labels gives an element of `L`, so `L` is non-empty. If no accept state is reachable, the language is empty. For (c), let `M = (Q, Sigma, delta, s, F)` be a DFA for `L`. An algorithm to check if `L` is finite could cycle through each state `q` of `Q` and check if `q` is reachable from `s`. If it is, it then checks whether `q` is reachable from `q` by a path of at least one edge. If it is, it finally checks whether any accepting state is reachable from `q`. If any `q` meets all three of these conditions, then by cycling over the `q` to `q` path differing numbers of times we can show arbitrarily many strings are in the language. This is because `M` is a DFA, so the loop must involve traversing some alphabet letters. So the language will not be finite. If no such `q` exists, then no accept state is reachable from any reachable state that lies on a cycle, so every accepted string follows a path that never repeats a state and hence has length less than `|Q|`. There are only finitely many such strings, so the language must be finite.
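The three checks can be sketched in a few lines. The version below is in Python rather than the Java simulator mentioned in the proof, and it is only an illustration: it assumes a total DFA stored as a dictionary from `(state, symbol)` pairs to states, and the names `reachable`, `is_member`, `is_empty`, and `is_finite` are hypothetical.

```python
def reachable(delta, alphabet, sources):
    """States reachable from any state in sources by zero or more transitions."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        p = stack.pop()
        for c in alphabet:
            nxt = delta[(p, c)]
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_member(delta, start, finals, w):
    """(a) Simulate the DFA on w."""
    state = start
    for c in w:
        state = delta[(state, c)]
    return state in finals


def is_empty(delta, alphabet, start, finals):
    """(b) L is empty iff no accept state is reachable from the start state."""
    return not (reachable(delta, alphabet, [start]) & set(finals))


def is_finite(delta, alphabet, start, finals):
    """(c) L is infinite iff some state is reachable from the start state,
    lies on a cycle, and can reach an accept state."""
    for q in reachable(delta, alphabet, [start]):
        successors = {delta[(q, c)] for c in alphabet}
        on_cycle = q in reachable(delta, alphabet, successors)
        reaches_accept = bool(reachable(delta, alphabet, [q]) & set(finals))
        if on_cycle and reaches_accept:
            return False
    return True


# Example DFAs over {a, b}: one for a^star b (infinite) and one for {b} (finite).
DELTA_INF = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2,
             (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
DELTA_FIN = {(0, "a"): 2, (0, "b"): 1, (1, "a"): 2,
             (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
```

In both example machines the dead state `2` lies on a cycle but cannot reach an accept state, which is why the third condition in (c) is needed.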

The above Theorem is due to Moore (1956).

Remark. Suppose our regular language is presented as a regular expression `R`. The way we have been suggesting to check whether `w` is in `L` is to convert `R` to an NFA `N`, do the powerset construction to get a DFA `D`, minimize the DFA, and then run the membership algorithm on it. If `N` has `m` states, then `D` might have as many as `2^m` states, which might mean a lot of space is used even after minimization. To see this, consider the language {`w \in {0,1}^star`| the `m`th last character of `w` is a `0`}. This has an NFA with `m+1` states but requires `Omega(2^m)` states as a DFA. For this reason, people often try to simulate `w` directly on the NFA, keeping a list of the states one might be in at any given step.

That is, suppose CurrentStates is a list of states we might be in after processing the first `t` characters of a string `w`; before any characters are read, it is `E` applied to the start state. Let `w_{t+1}` be the `t+1`st character of `w`. Initialize NextStates to be the empty set of states. Then for each state `q` in CurrentStates, we use our NFA's transition function `delta` and add the states `E(delta(q, w_{t+1}))` to NextStates. Once we have done this for all states in CurrentStates, we set CurrentStates = NextStates and proceed to the next character. If after processing the whole string our last set of CurrentStates has an accept state in it, we accept; otherwise, we reject. One can verify the runtime of this algorithm is `O(m cdot |w|)` and it uses at most `O(m log m)` space.
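The CurrentStates/NextStates loop might look as follows in Python. This is a sketch under stated assumptions: the NFA is given by a dictionary `delta` from `(state, symbol)` pairs to sets of states and a dictionary `eps` from states to sets of epsilon-successors, and the names `eps_closure` and `nfa_accepts` are hypothetical. It is written for clarity and does not try to match the running time quoted above.

```python
def eps_closure(states, eps):
    """E(states): everything reachable from states by epsilon moves alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        p = stack.pop()
        for r in eps.get(p, ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_accepts(delta, eps, start, finals, w):
    """Simulate the NFA on w, tracking only the set of possible states."""
    current_states = eps_closure({start}, eps)
    for c in w:
        next_states = set()
        for q in current_states:
            next_states |= eps_closure(delta.get((q, c), ()), eps)
        current_states = next_states
    return bool(current_states & set(finals))


# Example: NFA with 4 states for "the 3rd last character of w is a 0".
DELTA_N = {(0, "0"): {0, 1}, (0, "1"): {0},
           (1, "0"): {2}, (1, "1"): {2},
           (2, "0"): {3}, (2, "1"): {3}}
EPS_N = {}  # this particular NFA has no epsilon moves
```

For instance, `nfa_accepts(DELTA_N, EPS_N, 0, {3}, "0111")` tracks the sets `{0}`, `{0,1}`, `{0,2}`, `{0,3}`, `{0}` and so rejects, without ever building the 8-state DFA.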

Remark. For some particular regular expressions, rather than use an NFA, we can just use a DFA. For example, if we have a regex of length `n` involving only concatenation and alphabet symbols (corresponding to searching for a pattern like "chris" within a string), then we can make a DFA with `n+1` states with appropriate back arrows. If we have a pattern of star height 0, we can use the distributive property of regexes to rewrite it as a union of fixed strings. For example, `(a cup b)(a cup b)` becomes `(aa \cup ab \cup ba \cup bb)`. If we simulate an NFA on each of the union'd strings, we only have to keep track of a number of states proportional to the number of unions rather than the number of symbols in the regex.
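One way to picture the `n+1` state DFA with back arrows is the sketch below. It is an illustrative construction, not necessarily the one intended in class: state `k` means "the last `k` characters read match the first `k` characters of the pattern", state `n` is made absorbing so the machine accepts any text containing the pattern, and the names `build_search_dfa` and `contains` are hypothetical. The table is built by brute force rather than by the faster known methods.

```python
def build_search_dfa(pattern, alphabet):
    """Transition table of a DFA with len(pattern)+1 states for substring search."""
    n = len(pattern)
    delta = {}
    for k in range(n + 1):
        for c in alphabet:
            if k == n:
                delta[(k, c)] = n        # pattern already found; stay accepting
                continue
            seen = pattern[:k] + c
            # Back arrow: fall back to the longest pattern prefix that is
            # a suffix of what we have just seen.
            j = len(seen)
            while j > 0 and not seen.endswith(pattern[:j]):
                j -= 1
            delta[(k, c)] = j
    return delta


def contains(pattern, text, alphabet):
    """True iff pattern occurs in text (text must use only alphabet symbols)."""
    delta = build_search_dfa(pattern, alphabet)
    state = 0
    for c in text:
        state = delta[(state, c)]
    return state == len(pattern)
```

For the pattern `aab` over `{a, b}`, reading a third `a` in state 2 follows a back arrow to state 2 rather than to state 0, which is what lets `contains("aab", "aaab", "ab")` succeed.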

The question of minimizing regular expressions for finite languages is an active area of research, as evidenced by the 2020 results of Gruber, Holzer, and Wolfsteiner on the time complexity of approximating the smallest regular expression for a finite language.

The Pumping Lemma

More on the Pumping Lemma

Using the Pumping Lemma

We can use the pumping lemma to show languages are not regular. For example, let `C={ w | w \mbox{ has an equal number of 0's and 1's}}`. To prove `C` is not regular:
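One standard way to carry out this argument is sketched below. It assumes the usual statement of the pumping lemma: there is a pumping length `p` such that every string `s` in the language with `|s| >= p` can be split as `s = xyz` with `|xy| <= p`, `|y| > 0`, and `xy^iz` in the language for all `i >= 0`. The choice of `s` is one convenient option, not the only one.

```latex
\begin{align*}
&\text{Suppose } C \text{ is regular with pumping length } p,
  \text{ and take } s = 0^p1^p \in C.\\
&\text{Any split } s = xyz \text{ with } |xy| \le p,\ |y| > 0
  \text{ forces } y = 0^k \text{ for some } 1 \le k \le p.\\
&\text{Then } xy^2z = 0^{p+k}1^p \text{ has more 0's than 1's, so } xy^2z \notin C,\\
&\text{contradicting the pumping lemma. Hence } C \text{ is not regular.}
\end{align*}
```

The condition `|xy| <= p` does the real work here: without it, a split such as `y = 01` applied to `(01)^p` could be pumped while staying inside `C`.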

More Examples