Regular Languages

Definitions

Sets of strings are called languages.

A set of strings L is regular if L = L(regex) for some regular expression regex.

Alternatively, L is regular if L = L(M) for some FSM M.
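As a small illustration of the two definitions, the sketch below describes one language both ways and checks that they agree on a few strings. The language (ab)*, the three-state machine, and the names RegularDemo, inLanguageByRegex, and inLanguageByFsm are all assumptions made up for this example; they are not the FSM class used elsewhere in these notes.

```java
import java.util.regex.Pattern;

// Hypothetical example: the language (ab)* over {a, b}, defined two ways.
class RegularDemo {
    // Definition 1: L = L(regex).
    static final Pattern REGEX = Pattern.compile("(ab)*");

    static boolean inLanguageByRegex(String s) {
        // matches() succeeds only if the whole string matches the pattern.
        return REGEX.matcher(s).matches();
    }

    // Definition 2: L = L(M) for a three-state FSM M.
    // State 0: start state and only final state.
    // State 1: an 'a' was just read and a 'b' is expected.
    // State 2: dead state, no way back.
    static final int[][] DELTA = {
        {1, 2}, // from state 0: 'a' -> 1, 'b' -> 2
        {2, 0}, // from state 1: 'a' -> 2, 'b' -> 0
        {2, 2}, // from state 2: stay in 2
    };

    static boolean inLanguageByFsm(String s) {
        int state = 0;
        for (char c : s.toCharArray()) {
            if (c != 'a' && c != 'b') return false; // alphabet assumed to be {a, b}
            state = DELTA[state][c - 'a'];
        }
        return state == 0;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "ab", "abab", "aba", "ba"}) {
            System.out.println("\"" + s + "\": regex=" + inLanguageByRegex(s)
                    + " fsm=" + inLanguageByFsm(s));
        }
    }
}
```

Both methods should give the same answer on every string over {a, b}, which is what it means for the regular expression and the machine to define the same language.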

Let REG be the set of all regular languages.

Theorems

Theorem REG is closed under unions, intersections, and complements.

Proof:

Assume A and B are in REG.

Then A = L(a) and B = L(b) for some regular expressions a and b, so A U B = L(a|b), which is in REG.

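Let C = Strings – A (i.e., all strings not in A). A = L(M) for some FSM M, which we may take to be deterministic with a transition defined for every state and input symbol. We modify M to M' by setting M'.finals to the complement of M.finals. Then M' accepts exactly the strings that M rejects, so C = L(M') is in REG.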

By De Morgan's law, the intersection of A and B is –(–A U –B), where –X means Strings – X. This is in REG by the union and complement cases above.

QED.
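The closure constructions can also be sketched directly on machines. The code below is a minimal illustration, assuming deterministic FSMs over the alphabet {a, b} with a transition for every state and symbol and with state 0 as the start state; the class Dfa and its methods are hypothetical names, not the FSM class from these notes. Complement flips the final states as in the proof. Intersection uses the product construction (a different technique from the De Morgan argument above), and union is then obtained from complement and intersection by De Morgan's law.

```java
// Hypothetical minimal deterministic FSM over {a, b}; state 0 is the start state.
class Dfa {
    final int[][] delta;    // delta[q][0] is the move on 'a', delta[q][1] on 'b'
    final boolean[] finals; // finals[q] is true when q is a final state

    Dfa(int[][] delta, boolean[] finals) {
        this.delta = delta;
        this.finals = finals;
    }

    boolean accepts(String s) {
        int state = 0;
        for (char c : s.toCharArray()) {
            state = delta[state][c - 'a']; // assumes s contains only 'a' and 'b'
        }
        return finals[state];
    }

    // Complement: same transitions, final states flipped.
    Dfa complement() {
        boolean[] flipped = new boolean[finals.length];
        for (int q = 0; q < finals.length; q++) flipped[q] = !finals[q];
        return new Dfa(delta, flipped);
    }

    // Intersection by the product construction: run both machines in lockstep.
    // The pair of states (p, q) is encoded as the single state p * m + q.
    Dfa intersect(Dfa other) {
        int m = other.finals.length;
        int[][] d = new int[finals.length * m][2];
        boolean[] f = new boolean[finals.length * m];
        for (int p = 0; p < finals.length; p++) {
            for (int q = 0; q < m; q++) {
                for (int c = 0; c < 2; c++) {
                    d[p * m + q][c] = delta[p][c] * m + other.delta[q][c];
                }
                f[p * m + q] = finals[p] && other.finals[q];
            }
        }
        return new Dfa(d, f);
    }

    // Union by De Morgan's law: A U B = -(-A intersect -B).
    Dfa union(Dfa other) {
        return complement().intersect(other.complement()).complement();
    }

    public static void main(String[] args) {
        // evenAs accepts strings with an even number of a's.
        Dfa evenAs = new Dfa(new int[][] {{1, 0}, {0, 1}}, new boolean[] {true, false});
        // endsInB accepts strings whose last symbol is b.
        Dfa endsInB = new Dfa(new int[][] {{0, 1}, {0, 1}}, new boolean[] {false, true});
        System.out.println(evenAs.complement().accepts("a"));
        System.out.println(evenAs.intersect(endsInB).accepts("aab"));
        System.out.println(evenAs.union(endsInB).accepts("ab"));
    }
}
```

Flipping the final states only works because the machine is deterministic and complete; on a nondeterministic machine the same trick does not in general give the complement.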

The "memory" of an FSM is its state, which has a number of possible values that is fixed in advance. This means an FSM can't remember an unbounded amount of information, such as a count that can grow arbitrarily large. This imposes severe limitations on what these machines can do.

Pumping Lemma: Assume L is a regular language. Then there is a fixed constant n such that every z in L with n <= |z| can be written as z = uvw where |uv| <= n, 1 <= |v|, and for all i >= 0, uv^iw (u, then i copies of v, then w) is in L.

Proof

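Assume L = L(M) for some FSM M with n states. Assume M accepts z, where n <= |z|. While reading the first n symbols of z, M passes through n + 1 states (counting the start state), so by the pigeonhole principle some state must appear at least twice on M.accept(z). Let that state be k. So z = uvw where |uv| <= n, 1 <= |v|, M.transitions(u, 0) = k, M.transitions(v, k) = k, and M.transitions(w, k) = j where j is a final state. Since v always returns us to k, we can repeat v any number of times (including zero) and still end in j, so uv^iw is in L for all i >= 0.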

QED.
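The proof is constructive, and the split z = uvw can be computed by running the machine and stopping at the first repeated state. The following is a minimal sketch under the same assumptions as the proof (a deterministic machine, here over the alphabet {a, b} with start state 0). The class PumpingSplit and its methods are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;

// Hypothetical helper that finds the u, v, w promised by the pumping lemma.
class PumpingSplit {
    // Returns {u, v, w} with z = uvw and |v| >= 1, where v is the loop between
    // the two earliest visits to a repeated state. If |z| >= n, where n is the
    // number of states, the repeat happens within the first n symbols, so |uv| <= n.
    static String[] split(int[][] delta, String z) {
        int[] firstSeen = new int[delta.length]; // position where each state was first entered
        Arrays.fill(firstSeen, -1);
        int state = 0;
        firstSeen[0] = 0;
        for (int pos = 1; pos <= z.length(); pos++) {
            state = delta[state][z.charAt(pos - 1) - 'a']; // assumes only 'a' and 'b'
            if (firstSeen[state] >= 0) {
                int start = firstSeen[state];
                return new String[] {
                    z.substring(0, start), z.substring(start, pos), z.substring(pos)
                };
            }
            firstSeen[state] = pos;
        }
        throw new IllegalArgumentException("no state repeats; z is too short");
    }

    // Builds u v^i w.
    static String pump(String[] uvw, int i) {
        StringBuilder sb = new StringBuilder(uvw[0]);
        for (int r = 0; r < i; r++) sb.append(uvw[1]);
        return sb.append(uvw[2]).toString();
    }

    static boolean accepts(int[][] delta, boolean[] finals, String s) {
        int state = 0;
        for (char c : s.toCharArray()) state = delta[state][c - 'a'];
        return finals[state];
    }

    public static void main(String[] args) {
        // Three-state machine for (ab)*: state 0 is start and final, state 2 is dead.
        int[][] delta = {{1, 2}, {2, 0}, {2, 2}};
        boolean[] finals = {true, false, false};
        String[] uvw = split(delta, "abab");
        for (int i = 0; i <= 3; i++) {
            String pumped = pump(uvw, i);
            System.out.println(pumped + " accepted: " + accepts(delta, finals, pumped));
        }
    }
}
```

For the machine in main and z = abab, the run returns to state 0 after reading ab, so the split is u = empty string, v = ab, w = ab, and every pumped string is again in (ab)*.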

Corollary L = {a^i b^i | 0 <= i} is not in REG

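Proof: Assume, for contradiction, that L = L(M) for an FSM M, and let z = a^n b^n where n = # of states in M. By the pumping lemma, a^n b^n = uvw with |uv| <= n and 1 <= |v|. Since |uv| <= n, uv is a prefix of a^n, so v consists of one or more a's. But when we iterate v, the number of a's and b's will no longer be equal: for example, uv^2w has more a's than b's, so it is not in L. This contradicts the pumping lemma.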

QED.
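The argument can be sanity-checked by brute force for small n: try every split of a^n b^n allowed by the pumping lemma and confirm that none of them survives pumping. The sketch below is only a finite check, not a substitute for the proof, and the class PumpingCounterexample and its method names are hypothetical.

```java
// Hypothetical brute-force check that a^n b^n cannot be pumped.
class PumpingCounterexample {
    // Membership test for L = {a^i b^i | 0 <= i}.
    static boolean inL(String s) {
        if (s.length() % 2 != 0) return false;
        int half = s.length() / 2;
        for (int k = 0; k < s.length(); k++) {
            char expected = k < half ? 'a' : 'b';
            if (s.charAt(k) != expected) return false;
        }
        return true;
    }

    // Returns true if some split z = uvw with |uv| <= n and |v| >= 1 keeps
    // u v^i w in L for every i from 0 to maxI. Assumes n <= |z|.
    static boolean somePumpableSplit(String z, int n, int maxI) {
        for (int uLen = 0; uLen < n; uLen++) {
            for (int vLen = 1; uLen + vLen <= n; vLen++) {
                String u = z.substring(0, uLen);
                String v = z.substring(uLen, uLen + vLen);
                String w = z.substring(uLen + vLen);
                boolean ok = true;
                for (int i = 0; i <= maxI && ok; i++) {
                    StringBuilder sb = new StringBuilder(u);
                    for (int r = 0; r < i; r++) sb.append(v);
                    ok = inL(sb.append(w).toString());
                }
                if (ok) return true;
            }
        }
        return false;
    }

    // Builds a^n b^n.
    static String anbn(int n) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < n; k++) sb.append('a');
        for (int k = 0; k < n; k++) sb.append('b');
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.println("n=" + n + " pumpable split exists: "
                    + somePumpableSplit(anbn(n), n, 2));
        }
    }
}
```

Every candidate v lies inside the leading block of a's, so already for i = 0 the string uw has fewer a's than b's and falls outside L.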

Theorem regular languages are recursive, but not vice-versa.

Proof:

We have already shown how to emulate FSMs in Java, so clearly L = L(M) implies L's membership function can be implemented by an algorithm.

It's also easy to show that L = {a^i b^i | 0 <= i} is recursive. (Left as an exercise.) But we know this isn't regular.

QED.