Sets of strings are called languages.
A set of strings L is regular if L = L(regex) for some regular expression regex.
Alternatively, L is regular if L = L(M) for some FSM M.
Let REG be the set of all regular languages.
Theorem: REG is closed under unions, intersections, and complements.
Proof:
Assume A and B are in REG.
Then A = L(a) and B = L(b) for regular expressions a and b, so A U B = L(a|b), which puts A U B in REG.
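Let C = Strings – A (i.e., all strings not in A). A = L(M) for some FSM M; assume M is deterministic and has a transition for every state and input symbol. We modify M to M' by setting M'.finals to the complement of M.finals. M' ends in a final state exactly when M does not, so C = L(M') and C is in REG.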
The intersection of A and B is –(–A U –B), where –X means Strings – X. This is in REG by the union and complement results above.
QED.
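The complement step of the proof can be sketched in Java. This is only an illustration under stated assumptions: the Dfa class below is hypothetical (it is not the FSM class used elsewhere in these notes), it assumes a deterministic machine with a transition for every state and symbol, and the names delta, finals, complement, and evenAs are made up for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical minimal DFA, not the course's FSM class.
// States are 0..n-1, state 0 is the start state, and
// delta[q][alphabet.indexOf(c)] is the next state (defined for every pair).
final class Dfa {
    final String alphabet;
    final int[][] delta;
    final Set<Integer> finals;

    Dfa(String alphabet, int[][] delta, Set<Integer> finals) {
        this.alphabet = alphabet;
        this.delta = delta;
        this.finals = finals;
    }

    // Run the machine on z and report whether it stops in a final state.
    boolean accepts(String z) {
        int q = 0;
        for (char c : z.toCharArray()) {
            int idx = alphabet.indexOf(c);
            if (idx < 0) throw new IllegalArgumentException("symbol not in alphabet: " + c);
            q = delta[q][idx];
        }
        return finals.contains(q);
    }

    // Same states and transitions; only the set of final states is flipped.
    Dfa complement() {
        Set<Integer> flipped = new HashSet<>();
        for (int q = 0; q < delta.length; q++) {
            if (!finals.contains(q)) flipped.add(q);
        }
        return new Dfa(alphabet, delta, flipped);
    }

    // Example machine: strings over {a, b} with an even number of a's.
    static Dfa evenAs() {
        int[][] delta = { {1, 0}, {0, 1} };
        return new Dfa("ab", delta, Set.of(0));
    }

    public static void main(String[] args) {
        Dfa m = evenAs();
        Dfa c = m.complement();
        for (String z : new String[] {"", "ab", "abab", "bbb"}) {
            System.out.println("\"" + z + "\": M=" + m.accepts(z) + " M'=" + c.accepts(z));
        }
    }
}
```

On every input the two machines should disagree, which is the whole content of the complement argument. Flipping the final states would not work for a nondeterministic machine, which is why the sketch assumes determinism.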
The "memory" of an FSM is its state, which has a finite number of possible values that is fixed in advance. This means an FSM can't remember an unbounded amount of information, such as a count that can grow arbitrarily large. This imposes severe limitations on what these machines can do.
Pumping Lemma: Assume L is a regular language. Then there is a fixed constant n (depending only on L) such that every z in L with n <= |z| can be written as z = uvw where |uv| <= n, 1 <= |v|, and for all i >= 0, uv^iw is in L. (Here v^i means v repeated i times.)
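Proof:
Assume L = L(M) for some FSM M with n states, and assume M accepts z where n <= |z|. While reading the first n symbols of z, M passes through n + 1 states (counting the start state), so by the pigeonhole principle some state must appear at least twice on M.accept(z). Let k be the first state to repeat. So z = uvw where M.transitions(u, 0) = k, M.transitions(v, k) = k, and M.transitions(w, k) = j where j is a final state. The repeat happens within the first n symbols, so |uv| <= n, and the two visits to k are distinct, so 1 <= |v|. Since v always returns us to k, we can repeat v any number of times (or drop it entirely) and still end in j, so uv^iw is in L for all i >= 0.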
QED.
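The proof is constructive, so the split z = uvw can be computed by running the machine and stopping at the first repeated state. The Java sketch below does this under assumptions that are not from the notes: a deterministic machine encoded as a transition table, start state 0, and the hypothetical names PumpingSketch, split, pump, and accepts.

```java
import java.util.Arrays;

final class PumpingSketch {
    // Hypothetical encoding: states 0..n-1, start state 0, and
    // delta[q][ALPHABET.indexOf(c)] is the next state.
    static final String ALPHABET = "ab";

    static int step(int[][] delta, int q, char c) {
        int idx = ALPHABET.indexOf(c);
        if (idx < 0) throw new IllegalArgumentException("symbol not in alphabet: " + c);
        return delta[q][idx];
    }

    static boolean accepts(int[][] delta, boolean[] isFinal, String z) {
        int q = 0;
        for (char c : z.toCharArray()) q = step(delta, q, c);
        return isFinal[q];
    }

    // Returns {u, v, w} with z = uvw, |uv| <= n and |v| >= 1, where v takes
    // the machine from some state k back to k. Requires |z| >= n = delta.length.
    static String[] split(int[][] delta, String z) {
        int n = delta.length;
        if (z.length() < n) throw new IllegalArgumentException("need |z| >= number of states");
        int[] firstSeen = new int[n];   // position at which each state was first reached
        Arrays.fill(firstSeen, -1);
        int q = 0;
        firstSeen[0] = 0;
        for (int pos = 1; pos <= z.length(); pos++) {
            q = step(delta, q, z.charAt(pos - 1));
            if (firstSeen[q] >= 0) {    // state q repeats: the loop is z[firstSeen[q], pos)
                int start = firstSeen[q];
                return new String[] { z.substring(0, start), z.substring(start, pos), z.substring(pos) };
            }
            firstSeen[q] = pos;
        }
        throw new IllegalStateException("no repeated state; the pigeonhole argument rules this out");
    }

    // Builds u v^i w.
    static String pump(String u, String v, String w, int i) {
        StringBuilder sb = new StringBuilder(u);
        for (int k = 0; k < i; k++) sb.append(v);
        return sb.append(w).toString();
    }

    public static void main(String[] args) {
        int[][] delta = { {1, 0}, {0, 1} };   // even number of a's
        boolean[] isFinal = { true, false };
        String[] p = split(delta, "abab");
        for (int i = 0; i <= 3; i++) {
            String s = pump(p[0], p[1], p[2], i);
            System.out.println(s + " accepted: " + accepts(delta, isFinal, s));
        }
    }
}
```

Because the loop stops at the first repeat, it never reads more than n symbols before returning, which mirrors the |uv| <= n bound in the lemma.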
Corollary: L = {a^i b^i | 0 <= i} is not in REG.
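Proof: Assume L = L(M) for an FSM M, and let z = a^n b^n where n = # of states in M. Since n <= |z|, the pumping lemma gives a^n b^n = uvw. But |uv| <= n, hence uv lies entirely within a^n and v consists of at least one a. Iterating v changes the number of a's but not the number of b's, so uv^2w has more a's than b's and is not in L, contradicting the pumping lemma.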
QED.
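For a concrete n, the counting argument can be checked by brute force: try every split of a^n b^n allowed by the lemma and confirm that pumping v once more unbalances the string. The Java sketch below is only a sanity check for small n, not a replacement for the proof; the class and method names are made up for the example.

```java
final class PumpAnBnSketch {
    // Number of occurrences of c in s.
    static int count(String s, char c) {
        int total = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) total++;
        }
        return total;
    }

    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(c);
        return sb.toString();
    }

    // True if every split z = uvw of z = a^n b^n with |uv| <= n and |v| >= 1
    // gives unequal numbers of a's and b's in u v^2 w.
    static boolean everySplitBreaksBalance(int n) {
        String z = repeat('a', n) + repeat('b', n);
        for (int uvLen = 1; uvLen <= n; uvLen++) {
            for (int uLen = 0; uLen < uvLen; uLen++) {
                String u = z.substring(0, uLen);
                String v = z.substring(uLen, uvLen);
                String w = z.substring(uvLen);
                String pumped = u + v + v + w;
                if (count(pumped, 'a') == count(pumped, 'b')) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println("n=" + n + ": " + everySplitBreaksBalance(n));
        }
    }
}
```

The inner loops enumerate exactly the splits the lemma permits, so v is always a nonempty block of a's, as in the proof.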
Theorem: Regular languages are recursive, but not vice versa.
Proof:
We have already shown how to emulate FSMs in Java, so clearly L = L(M) implies L's membership function can be implemented by an algorithm.
It's also easy to show that L = {a^i b^i | 0 <= i} is recursive. (Left as an exercise.) But by the corollary above, L isn't regular.
QED.
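One possible membership function for {a^i b^i | 0 <= i} is sketched below in Java, as a hint at why the language is recursive. It is one of several reasonable solutions to the exercise, and the names AnBnDecider and member are hypothetical.

```java
final class AnBnDecider {
    // Decides whether z = a^i b^i for some i >= 0. Always halts:
    // it makes a single pass over z.
    static boolean member(String z) {
        int len = z.length();
        if (len % 2 != 0) return false;        // odd length cannot be balanced
        int half = len / 2;
        for (int i = 0; i < half; i++) {
            if (z.charAt(i) != 'a') return false;   // first half must be all a's
        }
        for (int i = half; i < len; i++) {
            if (z.charAt(i) != 'b') return false;   // second half must be all b's
        }
        return true;
    }

    public static void main(String[] args) {
        for (String z : new String[] {"", "ab", "aabb", "aab", "ba", "abab"}) {
            System.out.println("\"" + z + "\": " + member(z));
        }
    }
}
```

The function needs to know the length of its input before checking it, which is more than a fixed, finite set of states can record for arbitrarily long inputs.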