Outline
- Simulation of NTMs by DTMs
- Decidability and Encodings
- Universal Turing Machines
- Quiz
- Diagonalization
- Hierarchy Theorems
- co-r.e. (co-c.e.)
- Reductions
Simulation of Nondeterministic Turing Machines
At the end of last Wednesday's lecture, we introduced Nondeterministic Turing Machines and had an In-Class Exercise about them. We now show how to simulate these machines using deterministic machines.
Theorem Any language decidable by an NTM is decidable by a 3-tape deterministic TM. Further, `\mbox{NTIME}(f(n))` is contained in `\bigcup_{c > 1}\mbox{TIME}(c^{f(n)})`.
Proof: Let `N` be an NTM that recognizes some language. We will construct a Turing Machine `D` to simulate `N`. The idea is to have `D` try all possible branches of `N`'s computation. If `D` ever finds an accept state of `N` on any of the branches, then `D` accepts. So that we don't get stuck on infinite branches, we will do our simulation in an iterative-deepening, breadth-first manner. `D` will have three tapes: the input tape, a simulation tape, and an address tape. `D` operates as follows:
- Initially the input tape has the input and the other two tapes are blank.
- `D` copies the input tape to the simulation tape.
- `D` then simulates `N` according to the nondeterministic choices on the address tape. To do this:
- Let `c` be the maximum number of choices of next move from any given state reading any given symbol; `c` is finite since `N` has finitely many states and tape symbols.
- On the address tape we are going to write strings over the alphabet `\{0, 1, 2, \ldots, c-1\}`, starting with the empty string and proceeding in shortlex order (all strings of length `1` in lexicographic order, then all strings of length `2`, and so on).
- If the address tape is blank, `D` checks to see if `N` immediately accepts. Otherwise, `D` writes a `0` on the address tape and simulates `N` one step using the first possible nondeterministic choice. If this doesn't accept, then `D` writes a `1`, erases the simulation tape, recopies the input, and simulates `N` according to the second nondeterministic choice. Once we have tried all single-step computations, we then cycle over computations of length `2`, etc.
Notice there are `O(1 + c + c^2 + \cdots + c^{f(n)}) = O(c^{f(n)+1}) = O(c^{f(n)})` address strings to try when simulating up to `f(n)` steps, and handling each costs `O(n + f(n))` steps of copying and simulation, for `O((n + f(n)) \cdot c^{f(n)})` time in total. Assuming `f(n) \geq n`, this is in `\mbox{TIME}((c')^{f(n)})` for any `c' > c`, using linear speed-up to absorb constant factors.
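To make the address-tape search concrete, here is a minimal Python sketch of the same iterative-deepening simulation. The representation is ours for illustration: the NTM is assumed to be given as a `step` function returning the list of possible successor configurations, and `max_depth` bounds the search (the real `D` searches without bound, so it may run forever on non-accepted inputs).

```python
from itertools import product

def simulate_ntm(step, start, accepts, c, max_depth):
    """Deterministic simulation of an NTM, mirroring D's address tape.

    step(config)    -> list of at most c successor configurations
    accepts(config) -> True iff config is in N's accept state
    c               -> max number of nondeterministic choices per step
    """
    for depth in range(max_depth + 1):                   # iterative deepening
        for address in product(range(c), repeat=depth):  # lexicographic order
            config = start                # restart: erase tape, recopy input
            for choice in address:
                succ = step(config)
                if choice >= len(succ):   # address names a nonexistent choice
                    break                 # skip this address string
                config = succ[choice]
            else:
                if accepts(config):
                    return True           # some branch of N accepts
    return False  # no accepting branch within max_depth steps
```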
Decidability and Encodings
- Recall we said a TM decides a language if it halts on all inputs and it recognizes the language (i.e., on any input `x`, it halts in an accept state iff `x` is in the language).
- Define a language to be decidable or recursive if there is some TM which decides the language.
- As we have been discussing DFAs, NFAs, regular expressions, context-free grammars, and PDAs we have also been giving algorithms for various questions related to these machines and grammars.
- These algorithms correspond to decision procedures for languages; i.e., we could have presented these algorithms as TMs.
- To make this more precise, we will imagine we have fixed a format, over a fixed alphabet, for encoding as Turing Machine input the set-theoretic notation we have been using to describe DFAs, NFAs, etc.
- For example, we can imagine the tape alphabet of our TMs includes Unicode or ASCII and so has symbols for '{', '}', for comma, etc. Thus, one can write on a tape our set-theoretic descriptions of our machines and grammars.
- We will write things like `\langle B \rangle` to mean the encoding of `B` where `B` might be a DFA or some other type of machine.
- An expression of the form `\langle B, w \rangle` should be read as the encoding of the ordered pair `B`, `w`.
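As a concrete illustration, one could serialize the usual tuple description of a DFA into a plain string. The exact format below is our own arbitrary choice, not a fixed standard; any unambiguous scheme works.

```python
def encode_dfa(states, alphabet, delta, start, accepting):
    """One possible encoding <B> of a DFA B as a plain string.
    delta maps (state, symbol) -> state."""
    trans = ";".join(f"{q},{a}->{r}" for (q, a), r in sorted(delta.items()))
    return (f"{{{','.join(sorted(states))}}};"
            f"{{{','.join(sorted(alphabet))}}};"
            f"{{{trans}}};{start};{{{','.join(sorted(accepting))}}}")

# <B> for a 2-state DFA over {0,1} accepting strings with an odd number of 1s:
print(encode_dfa({"e", "o"}, {"0", "1"},
                 {("e", "0"): "e", ("e", "1"): "o",
                  ("o", "0"): "o", ("o", "1"): "e"},
                 "e", {"o"}))
```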
More Decidability
- From our earlier lectures' algorithms, the following languages are decidable:
- `A_{\mbox{DFA}}=\{\langle B,w \rangle | \mbox{B is a DFA that accepts input string w}\}`. (Mar. 2)
- `A_{\mbox{NFA}}=\{\langle N,w \rangle | \mbox{N is an NFA that accepts input string w}\}`. (TM converts N to a DFA, then uses the above)
- `A_{\mbox{REX}}=\{\langle R,w \rangle | \mbox{R is a regular expression that generates input string w}\}` (TM converts R to a DFA, then uses the above)
- `A_{\mbox{CFG}}=\{\langle G,w \rangle | \mbox{G is a CFG that generates input string w}\}` (Use CYK, Mar. 23)
- `E_{\mbox{DFA}}=\{\langle D \rangle | \mbox{D is a DFA such that L(D) is empty}\}` (Mar. 2)
- `E_{\mbox{CFG}}=\{\langle G \rangle | \mbox{G is a CFG such that L(G) is empty}\}` (Apr. 8)
- `EQ_{\mbox{DFA}}=\{\langle D, D' \rangle | \mbox{D, D' are DFAs such that } L(D) = L(D')\}` (TM uses the Cartesian product construction to make a DFA for the symmetric difference of the two languages and checks whether it is empty; see the sketch after this list)
- `\mbox{FIN}_{\mbox{DFA}}=\{\langle B \rangle | \mbox{B is a DFA such that L(B) is finite}\}`. (Mar. 2)
- `\mbox{FIN}_{\mbox{CFG}}=\{\langle G \rangle | \mbox{G is a CFG such that L(G) is finite}\}`. (Apr. 8)
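For instance, the `EQ_{\mbox{DFA}}` procedure just mentioned amounts to an emptiness check on the product automaton. Here is a minimal Python sketch under our own representation of a DFA as a tuple `(start, accepting, delta, alphabet)`; this is an illustration, not the TM-level construction.

```python
def dfa_equal(d1, d2):
    """Decide EQ_DFA: explore the reachable part of the product automaton,
    looking for a state pair where exactly one DFA accepts (i.e., a string
    in the symmetric difference of the two languages)."""
    (s1, F1, t1, alphabet) = d1
    (s2, F2, t2, _) = d2
    seen, stack = {(s1, s2)}, [(s1, s2)]
    while stack:
        q1, q2 = stack.pop()
        if (q1 in F1) != (q2 in F2):      # witness in the symmetric difference
            return False                  # L(D) != L(D')
        for a in alphabet:
            nxt = (t1[(q1, a)], t2[(q2, a)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True                           # symmetric difference empty
```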
Universal Turing Machines -- Take 1
- It is thus natural to ask if there is a decision procedure for:
`A_{TM}=\{ \langle M,w \rangle | \mbox{M is a TM and M accepts w}\}`
- There is a recognition procedure for this language, given by this high-level description:
`U =`" On input `\langle M,w \rangle`, where `M` is a TM and `w` is a string:
- Simulate `M` on input `w`.
- If `M` ever enters its accept state, accept; if `M` ever enters its reject or other halt state, reject."
- The above Turing Machine is called a Universal Turing Machine (a UTM) because it can be used to simulate any other Turing machine.
- However, as `U` on a given input does not necessarily halt, it is not a decision procedure for `A_{TM}`.
- It turns out it is impossible to get a decision procedure for `A_{TM}`.
- The next few slides work towards showing this.
- Before doing this, we first flesh out the above high-level recognition procedure description.
Universal Turing Machines -- Take 2
- What would it mean to "Simulate `M` on input `w`"?
- You could imagine having a three-tape machine `U`. This machine `U` first copies the encoding of `M` over to the third tape and prepares the first tape so it just has `w` on it, where the first letter of `w` is underscored. The underscore will be used to indicate where `M` is reading during the simulation.
- By examining `M`'s encoding, we can determine its start state and we can copy this to the second tape. We'll use the second tape to keep track of the current state.
- To simulate one step of `M`, we rewind the second and third tapes. We then scan the third tape to the start of the transition table, and scan the first tape to find the current character being read by `M`. Then, on the third tape, we look at each transition in turn, comparing the state written on the second tape with the state of the transition. If we find a match, we check whether the symbol read matches the one we found on the first tape. If both match, we update the second tape and the first tape with what the transition says to do.
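The following Python sketch mirrors this simulation loop at a high level. The transition-table representation (a dict rather than a tape) and all names are our own illustrative choices, not part of any fixed encoding.

```python
def utm(transitions, start, accept, reject, w, blank="_"):
    """Simulate a one-tape TM on input w.
    transitions: dict (state, symbol) -> (new_state, written_symbol, move),
                 move in {"L", "R"}; a missing entry means halt and reject.
    Like U, this is only a recognizer: it loops forever if the TM does."""
    tape = dict(enumerate(w))     # first tape: the simulated tape contents
    head = 0                      # the "underscore": current head position
    state = start                 # second tape: the current state
    while state not in (accept, reject):
        symbol = tape.get(head, blank)            # read the scanned cell
        if (state, symbol) not in transitions:    # no applicable transition
            return False
        state, written, move = transitions[(state, symbol)]
        tape[head] = written                      # write the new symbol
        head += 1 if move == "R" else -1          # move the head
    return state == accept

# A tiny TM accepting strings that start with 1: read one symbol and halt.
t = {("q0", "1"): ("qa", "1", "R"), ("q0", "0"): ("qr", "0", "R")}
print(utm(t, "q0", "qa", "qr", "1010"))  # True
```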
Quiz
Which of the following is true?
- If L is in TIME(`n^2`), we can use our speed-up theorem from last week to show it is in TIME(`n`).
- Our RAM machine model can be simulated by a TM with at most a cubic slowdown.
- There are languages recognized by two tape Turing machines which cannot be recognized by one tape Turing machines.
Diagonalization
- Suppose `f` is a one-to-one function from a countable set `A=\{a(0), a(1), a(2), \ldots\}` to sequences of elements over some set `B` of size at least `2`, such that the length of the sequence `f(a(i))` is at least `i+1`.
- For example,
`f(a(0)) = (\underline{1}, 0, 1)`
`f(a(1)) = (0, \underline{0}, 0)`
`f(a(2)) = (0, 1, \underline{1})`
- Let `f(a(i))_j` denote the `j`th element of the sequence `f(a(i))`.
- The diagonal of this function `f` is the sequence `d(f)=(f(a(0))_0, f(a(1))_1, f(a(2))_2, \ldots)`.
- So in this case `d(f) = (1, 0, 1)`.
- Call a sequence `d'(f)` a complement of the diagonal if `d'(f)_i` is always different from `d(f)_i`.
- For example, for the `f` above a possible `d'(f)` is `(0, 1, 0)`.
- The following theorem, essentially from Cantor, Über eine elementare Frage der Mannigfaltigkeitslehre, 1891, is an easy consequence of our definition:
Theorem (Diagonalization Theorem). If `f` satisfies the first bullet above, then it does not map any element to a complement of its diagonal. (If `f(a(k)) = d'(f)`, then `f(a(k))_k = d'(f)_k \neq d(f)_k = f(a(k))_k`, a contradiction.)
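A finite sanity check of this argument in Python, using the example `f` above (the setup and the particular complement chosen are ours; any entry-wise different sequence works):

```python
# The example f above, as a list of sequences over B = {0, 1}.
f = [(1, 0, 1),
     (0, 0, 0),
     (0, 1, 1)]

diagonal = [f[i][i] for i in range(len(f))]    # d(f)  = (1, 0, 1)
complement = [1 - d for d in diagonal]         # d'(f) = (0, 1, 0)

# f cannot map any a(k) to the complement: they differ in position k.
for k, seq in enumerate(f):
    assert seq[k] != complement[k]
print(diagonal, complement)
```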
Example Use of the Diagonalization Theorem
Corollary. A countable set `A` is not the same size as its power set `P(A)`.
Proof. Let `f: A \to P(A)` be a supposed bijection. Since `A` is countable, we have some function `a(k)` to list out its elements `a(0), a(1), a(2), \ldots`. An element `X=\{a(2), a(5), \ldots\} \in P(A)` can be viewed as an infinite binary sequence `(0, 0, 1, 0, 0, 1, \ldots)`, where we have a `1` in position `i` if `a(i)` is in `X` and a `0` otherwise. So the map `f` to `P(A)` gives us a map to sequences that satisfies the hypotheses of the Diagonalization Theorem. A complement of the diagonal for `f` will still correspond to an element of `P(A)`, but one not mapped to by `f`, contradicting that `f` is a bijection.
- Let `\mathbb{N}` be the natural numbers. So `P(\mathbb{N})` is uncountable.
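In set terms, the complement of the diagonal is the familiar set `S = \{ a(i) | a(i) \notin f(a(i)) \}`. A small Python illustration of the construction on a finite universe (the particular `f` is made up; for finite sets the corollary is obvious, but the construction is the same):

```python
# A (doomed) attempt at a surjection f: A -> P(A).
A = ["a0", "a1", "a2"]
f = {"a0": {"a0", "a1"}, "a1": set(), "a2": {"a2"}}

# Complement of the diagonal: a(i) is in S iff a(i) is not in f(a(i)).
S = {a for a in A if a not in f[a]}       # here S = {"a1"}
assert all(f[a] != S for a in A)          # no element maps to S
print(S)
```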
Non-Recursively Enumerable Languages
Another corollary to the Diagonalization Theorem is the following (recall Turing Recognizable, recursively enumerable, and computably enumerable all mean the same thing):
Corollary. Some languages are not recursively enumerable.
Proof. The set of infinite sequences over `\{0,1\}` is uncountable: as we just indicated in the last proof, there is a bijection between this set and `P(\mathbb{N})`. On the other hand, each encoding `\langle M \rangle` of a Turing Machine is a finite string over a finite alphabet, and the set of finite strings over an alphabet is countable: a map from the natural numbers first maps to strings of length `0` in lex order, then strings of length `1`, etc. Since a language over `\{0,1\}` corresponds to an infinite binary sequence (its characteristic sequence under this enumeration of strings), there are uncountably many languages but only countably many Turing Machines, so some language is not recognized by any TM.
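The counting map for finite strings can be written down directly. A small Python generator (illustrative; the enumeration order is length-first, then lexicographic):

```python
from itertools import count, product

def all_strings(alphabet):
    """Yield every finite string over alphabet: length 0, then 1, then 2, ...
    This enumeration witnesses that the set of finite strings is countable."""
    for n in count():                        # n = 0, 1, 2, ...
        for s in product(alphabet, repeat=n):
            yield "".join(s)

gen = all_strings("01")
print([next(gen) for _ in range(7)])  # ['', '0', '1', '00', '01', '10', '11']
```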
`A_{TM}` is not Recursive
Theorem. The language `A_{TM}=\{ \langle M,w \rangle | \mbox{M is a TM and M accepts w}\}` is not recursive.
Proof. Suppose `A` is a decider for `A_{TM}`. Fix `M_i` and consider `w`'s of the form `\langle M_j \rangle` for TMs `M_j`. Then, listing out encodings of TMs in lex order `\langle M_0 \rangle, \langle M_1 \rangle, \ldots`, we can create an infinite binary sequence with a `1` in the `j`th slot if `M_i` accepts `\langle M_j \rangle` and a `0` otherwise. If `A` is a decider for `A_{TM}`, then we can consider a variant on the complement of the diagonal of the map
`f: \langle M_i \rangle \mapsto (A(\langle M_i, \langle M_0 \rangle \rangle), A(\langle M_i, \langle M_1 \rangle \rangle), \ldots)`.
In particular, we can let `D` be the machine:
`D=`"On input `\langle M \rangle`, where `M` is a TM:
- Run `A` on input `\langle M, \langle M \rangle \rangle`.
- If `A` halts accepting, then run forever. If `A` rejects, then accept."
Since `A` is supposed to be a decision procedure for `A_{TM}`, `A` halts on all inputs. Now consider `D(\langle D \rangle)`. Machine `D` halts if and only if `A` on input `\langle D, \langle D \rangle \rangle` rejects. But `A` on input `\langle D, \langle D \rangle \rangle` rejecting means that `D` does not accept `\langle D \rangle`, and since `D` halts only by accepting, that `D` did not halt on input `\langle D \rangle`. This is contradictory. A similar argument handles the case where `D` does not halt on `\langle D \rangle`. Since assuming the existence of `A` leads to a contradiction, `A` must not exist. Q.E.D.
Another way to look at this: given any `A` which purports to be a decider for `A_{TM}`, we can produce a specific input, `\langle D, \langle D \rangle \rangle`, calculated from `A`, on which `A` fails to decide `A_{TM}` correctly.
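To see the self-reference concretely, here is a Python rendering of the same trap. The function `purported_decider` stands in for `A`; it is a hypothetical parameter (the whole point is that no such function exists), so this sketch only illustrates the construction of `D`.

```python
def make_D(purported_decider):
    """Given a claimed decider A for A_TM -- a function taking the source
    of a machine M and an input w, returning True iff M accepts w --
    build the machine D from the proof."""
    def D(M_source):
        if purported_decider(M_source, M_source):  # A says M accepts <M>
            while True:                            # ... so D runs forever
                pass
        return True                                # A says M rejects <M>: accept
    return D

# Running D on its own description <D> derails any purported A:
# D accepts <D>  iff  A(<D>, <D>) rejects  iff  D does not accept <D>.
```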