Outline
- Intermediate Problems between `P` and `NP`.
- Oracles.
- `P` versus `NP` with oracles.
Intermediate Languages
- If a problem is in `NP` but not in `P`, it is quite often the case that one can show it is `NP`-complete.
- If `P=NP` then this will always be the case.
- However, suppose `P ne NP`. Are there languages in `NP - P` that are not `NP`-complete?
- Ladner (1975) showed that there are.
Ladner's Theorem
Theorem. Suppose `P ne NP`. Then there exists a language `L in NP - P` that is not `NP`-complete.
Proof. For every function `H: NN -> NN`, we define the language `SAT_H` to contain all satisfiable formulas of length `n`, padded by `n^(H(n))` `1`'s. That is,
`SAT_H = { psi01^(n^(H(n))) : psi in SAT mbox( and ) n = |psi| }`.
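The padding construction can be illustrated with a toy membership test, assuming a brute-force SAT check and treating `H` as a given function. The clause-list encoding, the choice `|psi| =` total number of literals, and the dropped `0` separator are all simplifications of the string encoding above.

```python
from itertools import product

def brute_force_sat(cnf, num_vars):
    # cnf: list of clauses; each clause is a list of nonzero ints,
    # +i / -i meaning variable i appears positively / negatively
    return any(all(any((lit > 0) == bits[abs(lit) - 1] for lit in clause)
                   for clause in cnf)
               for bits in product([False, True], repeat=num_vars))

def in_SAT_H(cnf, num_vars, padding, H):
    # Toy encoding: |psi| = total number of literals; a member of SAT_H is
    # a satisfiable psi followed by exactly n**H(n) ones, where n = |psi|.
    n = sum(len(clause) for clause in cnf)
    return padding == "1" * (n ** H(n)) and brute_force_sat(cnf, num_vars)

# With a hypothetical constant H(n) = 2 (the "SAT_H in P" regime):
cnf = [[1, 2], [-1]]          # satisfiable: x1 = False, x2 = True
print(in_SAT_H(cnf, 2, "1" * (3 ** 2), lambda n: 2))  # True
```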
We now define a particular `H: NN -> NN ` as follows:
`H(n)` is the smallest number `i < log log n` such that for every `x in {0, 1}^star` with `|x| le log n`, `M_i` outputs `SAT_H(x)` within `i|x|^i` steps. If there is no such `i` then `H(n) = log log n`.
`H` is well-defined since `H(n)` determines membership in `SAT_H` only of strings whose length is greater than `n`, while the definition of `H(n)` relies only on checking the status of strings of length at most `log n`. In fact, the definition of `H` implies an `O(n^3)`-time algorithm to compute `H(n)` from `n`. (In-Class Exercise in a moment.) `H` was defined to ensure the following claim is true:
More Ladner's Theorem
Claim. `SAT_H in P` iff `H(n) = O(1)`. Moreover, if `SAT_H !in P` then `H(n)` tends to infinity with `n`.
Proof. (`SAT_H in P => H(n) = O(1)`): Suppose there is a machine `M` solving `SAT_H` in at most `cn^c` steps. Since `M` is represented by infinitely many strings, there is a number `i > c` such that `M = M_i`. The definition of `H(n)` implies that for `n > 2^(2^i)`, `H(n) le i`. Thus `H(n) = O(1)`.
(`H(n) = O(1) => SAT_H in P`): If `H(n) = O(1)` then `H` can take only finitely many values, and hence there exists an `i` such that `H(n) = i` for infinitely many `n`'s. But this implies that the TM `M_i` solves `SAT_H` in time `i n^i`: otherwise, if there were an input `x` on which `M_i` fails to output the right answer within this bound, then for every `n > 2^(|x|)` we would have `H(n) ne i`. Note this holds even if we only assume that there is some constant `C` such that `H(n) < C` for infinitely many `n`'s, which proves the "moreover" part of the claim.
Finish Ladner's Theorem
Using the claim, we can show that if `P ne NP` then `SAT_H` is neither in `P` nor `NP`-complete:
- Suppose that `SAT_H in P`. Then by the claim, `H(n) le C` for some constant `C`, implying that `SAT_H` is simply `SAT` padded by at most polynomially many (namely, `n^C`) `1`'s. But then a p-time algorithm for `SAT_H` can be used to solve `SAT` in p-time, implying `P = NP`!
- Suppose that `SAT_H` is `NP`-complete. This means there is a reduction `f` from `SAT` to `SAT_H` that runs in `O(n^i)` time for some constant `i`. Since we already know `SAT_H !in P`, the claim above implies that `H(n)` tends to infinity. Since the reduction runs in `O(n^i)` time, for large enough `n` it must map `SAT` instances of size `n` to `SAT_H` instances of size smaller than `n^(H(n))`. Thus for a large enough formula `phi`, the reduction `f` must map it to a string of the form `psi01^(m^(H(m)))` where `m = |psi|` is smaller than `n = |phi|` by a polynomial factor, say `|psi| le n^(1/3)`. But the existence of such a reduction yields a simple polynomial-time algorithm for `SAT`, contradicting `P ne NP`! (Just iteratively apply the reduction until the `SAT` instance one needs to solve is small enough for brute force.)
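The parenthetical algorithm can be sketched as follows. Here `shrink` stands in for the hypothetical size-reducing, equisatisfiability-preserving map derived from `f`; the toy version used in testing merely drops an unused variable, which is sound only for such instances.

```python
from itertools import product

def brute_force_sat(cnf, num_vars):
    # cnf: list of clauses; each clause is a list of nonzero ints,
    # +i / -i meaning variable i appears positively / negatively
    return any(all(any((lit > 0) == bits[abs(lit) - 1] for lit in clause)
                   for clause in cnf)
               for bits in product([False, True], repeat=num_vars))

def solve_by_iterated_reduction(cnf, num_vars, shrink, threshold=4):
    # Iterate the (hypothetical) size-reducing map until the instance is
    # small enough, then brute-force the constant-size remainder.
    while num_vars > threshold:
        cnf, num_vars = shrink(cnf, num_vars)
    return brute_force_sat(cnf, num_vars)
```

With an actual reduction as in the proof, each iteration runs in polynomial time and shrinks the instance polynomially, so the whole loop is polynomial.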
In-class Exercise
At the start of Ladner's Theorem we asserted that `H(n)` can be computed in `O(n^3)` time.
Write up why this is true and post it to the Mar 15 Class Thread.
Oracles
- We now consider TMs which have access to a black box called an oracle.
- It turns out many of the proofs about relationships between complexity classes carry over to the oracle setting.
- So oracle results give us bounds on what can happen for the usual complexity classes without oracles.
- The oracle setting also tells us something about the strength of reductions.
- Namely, one might ask: can an "`NP`-reduction" be more powerful than a "`P`-reduction"?
Oracle Machines
Definition. An oracle Turing machine `M^?` is a multi-tape DTM (a similar definition works for NDTMs) with a special read-write query tape. It also has three distinguished states `q_?` (the book calls it `q_(query)`, but that's harder to type), `q_(yes)`, and `q_(no)`. We feed into the "?" slot of `M^?` an oracle language `A subseteq Sigma^star` to get a machine `M^A`. On input `x`, `M^A` computes as normal unless it enters the state `q_?`, in which case, if `y` is the contents of the query tape, then the next state will be `q_(yes)` if `y` is in `A` and `q_(no)` if `y` is not in `A`. The computation continues until a halt state is reached.
Here are a couple points to keep in mind about oracle machines:
- `M^A` might enter the query state `q_?` several times during its computation, so it might ask about the membership of several different strings in `A`.
- Given a space- or time-bounded complexity class `C` defined using DTMs or NDTMs, let `C^A` denote the class of languages one gets by allowing the machines in `C` to be oracle machines with access to `A`. For example, `P^A` is the class of languages recognized in p-time by oracle DTMs `M^A`.
Examples
- Let `bar {SAT}` denote the language consisting of unsatisfiable formulas. Then `bar {SAT} in P^(SAT)`: given a formula `phi`, a p-time machine with an oracle for `SAT` can write `phi` on the query tape, enter the query state, and learn whether the formula is satisfiable. If not, output `1`; if so, output `0`.
- Let `A in P`; then `P^A = P`. Given a p-time `M^A`, a non-oracle machine can simulate `M^A` until it enters the query state. Since `A` is in `P`, the non-oracle machine can then simulate the p-time machine for `A` until it gets an answer, and then use this answer to continue simulating `M^A`. The total run time of this machine is bounded by a polynomial whose degree is the product of the degrees of the run times of `M^A` and of the machine for `A`.
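Both examples can be sketched together in a few lines, modeling the oracle as a callback. `brute_force_sat` (exponential time, so purely illustrative) stands in for the oracle.

```python
from itertools import product

def brute_force_sat(cnf, num_vars):
    # A (non-polynomial!) decider for SAT on small clause-list instances;
    # clauses are lists of nonzero ints, +i / -i for variable i.
    return any(all(any((lit > 0) == bits[abs(lit) - 1] for lit in clause)
                   for clause in cnf)
               for bits in product([False, True], repeat=num_vars))

def co_sat(cnf, num_vars, sat_oracle):
    # bar{SAT} in P^SAT: "write" the formula on the query tape (here: pass
    # it to the callback), ask once, and flip the answer.
    return not sat_oracle(cnf, num_vars)

# The P^A = P argument is the same substitution: wherever the oracle
# machine would enter q_?, run A's own decider instead.
print(co_sat([[1], [-1]], 1, brute_force_sat))  # True: x1 AND NOT x1 is unsatisfiable
```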
Baker-Gill-Solovay
Theorem. There are oracle sets `A`, `B` such that `P^A=NP^A` and `P^B ne NP^B`.
Proof. Consider the following canonical `EXP`-complete language:
`A = {langle M, x, 1^n rangle | M mbox( outputs 1 on x within ) 2^n mbox( steps) }`
Notice `A in EXP`. Recall we showed `P subseteq NP subseteq EXP`. First, `EXP subseteq P^A`: given an `L(M)` in `EXP` and an input `x`, in p-time we can write `langle M, x, 1^n rangle` on the query tape and then use the oracle to determine whether `x in L(M)`. On the other hand, by the same argument as on the previous slide, since `A in EXP` we have `NP^A subseteq EXP^A = EXP`. So `EXP subseteq P^A subseteq NP^A subseteq EXP`, giving `P^A = NP^A = EXP`.
The construction for `B` is a little more involved. Let `L_B` be
the following language:
`L_B = { 0^n | mbox( There is an ) x in B mbox( with ) |x|=n}`.
This language is in `NP^B`: we guess an `x` of length `n` and check whether it is in `B` using the oracle. We will show that we can choose `B` so that `L_B` is not in `P^B`.
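The guess-and-check step can be written as a verifier, with the oracle again modeled as a callback and a toy finite `B` (all names here are illustrative):

```python
def verify_L_B(n, certificate, oracle_B):
    # Certificate: a guessed string x of length n; verification is a
    # single oracle query "x in B?".
    return len(certificate) == n and oracle_B(certificate)

B = {"010"}                                    # toy oracle set
print(verify_L_B(3, "010", lambda x: x in B))  # True: so 0^3 is in L_B
```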
BGS proof cont'd
- To build `B` we enumerate oracle DTMs, `M_1^?`, `M_2^?`, `...` by listing out strings in lex order and then checking if they are oracle DTMs.
- We define `B` in stages `B_i` so that `B = cup_i B_i` based on which oracle DTM we have just enumerated.
- Our construction has the property that `B_i` contains all strings in `B` of length `le i`.
- `B_0` is the empty set.
- Assume we have constructed `B_(i-1)` and have just enumerated `M_i^?`. We then simulate `M_i^?` on input `0^i` for `i^(log i)` steps.
- Notice this is more than polynomially many steps.
- Since we haven't completed `B` yet how do we answer oracle queries? ...
Yet More proof
- To answer a query "`y in B`?":
- If `|y| < i` then answer according to `B_(i-1)`.
- If `|y| ge i` then answer "no" and make sure to remember `y` in a "no" set stored on another tape, so that we never add `y` to `B`.
- Suppose after `i^(log i)` steps `M_i^B` rejects. Then we pick some string of length `i` that was never queried by any `M_j^B` for `j le i` and add it to `B_i`. Now `0^i in L_B`, yet `M_i^B` rejects `0^i`.
- This is possible since the total number of queries made so far is at most
`sum_(j=1)^i j^(log j) le i cdot i^(log i) = i cdot 2^(log^2 i) < 2^i`,
while there are `2^i` strings of length `i`.
- On the other hand, if `M_i^B` accepts, we set `B_i = B_(i-1)`, so that `B` contains no strings of length `i` and `0^i !in L_B`; thus `M_i^B` again answers incorrectly on `0^i`.
- The last case is that `M_i^B` did not halt within `i^(log i)` steps. This can happen even if `M_i^B` is p-time, if the coefficients of the polynomial `p(i)` bounding its run time are such that `i^(log i) le p(i)`. Again, we set `B_i = B_(i-1)`. An equivalent machine will eventually appear in the enumeration with an index `I` large enough that `I^(log I) ge p(I)`, in which case the first two cases ensure that the language of that machine with oracle `B` is not `L_B`.
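The counting bound used in the rejection case can be checked numerically (a sanity check, not a proof; the bound holds for all sufficiently large `i`, here verified for `i` from 40 to 200):

```python
import math

def total_query_bound(i):
    # By stage i, machines M_1, ..., M_i have run for at most j**(log j)
    # steps each, so in total they queried at most this many strings.
    return sum(j ** math.log2(j) for j in range(1, i + 1))

for i in range(40, 201):
    # Fewer queries than the 2**i strings of length i, so some string of
    # length i was never queried.
    assert total_query_bound(i) < 2 ** i
print("bound verified for i in [40, 200]")
```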