Introduction

Last week, we defined an abstract problem as a set `I` of instances together with a set `S` of solutions for these instances.
If we specify an encoding of instances and solutions as binary strings, the problem becomes concrete.
We said optimization problems are abstract problems in which we seek a solution with the largest or smallest value.
We said decision problems are abstract problems where the solutions are always yes (true or 1) or no (false or 0).
We said an alphabet is any finite set, and a language is a set of strings over an alphabet.
We then considered the class `P` of languages that can be decided by an algorithm running in time polynomial in the length of the input string.
We also considered the class `NP`: a language `L` is in `NP` iff `L` can be written as
`L= {x in {0,1}^star : exists y, |y| = O(|x|^c) mbox( and ) A(x,y)=1}`
for some constant `c` and polynomial time algorithm `A`.
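To make the definition concrete, here is a minimal Python sketch for one language in `NP`, COMPOSITE, the binary encodings of composite numbers (our choice of example, not one from the lecture). The hypothetical function `verify_composite` plays the role of `A`: the certificate `y` is a nontrivial divisor, so `|y| le |x|` and `c = 1`. The hypothetical helper `in_language` spells out the "exists `y`" by trying every short certificate, which takes exponential time and is for illustration only.

```python
from itertools import product
from typing import Callable


def verify_composite(x: str, y: str) -> bool:
    """Verifier A(x, y): accept iff y encodes a nontrivial divisor of x (both in binary)."""
    if not x or not y:
        return False
    n, d = int(x, 2), int(y, 2)
    return 1 < d < n and n % d == 0


def in_language(x: str, verify: Callable[[str, str], bool]) -> bool:
    """Spell out 'exists y with |y| <= |x|' by brute force (exponential time)."""
    return any(
        verify(x, "".join(bits))
        for length in range(1, len(x) + 1)
        for bits in product("01", repeat=length)
    )


print(verify_composite("1111", "11"))          # 15 = 3 * 5, so True
print(in_language("1111", verify_composite))   # True
print(in_language("1101", verify_composite))   # 13 is prime, so False
```

The verifier itself runs in polynomial time; only the search over all `y` is expensive, which is exactly the gap between checking and finding that `NP` captures.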
We begin today by looking at a method for comparing the relative computational hardness of languages...

Polynomial-Time Reducibility

There is some evidence suggesting that `P=NP` is unlikely.
Further, many problems have been shown to be in `NP`.
So it is useful to be able to classify which `NP` problems are easy and which are hard.
To do this, we say a language `L_1` is polynomial-time reducible to language `L_2`, written `L_1 le_P L_2`, if there exists a polynomial time computable function `f:{0,1}^star -> {0,1}^star` such that for all `x in {0,1}^star`, `x in L_1` iff `f(x) in L_2`.
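As a toy check of the definition, the sketch below reduces a hypothetical `L_1`, the nonempty binary numerals divisible by 4, to a hypothetical `L_2`, the strings ending in `00`. The map `f` prepends `00` (so `0` becomes `000`) and sends the empty string, which is not in `L_1`, to the fixed non-member `1`. It runs in linear time, and `reduction_holds` tests `x in L_1` iff `f(x) in L_2` on all short strings; such a test is evidence, not a proof. All names are illustrative assumptions.

```python
from itertools import product


def in_L1(x: str) -> bool:
    """L_1 (hypothetical): nonempty binary numerals whose value is divisible by 4."""
    return x != "" and int(x, 2) % 4 == 0


def in_L2(y: str) -> bool:
    """L_2 (hypothetical): binary strings ending in "00"."""
    return y.endswith("00")


def f(x: str) -> str:
    """Linear-time reduction from L_1 to L_2."""
    if x == "":
        return "1"       # "" is not in L_1, so map it to a string not in L_2
    return "00" + x      # handles 1-bit inputs: "0" -> "000" (in L_2), "1" -> "001" (not)


def reduction_holds(max_len: int) -> bool:
    """Check  x in L_1  iff  f(x) in L_2  for every binary x with |x| <= max_len."""
    return all(
        in_L1(x) == in_L2(f(x))
        for n in range(max_len + 1)
        for x in ("".join(bits) for bits in product("01", repeat=n))
    )


print(reduction_holds(8))   # True
```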

Lemma. If `L_1`, `L_2` are languages such that `L_1 le_P L_2` and `L_2` is in `P`, then `L_1` is in `P`.

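Proof. Let `A(y)` decide `L_2` in time `O(p(|y|))`, and let `f(x)` be an `O(q(|x|))`-time reduction from `L_1` to `L_2`, where `p` and `q` are polynomials. Let `B(x)` first compute `f(x)` and then run `A(f(x))`. Since `x in L_1` iff `f(x) in L_2`, `B` decides `L_1`. Computing `f(x)` takes `O(q(|x|))` time, so `|f(x)| = O(q(|x|))`, and running `A` on `f(x)` then takes `O(p(O(q(|x|))))` time. A polynomial of a polynomial is a polynomial, so `B` runs in polynomial time.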
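The decider `B` from the proof is just function composition. A minimal Python sketch, assuming strings as instances and booleans as answers; the languages EVEN-PAL (even-length palindromes) and PAL (palindromes) and every function name are hypothetical choices for illustration. Here `A` and `f` both run in linear time, so `B` does too.

```python
from typing import Callable

Decider = Callable[[str], bool]
Reduction = Callable[[str], str]


def compose(f: Reduction, decide_l2: Decider) -> Decider:
    """Build B(x) = A(f(x)) from a reduction f and a decider A for L_2."""
    def decide_l1(x: str) -> bool:
        return decide_l2(f(x))
    return decide_l1


def is_palindrome(y: str) -> bool:
    """A: decides L_2 = PAL in O(|y|) time."""
    return y == y[::-1]


def even_pal_to_pal(x: str) -> str:
    """f: reduces EVEN-PAL to PAL in O(|x|) time."""
    # Odd-length strings are never in EVEN-PAL, so send them to the non-palindrome "01".
    return x if len(x) % 2 == 0 else "01"


is_even_palindrome = compose(even_pal_to_pal, is_palindrome)

print(is_even_palindrome("0110"))   # True
print(is_even_palindrome("010"))    # False: odd length
print(is_even_palindrome("0100"))   # False: not a palindrome
```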

Quiz

Which of the following statements is true?

All Carmichael numbers are prime.
There are languages in NP that can be decided in linear time.
The Miller-Rabin test can sometimes report composite for a number that is prime.

NP-completeness

The languages in `NP` that also lie in `P` are the easy ones.
In contrast, a language `L` is called `NP`-complete if
1. `L in NP`, and
2. `L' le_P L` for every `L' in NP`.
A language which satisfies (2) but not necessarily (1) is called `NP`-hard.
Let `NPC` denote the class of `NP`-complete languages.

Theorem. If any `NP`-complete language is in `P`, then `P=NP`.

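Proof. Suppose `L` is `NP`-complete and `L in P`. For every `L' in NP` we have `L' le_P L`, so `L' in P` by the lemma on polynomial-time reducibility. Hence `NP subseteq P`, and since `P subseteq NP`, we get `P = NP`.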

A First NP-complete Problem

Let CIRCUIT-SAT be the language:
`{langle C rangle | C` is an AND, OR, NOT circuit computing a 0-1 function which on some truth assignment to its input variables outputs 1`}`

Theorem. CIRCUIT-SAT is in NP.

Proof. Consider the following algorithm `A(langle C rangle, langle a rangle)`. First, `A` checks whether `langle C rangle` is in the format of a circuit and `langle a rangle` is in the format of an assignment; if not, it rejects. Otherwise, it labels each input of `C` with its value in `a`. Then it repeatedly sweeps over the gates of `C`, doing the following for each gate, until a sweep makes no change:

Check whether the current gate has no value yet but all of its children have been assigned values.
If so, compute the gate's value from its gate type and its children's values.

By the end of the `i`th sweep, every gate at depth `i` (distance `i` from the inputs) has a value, so `O(|langle C rangle|)` sweeps suffice. Each sweep involves at most quadratic work, so in `O(|langle C rangle|^3)` time the algorithm labels the output gate of the circuit with its value on this assignment, and `A` accepts iff that value is 1. Finally, CIRCUIT-SAT is the language `{langle C rangle in {0,1}^star : exists langle a rangle, |langle a rangle| le |langle C rangle| mbox( and ) A(langle C rangle, langle a rangle) = 1}`, which has the form required by the definition of `NP` with `c = 1`.
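A minimal Python sketch of the verifier `A`, assuming a hypothetical encoding in which a circuit is a dictionary from gate ids to `(kind, j, k)` triples; the format check is reduced to rejecting an assignment that misses an input. `verify` performs the sweeps described above, and `satisfiable` spells out the "exists `langle a rangle`" by trying all assignments, which takes exponential time and only shows how CIRCUIT-SAT is built from `A`. Both names are illustrative.

```python
from itertools import product
from typing import Dict, Tuple

# Hypothetical encoding: gate id -> (kind, j, k). kind is "IN", "AND", "OR" or "NOT";
# j and k are ids of child gates (k is ignored for NOT, both are ignored for IN).
Circuit = Dict[int, Tuple[str, int, int]]


def verify(circuit: Circuit, output: int, a: Dict[int, bool]) -> bool:
    """A(<C>, <a>): evaluate C on assignment a by repeated sweeps."""
    value: Dict[int, bool] = {}
    for i, (kind, _, _) in circuit.items():      # label the inputs from a
        if kind == "IN":
            if i not in a:
                return False                     # malformed assignment: reject
            value[i] = a[i]
    changed = True
    while changed:                               # stop after a sweep with no change
        changed = False
        for i, (kind, j, k) in circuit.items():
            if i in value:
                continue                         # already labelled
            if kind == "NOT" and j in value:
                value[i] = not value[j]
            elif kind == "AND" and j in value and k in value:
                value[i] = value[j] and value[k]
            elif kind == "OR" and j in value and k in value:
                value[i] = value[j] or value[k]
            else:
                continue                         # children not ready yet
            changed = True
    return value.get(output, False)              # accept iff the output is 1


def satisfiable(circuit: Circuit, output: int) -> bool:
    """Brute-force the 'exists <a>' (exponential; for illustration only)."""
    inputs = [i for i, (kind, _, _) in circuit.items() if kind == "IN"]
    return any(
        verify(circuit, output, dict(zip(inputs, bits)))
        for bits in product([False, True], repeat=len(inputs))
    )


# x1 AND (NOT x2); gate 4 is the output.
C = {1: ("IN", 0, 0), 2: ("IN", 0, 0), 3: ("NOT", 2, 0), 4: ("AND", 1, 3)}
print(verify(C, 4, {1: True, 2: False}))   # True
print(satisfiable(C, 4))                   # True
```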

Cook's Theorem (1971)

Theorem. CIRCUIT-SAT is NP-hard.

Proof. Let `L` be a language in `NP`, and let `A(x,y)` verify `L` in time `O(|x|^c)`. The algorithm `A` runs on some kind of computational hardware. If that hardware is in configuration `c_i`, its control determines its configuration `c_(i+1)` at the next time step. Since `A` runs for only `O(|x|^c)` steps, it uses polynomially much memory, so each configuration has polynomially many bits. We assume that the map from `c_i` to `c_(i+1)` can be computed by some polynomial-size AND, OR, NOT circuit `M` implementing the computer hardware. Using this circuit `M`, we build an AND, OR, NOT circuit `langle C(y) rangle`, split into main layers with the following properties:

The output of main layer 1 encodes `c_0`, the configuration of the hardware at the start of the computation of `A(x,y)`. The bits of `x` are hard-coded as constants, since `x` is the fixed instance we want to check for membership in `L`; `y` is not hard-coded, and boolean input variables represent it.
For each `i`, the output of main layer `i + 1` is the configuration obtained from that of main layer `i` by one step of `M`.
The output of `C` is the answer bit extracted from the final configuration of `A` after `O(|x|^c)` steps.

Since there are polynomially many main layers, each a polynomial-size copy of `M`, the whole circuit has polynomial size and can be constructed from `x` in polynomial time. Some setting of the boolean variables for `y` makes the circuit output 1 iff `A(x,y)=1` for some `y`, that is, iff `x in L`. So the map sending `x` to `langle C rangle` is a polynomial-time reduction from `L` to CIRCUIT-SAT.
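The layered construction can be illustrated at toy scale. The Python sketch below is only an analogue of the proof, not the proof itself: it assumes a made-up machine whose configuration is `(acc, x_1, ..., x_n, y_1, ..., y_n)` and whose one-step circuit `step` (our stand-in for `M`) ANDs `x_1 = y_1` into `acc` and rotates both registers, so after `n` steps `acc` records whether `y = x`. `build` hard-codes `x` as constant gates, leaves `y` as variables, and stacks `n` copies of `step`, giving `8n + 1` gates. The `Circuit` class and all other names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


class Circuit:
    """AND/OR/NOT circuit as a gate list; wire w is the output of gate w."""

    def __init__(self) -> None:
        self.gates: List[tuple] = []

    def _add(self, *gate) -> int:
        self.gates.append(gate)
        return len(self.gates) - 1

    def const(self, b: bool) -> int:
        return self._add("CONST", b)

    def var(self, name: str) -> int:
        return self._add("VAR", name)

    def and_(self, a: int, b: int) -> int:
        return self._add("AND", a, b)

    def or_(self, a: int, b: int) -> int:
        return self._add("OR", a, b)

    def not_(self, a: int) -> int:
        return self._add("NOT", a)

    def evaluate(self, out: int, env: Dict[str, bool]) -> bool:
        val: List[bool] = []
        for g in self.gates:              # every gate is appended after its children
            op = g[0]
            if op == "CONST":
                val.append(g[1])
            elif op == "VAR":
                val.append(env[g[1]])
            elif op == "AND":
                val.append(val[g[1]] and val[g[2]])
            elif op == "OR":
                val.append(val[g[1]] or val[g[2]])
            else:
                val.append(not val[g[1]])
        return val[out]


def step(c: Circuit, cfg: List[int]) -> List[int]:
    """Toy stand-in for M: acc := acc AND (x_1 == y_1), then rotate x and y by one."""
    n = (len(cfg) - 1) // 2
    acc, xs, ys = cfg[0], cfg[1:n + 1], cfg[n + 1:]
    # x_1 == y_1 using only AND/OR/NOT: (x AND y) OR (NOT x AND NOT y)
    eq = c.or_(c.and_(xs[0], ys[0]), c.and_(c.not_(xs[0]), c.not_(ys[0])))
    return [c.and_(acc, eq)] + xs[1:] + xs[:1] + ys[1:] + ys[:1]   # rotation is rewiring


def build(x: str) -> Tuple[Circuit, int]:
    """Circuit C(y): main layer 1 encodes c_0, then one copy of step per time step."""
    c = Circuit()
    cfg = [c.const(True)]                           # acc starts at 1
    cfg += [c.const(bit == "1") for bit in x]       # x is hard-coded as constants
    cfg += [c.var(f"y{i}") for i in range(len(x))]  # y stays a vector of variables
    for _ in range(len(x)):                         # n steps of the toy machine
        cfg = step(c, cfg)
    return c, cfg[0]                                # output bit: the final acc


def satisfiable(c: Circuit, out: int, num_vars: int) -> bool:
    """Brute-force CIRCUIT-SAT on the built circuit (exponential; tiny inputs only)."""
    return any(
        c.evaluate(out, {f"y{i}": b for i, b in enumerate(bits)})
        for bits in product([False, True], repeat=num_vars)
    )


c, out = build("101")
print(len(c.gates))                                               # 25
print(c.evaluate(out, {"y0": True, "y1": False, "y2": True}))     # True
print(satisfiable(c, out, 3))                                     # True
```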

NP-completeness Proofs

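In general, most `NP`-completeness proofs make use of the following lemma:

Lemma. If some `NP`-complete language `L'` reduces to a language `L`, then `L` is `NP`-hard. If, in addition, `L` is in `NP`, then `L` is `NP`-complete.

Proof. Compose the reductions. For any `L'' in NP` there is a polynomial-time reduction `g` showing `L'' le_P L'`, and let `f` be the reduction showing `L' le_P L`. Then `x in L''` iff `g(x) in L'` iff `f(g(x)) in L`, and `f @ g` runs in polynomial time because `|g(x)|` is polynomial in `|x|` and a polynomial of a polynomial is a polynomial.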

Some NP-complete Problems

Let SAT = `{langle F rangle | langle F rangle` is a satisfiable boolean formula `}`
Let 3SAT = `{langle F rangle | langle F rangle` is a satisfiable CNF formula where each clause has at most three literals `}`.

Theorem. Both SAT and 3SAT are `NP`-complete.

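Proof. First, both languages are in `NP` by the same argument that showed CIRCUIT-SAT is in `NP`: the certificate is a truth assignment, and the formula can be evaluated on it in polynomial time. For hardness we reduce CIRCUIT-SAT to 3SAT. Given an instance `langle C rangle` of CIRCUIT-SAT, let gate `i` be coded as `langle i, type, j, k rangle`. Here type is AND, OR, NOT, or input, and `j, k < i` are the gates feeding this gate; a `0` for `j` or `k` means that argument is not used. Let the `c_i` be new variables, distinct from the input variables `x_j`. Recall that `<=>` is true iff both its boolean inputs have the same value.
For each gate we create a boolean formula either of the form `c_i <=> (c_j mbox( type ) c_k)`, where type is replaced with AND or OR; or of the form `c_i <=> (mbox( type ) c_j)` in the case of NOT or an input (in the input case type is empty, and the right-hand side is the input variable `x_j` read by the gate). The formula we output on input `langle C rangle` is the conjunction of all these defining formulas together with `c_w`, where `w` is the last (output) gate of the circuit. If an assignment satisfies this formula, then `c_w` is true, its defining formula `c_w <=> ...` forces values on its children, and this propagates back to a setting of the inputs that makes the circuit output 1. Conversely, an input setting that makes the circuit output 1 extends to a satisfying assignment by giving each `c_i` the value computed at gate `i`.
Each defining formula mentions at most three variables, so it can be rewritten as an equivalent CNF formula with at most three literals per clause; for example, `c_i <=> (c_j ^^ c_k)` becomes `(not c_i vv c_j) ^^ (not c_i vv c_k) ^^ (c_i vv not c_j vv not c_k)`. This turns the whole formula into 3CNF of polynomial size, computable in polynomial time, so 3SAT is `NP`-hard; since every 3CNF formula is a boolean formula, the same reduction shows SAT is `NP`-hard. If all clauses must have exactly three literals, a shorter clause can be padded by repeating one of its literals, or by replacing `(a vv b)` with `(a vv b vv z) ^^ (a vv b vv not z)` for a fresh dummy variable `z`.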
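The reduction can be sketched in Python under stated assumptions: gates are tuples `(i, type, j, k)` numbered `1..n` with the output gate last, an input gate's `j` is the index of the input variable it reads, and clauses are lists of nonzero integers in the DIMACS convention (`v` for a variable, `-v` for its negation), with `c_i` as variable `i` and `x_j` as variable `n + j`. Each gate contributes the clauses of its defining formula `c_i <=> ...`, at most three literals each. The brute-force checkers are exponential and are included only to confirm on tiny circuits that the output formula is satisfiable iff the circuit is; all names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Gate = Tuple[int, str, int, int]   # (i, type, j, k); j, k < i, and 0 means unused
Clause = List[int]                 # DIMACS style: v is a variable, -v its negation


def circuit_to_3cnf(gates: List[Gate], num_inputs: int) -> Tuple[List[Clause], int]:
    """c_i is variable i; input x_j is variable n + j. Returns (clauses, #variables)."""
    n = len(gates)
    clauses: List[Clause] = []
    for i, kind, j, k in gates:
        if kind == "AND":        # c_i <=> (c_j AND c_k)
            clauses += [[-i, j], [-i, k], [i, -j, -k]]
        elif kind == "OR":       # c_i <=> (c_j OR c_k)
            clauses += [[i, -j], [i, -k], [-i, j, k]]
        elif kind == "NOT":      # c_i <=> NOT c_j
            clauses += [[-i, -j], [i, j]]
        else:                    # input gate: c_i <=> x_j
            clauses += [[-i, n + j], [i, -(n + j)]]
    clauses.append([n])          # conjoin c_w for the output gate w = n
    return clauses, n + num_inputs


def cnf_satisfiable(clauses: List[Clause], num_vars: int) -> bool:
    """Try all 2^num_vars assignments (tiny formulas only)."""
    return any(
        all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses)
        for bits in product([False, True], repeat=num_vars)
    )


def circuit_satisfiable(gates: List[Gate], num_inputs: int) -> bool:
    """Try all input settings, evaluating gates in order (children come first)."""
    for xs in product([False, True], repeat=num_inputs):
        val: Dict[int, bool] = {}
        for i, kind, j, k in gates:
            if kind == "AND":
                val[i] = val[j] and val[k]
            elif kind == "OR":
                val[i] = val[j] or val[k]
            elif kind == "NOT":
                val[i] = not val[j]
            else:
                val[i] = xs[j - 1]
        if val[len(gates)]:
            return True
    return False


# x_1 AND NOT x_2 (satisfiable) and x_1 AND NOT x_1 (unsatisfiable)
sat = [(1, "INPUT", 1, 0), (2, "INPUT", 2, 0), (3, "NOT", 2, 0), (4, "AND", 1, 3)]
unsat = [(1, "INPUT", 1, 0), (2, "NOT", 1, 0), (3, "AND", 1, 2)]
for gates, m in [(sat, 2), (unsat, 1)]:
    clauses, num_vars = circuit_to_3cnf(gates, m)
    print(circuit_satisfiable(gates, m), cnf_satisfiable(clauses, num_vars))
# True True
# False False
```

Every clause produced has at most three literals, matching the definition of 3SAT above; padding to exactly three is not needed for that definition.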

NP-completeness of CIRCUIT-SAT, SAT, k-SAT

Outline