Outline
- Introduction to Complexity
- Classes P and NP
- In-Class Exercise
- Reductions
Introduction to NP-Completeness
- Most of the algorithms we have studied in this class run in polynomial time or some randomized variant of polynomial time.
- That is, on all inputs of length `n`, the algorithms we've considered run in time at most `O(n^k)` for some fixed `k`.
- We'll start looking today at some problems for which it is unknown if such efficient algorithms exist.
- To do this, we first make precise what we mean by polynomial time, and then consider problems which might not be solvable in polynomial time.
Abstract Problems
- Let's fix a framework for describing problems and reasoning about their runtimes.
- Define an abstract problem `Q` to consist of a set of instances `I` and a set of solutions `S`.
- For example, for SHORTEST-PATH the instances might be triples consisting of a graph and two vertices.
A solution might be a sequence of vertices forming a shortest path between those two vertices in the graph.
- We will be interested in a subclass of problems called decision problems, where the answers are always yes (1) or no (0).
- For example, does there exist a path of length at most `k` between the two given vertices?
- Given a way to solve the decision problem, it is usually straightforward to solve the
associated optimization problem by binary search.
- For example, we might first ask: is there a path of length at most 1? If there is, we are done; if not, we double our choice of `k` and ask again. At some point we will find a `k` such that there is a path of length at most `2k`, but not of length at most `k`. We can then continue our binary search between `k` and `2k`.
- Here optimization problems are those where we want to find a largest or smallest value.
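The doubling-then-binary-search idea above can be sketched in Python as follows. This is only an illustration: the name `smallest_yes` and the oracle `decide` are hypothetical, and the sketch assumes the decision procedure is monotone (a "yes" for `k` implies a "yes" for every larger `k`) and answers "yes" for some `k`.

```python
from typing import Callable

def smallest_yes(decide: Callable[[int], bool]) -> int:
    """Return the least k >= 1 for which decide(k) is true.

    Assumes decide is monotone and true for some k; otherwise the
    doubling loop below would never terminate.
    """
    hi = 1
    while not decide(hi):       # doubling phase: try k = 1, 2, 4, 8, ...
        hi *= 2
    lo = hi // 2                # every k <= lo is a "no" (lo may be 0)
    # Invariant: decide(hi) is true and decide(k) is false for all k <= lo.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if decide(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Example: pretend the true shortest path length is 5, so the decision
# question "is there a path of length at most k?" is answered by k >= 5.
print(smallest_yes(lambda k: k >= 5))
```

If the optimal value is `v`, this makes `O(log v)` calls to the decision procedure.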
Encodings
- An encoding of a set `S` of abstract objects is a mapping from `S` to binary strings.
- For example, one can encode the natural numbers `{0, 1, 2, ...}` as the binary strings `{0, 1, 10, ...}`.
- One can encode legal English sentences using ASCII, etc.
- A computer algorithm "solves" some abstract decision problem
by going from an encoding of a problem instance as an input to `0` or `1` as output.
- We call a problem whose instance set is the set of binary strings a concrete problem.
- We say an algorithm solves the problem in `O(T(n))` time if when provided a problem instance `i in I` of length `n = |i|`,
the algorithm can produce the solution using `O(T(n))` steps.
- A concrete problem is called polynomial time decidable
if there is an algorithm that solves it which on all instances of length `n`
runs in time `O(n^k)` for some fixed `k`.
- We write `P` for the class of all such decision problems.
- Similarly, we can define the class of polynomial-time computable functions `f:{0,1}^star -> {0,1}^star`.
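As a small illustration of the binary encoding of natural numbers mentioned above, here is a Python sketch. The helper names `encode_nat` and `decode_nat` are made up for this example. The point is that the input length `n = |i|` used when measuring runtime is the number of bits in the encoding, not the numeric value.

```python
def encode_nat(n: int) -> str:
    """Encode a natural number as a binary string: 0 -> '0', 2 -> '10'."""
    if n < 0:
        raise ValueError("only natural numbers can be encoded")
    return format(n, "b")

def decode_nat(s: str) -> int:
    """Inverse of encode_nat."""
    return int(s, 2)

for value in (0, 1, 2, 1000):
    code = encode_nat(value)
    # The length of the encoding grows like log2(value), not like value.
    print(value, code, len(code))
```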
Formal Languages
- In order to study decision problems it is useful to have an understanding of formal languages.
- An alphabet `Sigma` is a finite set of symbols.
- A language is a set of strings over the symbols in
an alphabet.
- Some common ways to create new languages from old ones are union, concatenation, and star.
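For finite languages, these operations can be tried out directly with Python sets. The sketch below is illustrative only. Since `L^star` is usually infinite, the hypothetical helper `star_up_to` returns just the strings of `L^star` up to a given length.

```python
from typing import Set

def concat(L1: Set[str], L2: Set[str]) -> Set[str]:
    """Concatenation: all strings uv with u in L1 and v in L2."""
    return {u + v for u in L1 for v in L2}

def star_up_to(L: Set[str], max_len: int) -> Set[str]:
    """The strings of L* (zero or more concatenated members of L)
    whose length is at most max_len."""
    result = {""}               # the empty string is always in L*
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more member of L.
        frontier = {w for w in concat(frontier, L)
                    if len(w) <= max_len and w not in result}
        result |= frontier
    return result

print({"0"} | {"1"})                    # union
print(concat({"0", "1"}, {"0"}))        # concatenation
print(sorted(star_up_to({"ab"}, 4)))    # star, truncated at length 4
```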
More on Languages
- We want to connect algorithms with languages.
- We say an algorithm `A` accepts a string `x` if `A` run on `x` outputs `1`.
- If it outputs `0`, it rejects the string.
- We say an algorithm `A` accepts a language `L` if the strings it accepts are exactly the strings in `L`; on strings not in `L`, `A` may reject or fail to halt.
- We say a language `L` is decided by `A` if `A` accepts `L` and, in addition, rejects every string not in `L`.
- A complexity class is a set of languages, where membership in the set is determined by some complexity measure, for instance, the runtime of an algorithm deciding the language.
- For example, `P` is the complexity class of languages decided in polynomial time.
- It can equivalently be formulated as the class of languages accepted in polynomial time.
(Run the accepting algorithm for its polynomial bound on the number of steps; if it hasn't accepted by then, reject.)
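The "run it with a clock" argument can be sketched in Python by modelling an accepting algorithm as a generator that yields once per step and halts exactly when it accepts. This modelling, and the names `decide_with_clock` and `accepts_all_ones`, are assumptions made for the illustration. They are not a standard formalism.

```python
from typing import Callable, Iterator

def decide_with_clock(acceptor: Callable[[str], Iterator[None]],
                      step_bound: Callable[[int], int],
                      x: str) -> int:
    """Turn an acceptor into a decider by cutting it off after
    step_bound(len(x)) steps."""
    run = acceptor(x)
    for _ in range(step_bound(len(x))):
        try:
            next(run)           # advance the acceptor by one step
        except StopIteration:
            return 1            # it halted, which here means "accept"
    return 0                    # clock ran out: reject

def accepts_all_ones(x: str) -> Iterator[None]:
    """Hypothetical acceptor: halts iff x consists only of 1s,
    and loops forever otherwise."""
    for ch in x:
        yield                   # one step per symbol scanned
        if ch != "1":
            while True:         # never accepts
                yield

# This acceptor accepts within len(x) + 1 steps whenever it accepts at all.
bound = lambda n: n + 1
print(decide_with_clock(accepts_all_ones, bound, "111"))
print(decide_with_clock(accepts_all_ones, bound, "101"))
```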
Polynomial-Time Verification
- We now look at algorithms which can verify membership in languages.
- As an example...
- Call an undirected graph `G` Hamiltonian if it contains a Hamiltonian
cycle, that is, a simple cycle which contains each vertex of `G`.
- Let HAM-CYCLE `= { langle G rangle | G mbox( is a Hamiltonian graph )}`
- How might one decide this problem? One could try each possible permutation of vertices.
Let `m` be the number of vertices of the graph. Typically, `m = Omega(sqrt(|langle G rangle|))`.
There are `m!` many permutations, and `m!` grows faster than any polynomial in `|langle G rangle|`. So this algorithm would not run in polynomial time.
- On the other hand, consider the language
`H = {langle G, P rangle | P mbox( is a Hamiltonian cycle in ) G}`.
This language has a polynomial time decision algorithm. Further,
the size of `P` is polynomial in the size of `G`, so we could rewrite HAM-CYCLE as:
`{ langle G rangle | exists P, |P| le |G| and langle G, P rangle in H}`
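- A polynomial-time decision algorithm for `H` can be viewed as verifying membership in HAM-CYCLE in polynomial time: given `G` together with a candidate cycle `P`, it checks that `P` really is a Hamiltonian cycle of `G`.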
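To make the contrast between verifying and deciding concrete, here is a Python sketch. The adjacency-dictionary graph representation and the function names are choices made for this illustration. The verifier does a polynomial amount of work, while the brute-force decider may call it on up to `m!` orderings.

```python
from itertools import permutations

def verify_ham_cycle(adj: dict, cycle: list) -> bool:
    """Verifier for H: is `cycle` a Hamiltonian cycle of the graph?

    adj maps each vertex to the set of its neighbours (undirected graph).
    cycle lists vertices in order; the edge back to cycle[0] is implied.
    """
    if len(cycle) != len(adj) or set(cycle) != set(adj):
        return False            # must visit every vertex exactly once
    if len(cycle) < 3:
        return False            # a simple cycle needs at least 3 vertices
    return all(cycle[(i + 1) % len(cycle)] in adj[cycle[i]]
               for i in range(len(cycle)))

def ham_cycle_brute_force(adj: dict) -> bool:
    """Decide HAM-CYCLE by trying every ordering of the vertices."""
    return any(verify_ham_cycle(adj, list(p)) for p in permutations(adj))

square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(verify_ham_cycle(square, [0, 1, 2, 3]))   # a valid certificate
print(verify_ham_cycle(square, [0, 2, 1, 3]))   # 0-2 is not an edge
print(ham_cycle_brute_force(path))              # a path has no cycle
```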
The complexity class NP
- We are now ready to define the complexity class `NP`.
- We say a language `L` belongs to `NP` if there exists a two-input
polynomial-time algorithm `A` and a constant `c` such that
`L= {x in {0,1}^star : exists y, |y| = O(|x|^c) mbox( and ) A(x,y)=1}`
- I.e., it is the class of languages that have polynomial time verification algorithms.
So HAM-CYCLE `in NP`.
- It is not hard to see `P subseteq NP`, but it is unknown if `P=NP`.
- In fact, there is a million dollar prize to anyone who can solve this
problem.
- Given a complexity class `C`, let `co-C` denote the class of languages
whose complement is in `C`.
- One can see `P subseteq NP cap co-NP`, but it is unknown if equality holds.
In-Class Exercise
Let `EXP` be the class of languages decided by deterministic algorithms on inputs of length `n` in time `O(2^{n^k})` for some fixed constant `k > 0`.
Prove `NP subseteq EXP`. Argue there is a language not in `EXP`, but which can be solved in time `O(2^{2^n})`.
Post your solution to the Apr 17 In-Class Exercise Thread.
Polynomial-Time Reducibility
- There is some evidence suggesting that `P=NP` is unlikely.
- Further, many problems have been shown to be in `NP`.
- So it is useful to be able to classify which `NP` problems are easy and which are hard.
- To do this, we say a language `L_1` is polynomial-time reducible to language `L_2`,
written `L_1 le_P L_2`, if there exists a polynomial time computable function
`f:{0,1}^star -> {0,1}^star` such that for all `x in {0,1}^star`, `x in L_1` iff `f(x) in L_2`.
Lemma. If `L_1`, `L_2` are languages such that
`L_1 le_P L_2` and `L_2` is in `P`, then `L_1` is in `P`.
Proof. Let `A(y)` decide `L_2` in time `O(p(|y|))`.
Let `f(x)` be a `O(q(|x|))`-time reduction from `L_1` to `L_2`.
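Here `p` and `q` are polynomials. Since `f` runs in `O(q(|x|))` time, its output satisfies `|f(x)| = O(q(|x|))`. Then `B(x)`, which first computes `f(x)`
and then runs `A(f(x))`, runs in `O(q(|x|) + p(q(|x|)))` time and decides `L_1`, because `x in L_1` iff `f(x) in L_2`.
So `B` runs in polynomial time.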
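The construction of `B` in the proof is just function composition. The Python sketch below uses two toy languages chosen only for illustration (they are not from the lecture): `L_2` is the set of strings of even length, and `L_1` is the set of strings with an even number of 1s.

```python
def A(y: str) -> int:
    """Decides L_2 = { y : |y| is even } in linear time."""
    return 1 if len(y) % 2 == 0 else 0

def f(x: str) -> str:
    """Reduction from L_1 = { x : x has an even number of 1s } to L_2.

    Keeping only the 1s gives a string whose length is the number of 1s
    in x, so x is in L_1 iff f(x) is in L_2.
    """
    return "".join(ch for ch in x if ch == "1")

def B(x: str) -> int:
    """Decides L_1: first compute f(x), then run A on the result."""
    return A(f(x))

for x in ("1010", "111", "000", ""):
    print(repr(x), B(x))
```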
NP-completeness
- The languages in `P` are the easy languages in `NP`.
- In contrast, a language `L` is called `NP`-complete if
1. `L in NP`, and
2. `L' le_P L` for every `L' in NP`.
- A language which satisfies (2) but not necessarily (1) is called `NP`-hard.
- Let `NPC` denote the class of `NP`-complete languages.
Theorem. If any `NP`-complete language is in `P`, then `P=NP`.
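Proof. This follows from the lemma on the last slide: if `L` is `NP`-complete and in `P`, then every `L' in NP` satisfies `L' le_P L`, so `L'` is in `P`.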
A First NP-complete problem
Let CIRCUIT-SAT be the language:
`{langle C rangle | C` is an AND, OR, NOT circuit computing a
0-1 function which on some truth assignment to its
input variables outputs 1`}`
Theorem. CIRCUIT-SAT is in NP.
Proof. Consider the following algorithm
`A(langle C rangle, langle a rangle)`. First, `A` checks that `langle C rangle`
is in the format of a circuit and `langle a rangle` is in the
format of an assignment to its inputs; if not, `A` rejects.
Otherwise, it labels each of the inputs of `C` with
its value according to `langle a rangle`. Then it
loops over the combinational elements of `C`, doing the following, until
a pass produces no change:
- Check whether the current element has not yet been assigned a value,
but all of its children have.
- If so, calculate the value of the element
based on its gate type and its children's values.
By the `i`th pass, the nodes of depth `i` will have values, so at most `|langle C rangle|` passes are needed.
Each pass involves at most quadratic work. So in `O(|langle C rangle|^3)` time
this algorithm labels the root of the circuit with its
output value on this assignment, which `A` then outputs. Finally, CIRCUIT-SAT is the language
`{langle C rangle in {0,1}^star : exists langle a rangle,
|langle a rangle| le |langle C rangle| mbox( and ) A(langle C rangle, langle a rangle) = 1}`.
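The verification algorithm `A` from the proof can be sketched in Python as below. The circuit representation (a dictionary from gate names to a gate type and a tuple of children, with the last entry taken as the output gate) is an assumption of this sketch, and the format checking is kept minimal.

```python
def eval_circuit(gates: dict, assignment: dict) -> int:
    """Label every gate with its value under `assignment`; return the output.

    gates: name -> (type, children), type in {"INPUT", "AND", "OR", "NOT"}.
    The last key of `gates` is treated as the output gate (an assumption).
    Returns 0 (reject) on malformed input.
    """
    value = {}
    for g, (kind, kids) in gates.items():
        if kind == "INPUT":
            if g not in assignment:
                return 0                    # input without a value: reject
            value[g] = assignment[g]
        elif kind not in ("AND", "OR", "NOT") or not kids:
            return 0                        # not in the format of a circuit
        elif kind == "NOT" and len(kids) != 1:
            return 0
    changed = True
    while changed:                          # repeat passes until no change
        changed = False
        for g, (kind, kids) in gates.items():
            if g in value or any(k not in value for k in kids):
                continue                    # already labelled, or not ready
            vals = [value[k] for k in kids]
            if kind == "AND":
                value[g] = int(all(vals))
            elif kind == "OR":
                value[g] = int(any(vals))
            else:                           # NOT
                value[g] = 1 - vals[0]
            changed = True
    output_gate = list(gates)[-1]
    return value.get(output_gate, 0)

# (x1 AND x2) OR (NOT x3)
C = {"x1": ("INPUT", ()), "x2": ("INPUT", ()), "x3": ("INPUT", ()),
     "g1": ("AND", ("x1", "x2")), "g2": ("NOT", ("x3",)),
     "out": ("OR", ("g1", "g2"))}
print(eval_circuit(C, {"x1": 1, "x2": 0, "x3": 0}))
print(eval_circuit(C, {"x1": 0, "x2": 0, "x3": 1}))
```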
Cook's Theorem
Theorem. CIRCUIT-SAT is NP-hard.
Proof. Let `L` be a language in `NP`, and
let `A(x,y)` verify the language in time `O(|x|^c)`. We show `L le_P` CIRCUIT-SAT.
The algorithm `A` runs on some kind of computational
hardware. If that hardware is in a given configuration
`c_i` then its control determines in the next time step
what its next configuration `c_(i+1)` will be. We assume
that this mapping can be computed by some AND, OR, NOT
circuit `M` implementing the computer hardware. Using
this circuit `M`, we build an AND, OR, NOT circuit
`langle C(y) rangle` which is split into main layers
which have the properties:
- The output of `C` at main layer 1 codes `c_0`, a configuration
of `M` at the start of the computation of `A(x,y)`. Here the values
of `x` are hard-coded based on the instance `x` which we are trying
to check is in `L`. `y` is not hard-coded and boolean variables
are used to represent it.
- For each `i`, the output of `C` at main layer `i + 1`, corresponds
to the configuration obtained from main layer `i` by computing
according to `M`.
- The output of `C` is the value extracted from the final
configuration of `A` after `O(|x|^c)` steps.
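Since there are
polynomially many main layers, each separated by
polynomial-sized circuits, this whole circuit will be polynomial-size, and it can be built from `x` in polynomial time.
If there is some setting of the boolean variables for `y` which
makes the circuit output 1, then `A(x,y)` holds and `x` is in `L`. Conversely, if `x` is in `L`, then some `y` makes `A(x,y)` hold, and that `y` satisfies the circuit. So `x in L` iff the circuit is in CIRCUIT-SAT, as desired.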
NP-completeness Proofs
- In general, most `NP`-completeness proofs will make use of the following lemma:
Lemma. If some `NP`-complete language reduces to a language `L`,
then `L` is `NP`-hard. If `L` is further in `NP` then `L` will be `NP`-complete.
Proof. Just compose the reductions.
Some NP-complete Problems
- Let SAT = `{langle F rangle | langle F rangle` is a satisfiable boolean formula `}`
- Let 3SAT = `{langle F rangle | langle F rangle` is a satisfiable CNF formula
where each clause has at most three literals `}`.
Theorem. Both SAT and 3SAT are `NP`-complete.
Proof. First, both languages are in `NP` by the same
argument that showed CIRCUIT-SAT is in `NP`. For `NP`-hardness, we reduce
CIRCUIT-SAT to each of them. Given an instance
`langle C rangle` of CIRCUIT-SAT, let gate `i` be coded as
`langle i, type, j, k rangle`. Here type is AND, OR, NOT, or input,
and `j, k < i` are the gates which are inputs to this gate. A `0` for `j`
or `k` means that argument is not used. Let the `c_i`'s be new variables
other than the input variables `x_j`. Recall the symbol `<=>` is true
exactly when both its boolean inputs have the same value. For each gate we create
a boolean formula either of the form `c_i <=> (c_j mbox( type ) c_k)`,
where type is replaced with AND or OR; or of the form
`c_i <=> (mbox( type ) c_j)` in the case of NOT or an input (in
the input case type is nothing and the right-hand side is the corresponding input variable `x_j`). The SAT formula we output
on input `langle C rangle` is the conjunction of all such defining
formulas, conjoined with `c_w`, where `w` is the last gate in the
circuit. The idea is that if `c_w` is true, then its defining formula
`c_w <=>...` must be true, and this propagates back to some
setting of the inputs which makes the circuit output 1. Conversely, an
assignment satisfying the circuit extends to the `c_i`'s by evaluating each gate. By
rewriting each formula `c_i <=> (c_j mbox( type ) c_k)` in
CNF with at most three literals per clause, we can make this whole formula into 3CNF. We can
pad clauses with fewer than 3 literals with dummy variables
to make all clauses the same size.
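The reduction in this proof can be sketched in Python as follows. The gate tuples `(i, type, j, k)` follow the coding above. The clause representation (lists of `(variable, is_positive)` literals), the convention that `j` names the input variable `x_j` for an input gate, and the brute-force `satisfiable` checker are assumptions added for the illustration. The particular clauses used for each `<=>` are the usual CNF expansions, which is one possible choice and is not spelled out in the lecture.

```python
from itertools import product

def neg(lit):
    var, positive = lit
    return (var, not positive)

def circuit_to_cnf(gates):
    """Map a circuit to a CNF formula with at most 3 literals per clause."""
    clauses = []
    for i, kind, j, k in gates:
        c = (f"c{i}", True)
        if kind == "INPUT":             # c_i <=> x_j
            a = (f"x{j}", True)
            clauses += [[neg(c), a], [c, neg(a)]]
        elif kind == "NOT":             # c_i <=> NOT c_j
            a = (f"c{j}", True)
            clauses += [[neg(c), neg(a)], [c, a]]
        elif kind == "AND":             # c_i <=> (c_j AND c_k)
            a, b = (f"c{j}", True), (f"c{k}", True)
            clauses += [[neg(c), a], [neg(c), b], [c, neg(a), neg(b)]]
        elif kind == "OR":              # c_i <=> (c_j OR c_k)
            a, b = (f"c{j}", True), (f"c{k}", True)
            clauses += [[c, neg(a)], [c, neg(b)], [neg(c), a, b]]
        else:
            raise ValueError(f"unknown gate type: {kind}")
    w = gates[-1][0]
    clauses.append([(f"c{w}", True)])   # require the last gate to be true
    return clauses

def satisfiable(clauses):
    """Exponential brute-force check, only meant for tiny examples."""
    variables = sorted({v for clause in clauses for v, _ in clause})
    for bits in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, bits))
        if all(any(env[v] == pos for v, pos in cl) for cl in clauses):
            return True
    return False

contradiction = [(1, "INPUT", 1, 0), (2, "NOT", 1, 0), (3, "AND", 1, 2)]
tautology = [(1, "INPUT", 1, 0), (2, "NOT", 1, 0), (3, "OR", 1, 2)]
print(satisfiable(circuit_to_cnf(contradiction)))   # x1 AND NOT x1
print(satisfiable(circuit_to_cnf(tautology)))       # x1 OR NOT x1
```

The output formula has a constant number of clauses per gate, so its size is linear in the size of the circuit.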