Outline
- `NP`-complete
- In-Class Exercise
- Cook-Levin
Introduction
- We introduced the complexity class `NP`.
- `NP` was the class of decision problems that have a polynomial time computable verifier.
- We then said a problem `D` was hard for `NP` if any problem in `NP` could be polynomial time reduced to `D`.
- We said `D` was `NP`-complete if it was hard for `NP` and also happened to be in `NP`.
- So far though we have not shown there are any problems which are actually `NP`-complete.
- We start today by rectifying this situation.
`NP`-complete problems
Theorem.
`mbox(TMSAT) := { langle alpha, w, 1^n, 1^t rangle |`
`mbox(there exists a ) u in {0,1}^n mbox( such that ) M_alpha mbox( accepts ) langle w, u rangle mbox( in at most t steps) }`
is Karp-complete for `NP`.
Proof. We first show `mbox(TMSAT)` is in `NP`. A nondeterministic algorithm to recognize this language is as follows: on input `langle alpha, w, 1^n, 1^t rangle`, nondeterministically guess a string `u` of length `n`, then simulate `M_alpha` on `langle w, u rangle` for at most `t` steps and accept if `M_alpha` accepts. Because `n` and `t` are given in unary, this takes time polynomial in the length of the input.
To see this language is hard for `NP`, suppose `L` is an `NP` language. Then there are polynomials `p` and `q` and a verifier `M` such that `x in L` iff there is a `u in {0,1}^(p(|x|))` satisfying `M(x,u) = 1`, and `M` runs in time `q(m)` on inputs of total length `m`. To reduce `L` to `mbox(TMSAT)`, we simply map every string `x in {0, 1}^star` to `langle lfloor M rfloor, x, 1^(p(|x|)), 1^(q(m)) rangle`, where `m = |x| + p(|x|)` and `lfloor M rfloor` is the encoding of `M`. This mapping is computable in polynomial time and
`langle lfloor M rfloor, x, 1^(p(|x|)), 1^(q(m)) rangle in mbox(TMSAT) iff `
`exists u in {0,1}^(p(|x|)) mbox( such that ) M mbox( accepts ) langle x, u rangle mbox( in at most ) q(m) mbox( steps ) iff`
`x in L`.
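To make the reduction concrete, here is a minimal Python sketch under a toy model. It is not a Turing machine simulator: the verifier `composite_verifier`, the choices `p(n) = n` and `q(m) = m`, and the convention that a verifier reports its own step count are all assumptions made for illustration. The language `L` used is the set of binary strings encoding composite numbers, with a nontrivial divisor as the certificate.

```python
from itertools import product

def composite_verifier(x, u):
    """Toy verifier M(x, u): accept iff u encodes a nontrivial divisor of x.

    Returns (accepted, steps). The step count is an assumed cost model
    standing in for the number of Turing machine steps.
    """
    steps = len(x) + len(u)
    n = int(x, 2) if x else 0
    d = int(u, 2) if u else 0
    return (1 < d < n and n % d == 0), steps

def p(n):
    return n  # assumed certificate length

def q(m):
    return m  # assumed time bound in the toy cost model

def reduce_to_tmsat(machine, x):
    """Map x to <M, x, 1^p(|x|), 1^q(m)> with m = |x| + p(|x|)."""
    m = len(x) + p(len(x))
    return (machine, x, "1" * p(len(x)), "1" * q(m))

def in_tmsat(instance):
    """Brute-force membership test: try every u in {0,1}^n."""
    machine, w, ones_n, ones_t = instance
    n, t = len(ones_n), len(ones_t)
    for bits in product("01", repeat=n):
        accepted, steps = machine(w, "".join(bits))
        if accepted and steps <= t:
            return True
    return False

print(in_tmsat(reduce_to_tmsat(composite_verifier, "1001")))  # 9 is composite
print(in_tmsat(reduce_to_tmsat(composite_verifier, "0111")))  # 7 is prime
```

The map `reduce_to_tmsat` only copies its input and writes two unary strings, which is why it runs in polynomial time. The function `in_tmsat` tries all `2^n` certificates and is only there to check the reduction on small inputs.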
In-class Exercise
Below is a problem from machine learning. Argue that it is in `NP`. It is actually `NP`-complete (Brucker 1978), but you do not have to show completeness.
Clustering: Given a finite set `X`, a distance function `d(x,y)` which returns nonnegative integers for any inputs `x,y in X`, and two positive integers `k` and `B`, is there a partition of `X` into disjoint sets `X_1, ..., X_k` such that, for `1 \leq i \leq k` and all pairs `x,y in X_i`, `d(x, y) \leq B`?
Post your solution to the Feb 22 Class Thread.
Boolean Formulas
- TMSAT is not a problem that people are asked to solve every day.
- We next want to look at the first problem that was actually shown to be `NP`-complete: SAT.
- This example comes from propositional logic and is perhaps more natural. To introduce it, we first need the notion of a Boolean formula.
- Given a set of variables `u_1`, ..., `u_n` whose values can be `0` (false) or `1` (true), a Boolean formula is either just one of these variables or is built from these variables using AND (`^^`), OR (`vv`), or NOT (`neg`).
- A truth assignment `nu:{1, ..., n} -> {0,1}` gives a value of `0` or `1` to each of the variables. So `nu(1) = 1`, also written `nu(u_1) = 1`, says that variable `u_1` has the value `1`. A truth assignment `nu` for variables can be extended in the natural way to a function `bar{nu}` which gives a `0` or `1` value to any Boolean formula.
- For example, under the assignment `nu(u_1) = 1`, `nu(u_2) = 0`, `nu(u_3) = 1`, `bar{nu}(u_1 ^^ u_2)` evaluates to `0`, and `bar{nu}((u_1 ^^ u_2) vv u_3)` evaluates to 1.
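The extension of `nu` to `bar{nu}` can be sketched as a short recursive evaluator. The nested-tuple encoding of formulas and the function name `evaluate` are assumptions of this sketch, not notation from the lecture.

```python
def evaluate(formula, nu):
    """Extend a truth assignment nu (dict: variable index -> 0 or 1) to a formula.

    Formulas are nested tuples (an assumed encoding):
    ("var", i), ("not", f), ("and", f, g), ("or", f, g).
    """
    op = formula[0]
    if op == "var":
        return nu[formula[1]]
    if op == "not":
        return 1 - evaluate(formula[1], nu)
    if op == "and":
        return evaluate(formula[1], nu) & evaluate(formula[2], nu)
    if op == "or":
        return evaluate(formula[1], nu) | evaluate(formula[2], nu)
    raise ValueError(f"unknown connective: {op!r}")

# The assignment from the example: u_1 = 1, u_2 = 0, u_3 = 1.
nu = {1: 1, 2: 0, 3: 1}
u1, u2, u3 = ("var", 1), ("var", 2), ("var", 3)

print(evaluate(("and", u1, u2), nu))              # 0
print(evaluate(("or", ("and", u1, u2), u3), nu))  # 1
```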
Satisfiability, Unsatisfiability, and Validity
- A formula is said to be satisfiable if some assignment to its input variables makes the formula output `1`.
- Otherwise, the formula is said to be unsatisfiable.
- A formula is said to be valid if all assignments to its input variables make the formula output `1`.
- For example, `(u_1 ^^ u_2) vv u_3` is satisfiable, `(u_1 ^^ neg u_1)` is unsatisfiable, and `(u_1 vv neg u_1)` is valid.
- Notice if a formula is unsatisfiable then its negation is valid. One automated theorem proving technique is to take a statement and try to see if its negation has a formal refutation (i.e., its negation can be proven unsatisfiable).
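For small formulas these three notions can be checked by trying every assignment. The sketch below does exactly that; representing a formula as a Python function on 0/1 arguments and the name `classify` are assumptions made for illustration. The running time is exponential in the number of variables, so this is a definition check, not an efficient algorithm.

```python
from itertools import product

def classify(formula, num_vars):
    """Classify a formula given as a function of num_vars arguments in {0, 1}."""
    values = [formula(*bits) for bits in product((0, 1), repeat=num_vars)]
    if all(values):
        return "valid"          # every assignment outputs 1 (so also satisfiable)
    if any(values):
        return "satisfiable"    # some, but not all, assignments output 1
    return "unsatisfiable"      # no assignment outputs 1

print(classify(lambda a, b, c: (a & b) | c, 3))  # satisfiable
print(classify(lambda a: a & (1 - a), 1))        # unsatisfiable
print(classify(lambda a: a | (1 - a), 1))        # valid

# Negating the unsatisfiable formula gives a valid one.
print(classify(lambda a: 1 - (a & (1 - a)), 1))  # valid
```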
Conjunctive Normal Form
- A literal, `l_i`, is used to mean either a variable `u_i` or its negation `neg u_i`. We often write `neg u_i` as `bar u_i`.
- A formula is said to be in conjunctive normal form (CNF) if it is an AND of ORs of variables or their negations.
- For example, `(u_1 vv bar u_2 vv u_3) ^^ (u_2 vv bar u_3 vv u_4) ^^ (bar u_1 vv u_2 vv bar u_4)`
- We often write CNF formulas like `^^_i(vv_j l_(i_j))`, where each `l_(i_j)` is a literal.
- The disjunctions `vv_j l_(i_j)` are called clauses.
- Clauses are sometimes written as sets. So a clause like `(u_2 vv bar u_3 vv u_4)` would be written as `{u_2, bar u_3, u_4}`. A formula such as
`(u_1 vv bar u_2 vv u_3) ^^ (u_2 vv bar u_3 vv u_4) ^^ (bar u_1 vv u_2 vv bar u_4)` would be written as a set of clause sets,
`{{u_1, bar u_2, u_3}, {u_2, bar u_3, u_4}, {bar u_1, u_2 , bar u_4}}`.
- A kCNF formula is a CNF formula in which all clauses have at most `k` literals.
- We denote by SAT the language of all satisfiable CNF formulas, and by 3SAT the language of all satisfiable 3CNF formulas.
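The clause-set notation maps directly onto a data structure. In the sketch below a literal is a signed integer (`+i` for `u_i`, `-i` for `bar u_i`); this encoding and the names `is_kcnf` and `brute_force_sat` are assumptions of the sketch. Membership in SAT is decided here by exhaustive search, which takes exponential time.

```python
from itertools import product

# (u_1 v ~u_2 v u_3) ^ (u_2 v ~u_3 v u_4) ^ (~u_1 v u_2 v ~u_4) as a list of clause sets.
phi = [{1, -2, 3}, {2, -3, 4}, {-1, 2, -4}]

def is_kcnf(cnf, k):
    """True iff every clause has at most k literals."""
    return all(len(clause) <= k for clause in cnf)

def brute_force_sat(cnf):
    """True iff some assignment makes every clause true (tries all assignments)."""
    variables = sorted({abs(lit) for clause in cnf for lit in clause})
    for bits in product((0, 1), repeat=len(variables)):
        nu = dict(zip(variables, bits))
        if all(any(nu[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)
               for clause in cnf):
            return True
    return False

print(is_kcnf(phi, 3))               # True
print(brute_force_sat(phi))          # True
print(brute_force_sat([{1}, {-1}]))  # False
```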
The Cook-Levin Theorem (1971, 1973)
Theorem.
(1) SAT is `NP`-complete.
(2) 3SAT is `NP`-complete.
Proof. First notice that, given a truth assignment, we can check in polynomial time whether each clause in a CNF is satisfied or not. So both SAT and 3SAT are in `NP`. So it suffices to show they are hard for `NP`. We will prove this over the next couple of slides.
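The membership argument amounts to a verifier that reads the formula once. A minimal sketch follows, assuming clauses are sets of signed integers (`+i` for `u_i`, `-i` for its negation) and the certificate is a dictionary from variable index to 0 or 1; the name `satisfies` is hypothetical.

```python
def satisfies(cnf, nu):
    """True iff assignment nu makes every clause of cnf true.

    Each literal is inspected at most once, so the running time is
    linear in the size of the formula.
    """
    for clause in cnf:
        if not any(nu[abs(lit)] == (1 if lit > 0 else 0) for lit in clause):
            return False  # this clause has no true literal
    return True

phi = [{1, -2, 3}, {2, -3, 4}, {-1, 2, -4}]

print(satisfies(phi, {1: 1, 2: 1, 3: 0, 4: 0}))  # True
print(satisfies(phi, {1: 1, 2: 0, 3: 1, 4: 0}))  # False: second clause fails
```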
CNFs are universal
Claim. For every Boolean function `f:{0, 1}^l -> {0,1}`, there is an `l`-variable CNF formula `phi` of size at most `l2^l` such that `phi(u) = f(u)` for every truth assignment `u in {0, 1}^l`. Here the size of a CNF formula is defined to be the number of `^^`'s/`vv`'s appearing in it.
Proof. For each `v in {0,1}^l` we make a clause `C_v(u_1, ..., u_l)` where `u_i` appears negated in the clause if bit `i` of `v` is `1`; otherwise it appears un-negated. Notice this clause has `l` literals joined by `l - 1` ORs. Also notice `C_v(v) = 0` and `C_v(u) = 1` for `u ne v`. Using these `C_v`'s we can define a CNF formula for `f` as:
`phi = ^^_(v:f(v) = 0) C_v(u_1, .. u_l)`.
As there are at most `2^l` strings `v` with `f(v) = 0`, this CNF has at most `2^l` clauses, so its total size is at most `l 2^l`.
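The construction in the proof can be sketched in a few lines of Python. Literals are encoded as signed integers (`+i` for `u_i`, `-i` for its negation); this encoding, the function names, and the majority function used as a test case are assumptions of the sketch.

```python
from itertools import product

def cnf_from_function(f, l):
    """Build one clause C_v for every v in {0,1}^l with f(v) = 0."""
    clauses = []
    for v in product((0, 1), repeat=l):
        if f(*v) == 0:
            # u_i is negated where bit i of v is 1, un-negated where it is 0,
            # so C_v is false exactly on the assignment v.
            clauses.append(frozenset(-(i + 1) if bit else (i + 1)
                                     for i, bit in enumerate(v)))
    return clauses

def eval_cnf(clauses, u):
    """Evaluate the CNF on the assignment u, a tuple of 0/1 values."""
    return int(all(any(u[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)
                   for clause in clauses))

def majority(a, b, c):
    return int(a + b + c >= 2)

phi = cnf_from_function(majority, 3)
print(len(phi))  # 4 clauses, one per input with majority = 0
print(all(eval_cnf(phi, u) == majority(*u) for u in product((0, 1), repeat=3)))  # True
```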
Example of Converting to CNF
- Consider the statement `(A wedge B) vee (bar(A) wedge bar(B)) vee (A wedge bar(B))`. This expresses `A ge B`.
- We could express `A = max(A, B, C)` as `(A ge B) wedge (A ge C)`.
- This is an AND of ORs of ANDs as written, so not CNF.
- Its truth table is:
A | B | C | A = max(A, B, C) |
1 | 1 | 1 | 1 |
1 | 1 | 0 | 1 |
1 | 0 | 1 | 1 |
1 | 0 | 0 | 1 |
0 | 1 | 1 | 0 |
0 | 1 | 0 | 0 |
0 | 0 | 1 | 0 |
0 | 0 | 0 | 1 |
- To make a CNF, we look at the three false rows. For each such row, if the column for a variable `X` has a 0 in it we take that variable; if it has a 1 we take `bar(X)`. So the row 0, 1, 0 becomes `A vee bar(B) vee C`.
- This clause asserts that the row did not happen.
- So the CNF for the whole formula is:
`(A vee bar(B) vee bar(C)) wedge (A vee bar(B) vee C) wedge (A vee B vee bar(C))`
- As a collection of clauses we would write:
`{{A, bar(B), bar(C)}, {A, bar(B), C}, {A, B, bar(C)}}.`
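As a sanity check, the following sketch compares the original formula, the three-clause CNF, and the truth table on all eight rows. The helper names `geq`, `original`, and `cnf` are hypothetical, and 0/1 integers stand in for truth values.

```python
from itertools import product

def geq(a, b):
    """(A ^ B) v (~A ^ ~B) v (A ^ ~B), which expresses A >= B."""
    return (a & b) | ((1 - a) & (1 - b)) | (a & (1 - b))

def original(a, b, c):
    """(A >= B) ^ (A >= C), an AND of ORs of ANDs."""
    return geq(a, b) & geq(a, c)

def cnf(a, b, c):
    """(A v ~B v ~C) ^ (A v ~B v C) ^ (A v B v ~C)."""
    return (a | (1 - b) | (1 - c)) & (a | (1 - b) | c) & (a | b | (1 - c))

rows = list(product((0, 1), repeat=3))
table = [int(a == max(a, b, c)) for a, b, c in rows]

print(all(original(*r) == t for r, t in zip(rows, table)))  # True
print(all(cnf(*r) == t for r, t in zip(rows, table)))       # True
```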
The NP-hardness of SAT is to be continued next day....