Introduction

Recall that a language `L` is a set of strings over an alphabet `A`, and a decision procedure for `L` is a procedure which, given a string `x`, outputs "Yes" if `x in L` and "No" otherwise.
`NP` is the class of languages that have polynomial time verification algorithms. That is, for each `L in NP` there is a p-time algorithm `A(x,y)` and a polynomial `q` such that `x in L` iff `exists y, |y| le q(|x|) [A(x,y) = 1]`.
We have been studying the hardest languages in `NP`, the `NP`-complete languages (NPC). `L in NPC` if `L in NP`, and for any `L' in NP` there is a `p`-time function `f` such that `x in L'` iff `f(x) in L`.
Thousands of problems have been shown to be `NP` complete: scheduling problems, fault detection in circuits, program optimization, clustering, deadlock avoidance, minesweeper, etc.
In this class, we have shown so far CIRCUIT-SAT, SAT, 3-SAT, CLIQUE, and VERTEX COVER are `NP`-complete.
We showed CIRCUIT-SAT is in `NPC` by directly showing how to reduce an arbitrary language in `NP` to CIRCUIT-SAT.
For all the other results, we showed how to reduce a problem which we know is in `NPC` to the target language we are trying to show is in `NPC`.
Today, we continue to show `NP`-completeness results.

Hamiltonian Cycles

Recall a Hamiltonian cycle is a permutation of the vertices `v_(i_1),..., v_(i_n)` of a graph `G` so that there is an edge `{v_(i_j) , v_(i_(j+1))}` for each `1 le j < n` as well as an edge `{v_(i_n) , v_(i_1)}`.
Let HAM-CYCLE be the language `{langle G rangle | G` contains a Hamiltonian cycle`}`.

Theorem. HAM-CYCLE is `NP`-complete.

Proof. First, given a permutation of the vertices, we can in polynomial time verify whether or not it is a Hamiltonian cycle. So HAM-CYCLE is in `NP`. To see it is `NP`-complete, we show VERTEX-COVER `le_p` HAM-CYCLE. Given a graph `G` and an integer `k`, we need to make a new graph `G'` which has a Hamiltonian cycle iff the original had a vertex cover of size `k`...
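
The membership-in-`NP` step can be made concrete with a small verifier. This is only a sketch: it assumes vertices are labeled `0..n-1` and the graph is given as an edge list, and the function name `is_hamiltonian_cycle` is chosen here for illustration.

```python
def is_hamiltonian_cycle(n, edges, order):
    """Check a certificate for HAM-CYCLE in polynomial time.

    n     -- number of vertices, labeled 0..n-1 (an assumed encoding)
    edges -- iterable of undirected edges (u, v)
    order -- the certificate: a proposed ordering of the vertices
    Intended for simple graphs with n >= 3.
    """
    # The certificate must list every vertex exactly once.
    if sorted(order) != list(range(n)):
        return False
    edge_set = {frozenset(e) for e in edges}
    # Consecutive vertices, including the wrap-around pair, must be adjacent.
    return all(
        frozenset((order[j], order[(j + 1) % n])) in edge_set
        for j in range(n)
    )
```

The check sorts the certificate and does `n` set lookups, so it is polynomial in the size of `langle G rangle`.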

More NP-Completeness Proof of HAM-CYCLE

We will make use of the following widget `W_(uv)` to build a new graph `G'` from `G`:

The middle path in our example paths above will be used if `u` and `v` are both in the cover of `G`.

For each edge `{u, v}` in the original graph, the graph `G'` contains one copy of the widget `W_(uv)` (i.e., `W_(uv)` and `W_(vu)` are the same widget) and we denote the vertices of the widget by `[u, v, i]` or `[v, u, i]`, for `1 le i le 6`, according to whether they are on the left or right side. Only the tops and bottoms of widgets will be connected to the rest of the graph `G'`. In our construction, a cycle must visit each widget, and there are exactly three different ways (as shown above) one could visit all the vertices of the widget: start on the left side, start on the right side, or do the two sides separately. In addition to the vertices of the widgets, we will have selector vertices, `s_1,..., s_k`. The edges of the cycle incident to these selector vertices will correspond to the `k` vertices of the vertex cover in `G`. We also have two additional types of edges besides those in the widgets that we describe on the next slide.

Connecting Widgets and Selector Vertices/Edges

For each `u in V` of `G`, let `u^{(i)}` denote the vertices connected to `u` by an edge in `G`. To `G'`, we add edges to form a path containing all widgets corresponding to edges `{u, u^((i))}`. To do this we add the edges:
`{{[u, u^((i)), 6], [u, u^((i+1)), 1]} | u in V, 1 le i le deg(u) - 1}`
to `G'`. So we can construct a path from `[u, u^((1)), 1]` to `[u, u^((deg(u))), 6]` using these additional edges.

If both `u` and `u^((i))` are in a vertex cover of `G` then we traverse a widget as
The second kind of additional edges are of the form
`{\{s_i,[u,u^((1)) ,1]\} | u` is in `V` and `1 le i le k} cup`
`qquad {\{s_i, [u,u^((deg(u))) ,6]\} | u` is in `V` and `1 le i le k }`.

Conclusion HAM-CYCLE is NP-Complete

If `G= langle V, E rangle` then notice the size of a widget is constant and we have `|E|` widgets.
We also have only `k` selector vertices.
Of the additional edges described on the previous slide, the number of the first type is at most the sum of the degrees of the vertices, which is `2|E|`.
There are at most `2k|V|` additional edges of the second type.
So in all the new graph `G'` will be polynomial size in `G`.
Suppose `G` has a vertex cover `{u_1, ..., u_k}`. A Hamiltonian cycle in `G'` can be obtained by starting at `s_1` and, for each `i = 1, ..., k` in turn, following the edge from `s_i` to `[u_i, u_i^((1)),1]` and then the path from the previous slide to `[u_i, u_i^((deg(u_i))),6]`. For `i < k`, from there one follows the edge `{s_(i+1), [u_i, u_i^((deg(u_i))), 6]}` to the next selector vertex. Finally, one follows the edge `{s_1, [u_k, u_k^((deg(u_k))), 6]}` back to the start.
Since each edge in `G` is incident with at least one vertex in the vertex cover, each widget has all of its vertices hit by this cycle.
On the other hand, if there is a Hamiltonian cycle in `G'` then
`V^(star) ={ u in V | {s_j, [ u, u^((1)), 1]}` is in the cycle for some `1 le j le k}` will be a vertex cover of size `k` in `G`.

HAM-CYCLE Reduction Example

Given the instance of VERTEX-COVER with graph (a), our reduction will produce the graph (b) (some of the edges to and from selector vertices have not been drawn).
The lightly-shaded vertices in (a) form a vertex cover, and this cover is transformed into the highlighted path in (b), which is a Hamiltonian cycle.

Traveling Salesman Problem

In this problem a salesman must visit `n` cities. Between each pair of cities `{i, j}` there is a cost `c_(ij)`.
We want to know whether it is possible for the salesman to visit each city exactly once (except the start city, which is visited twice) with total cost at most `k`.
TSP = `{langle G, c, k rangle | G` is a complete graph, `c` is the cost matrix, and `k` is an integer such that the traveling salesman has a tour of cost at most `k}`.

Theorem. TSP is `NP`-complete.

Proof. First given a tour we can verify if it satisfies the desired properties in polynomial time. So it is in `NP`. To see completeness we reduce HAM-CYCLE to it. Given an instance `G = langle V, E rangle` of Hamiltonian cycle, we build an instance of TSP as follows: We first let `G'` be the complete graph on the same vertices. Then we set `c_(ij) = 0` if `{i, j}` is in `E` and `c_(ij) = 1` otherwise. Then `langle G', c, 0 rangle` is in TSP iff `G` was in HAM-CYCLE.
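
The reduction in this proof is short enough to sketch directly. The encoding (vertices `0..n-1`, an edge list, and a cost matrix as a list of lists) and the names `ham_cycle_to_tsp` and `tour_cost` are assumptions made for this example.

```python
def ham_cycle_to_tsp(n, edges):
    """Map a HAM-CYCLE instance to a TSP instance (cost matrix, bound k).

    Pairs that are edges of G get cost 0, all other pairs get cost 1,
    and the bound is k = 0, as in the proof above.
    """
    edge_set = {frozenset(e) for e in edges}
    cost = [
        [0 if i == j or frozenset((i, j)) in edge_set else 1 for j in range(n)]
        for i in range(n)
    ]
    return cost, 0


def tour_cost(cost, tour):
    """Total cost of visiting `tour` in order and returning to the start."""
    n = len(tour)
    return sum(cost[tour[j]][tour[(j + 1) % n]] for j in range(n))
```

A tour has cost `0` exactly when every step uses an edge of `G`, which is why the bound `k = 0` captures Hamiltonicity.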

SUBSET-SUM

In the subset-sum problem we are given a finite set `S subset NN` and a target `t in NN`. We then ask: Is there a subset `S' subseteq S` whose elements sum to `t`?
For example, if `S = {1,2,7, 14, 49}` and `t=16`, then the subset `S' = {2, 14}` is a solution.
Formally,
SUBSET-SUM = `{langle S, t rangle | exists S' subseteq S, t= sum_(s in S') s}`.
We assume in the framing of this problem that we are encoding the numbers in binary.

NP-Completeness of SUBSET-SUM

Theorem. SUBSET-SUM is `NP`-complete.

Proof. To see SUBSET-SUM is in `NP`, notice that if we are given an instance `langle S, t rangle` of SUBSET-SUM and an encoding `langle S' rangle` of a set of integers, then by a linear scan of `S` for each element of `S'` we can check that it is in `S`. Further, by another scan of `S'` we can compute the sum of the elements in `S'` and then check whether it equals `t`. This whole procedure takes time at most `O((|langle S, t rangle| + |langle S' rangle|)^2)` and so is a polynomial time verification procedure for SUBSET-SUM.
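
As an illustration of the verification procedure just described, here is a minimal sketch. It uses a hash set for the membership test instead of the repeated linear scans in the proof, and the name `verify_subset_sum` is hypothetical.

```python
def verify_subset_sum(S, t, S_prime):
    """Check that the certificate S_prime is a subset of S summing to t."""
    S_set = set(S)
    candidate = list(S_prime)
    # The certificate must be a set: no element may be used twice.
    if len(set(candidate)) != len(candidate):
        return False
    # Every chosen element must come from S.
    if any(x not in S_set for x in candidate):
        return False
    return sum(candidate) == t
```

On the example above, `verify_subset_sum({1, 2, 7, 14, 49}, 16, [2, 14])` accepts.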

To show SUBSET-SUM is `NP`-hard, i.e., that any language in `NP` reduces to it, it suffices to reduce 3SAT to SUBSET-SUM, as we already showed 3SAT is `NP`-complete. Suppose `phi(x_1, ..., x_n)` is an instance of 3SAT with clauses `C_1, ..., C_k`. WLOG, we can assume each clause has exactly three distinct literals, no clause has both a literal and its negation, and each variable appears in at least one clause.

The reduction creates two numbers in set `S` for each `x_i` and two numbers in `S` for each `C_j`. Numbers will be created in base 10, where each number contains `n + k` digits and each digit corresponds to either one variable or one clause. Proof continues next slide...

NP-Completeness of SUBSET-SUM cont'd

As we can see from the above picture we construct `S` and `t` by labeling each digit position by either a variable or a clause. The least significant `k` digits are labeled by clauses, and the most significant `n` digits are labeled by variables. In the picture above `phi = C_1 ^^ C_2 ^^ C_3 ^^ C_4`, where `C_1 = (x_1 vv neg x_2 vv neg x_3)`, `C_2 = (neg x_1 vv neg x_2 vv neg x_3)`, `C_3 = (neg x_1 vv neg x_2 vv x_3)`, and `C_4 = (x_1 vv x_2 vv x_3)`.

The target `t` has a `1` in each digit labeled by a variable and a `4` in each digit labeled by a clause.
For each variable `x_i`, there are two integers `v_i` and `v'_i` in `S`. Each has a `1` in the digit labeled by `x_i` and `0`'s in the other variable digits. If literal `x_i` appears in clause `C_j`, then the digit labeled by `C_j` in `v_i` contains a `1`. If literal `neg x_i` appears in clause `C_j`, then the digit labeled by `C_j` in `v'_i` contains a `1`. All other digits are `0`.
For each clause `C_j`, there are two integers `s_j` and `s'_j` in `S`. Each has `0`'s in all digits other than the one labeled by `C_j`. For `s_j`, there is a `1` in the `C_j` digit, and `s'_j` has a `2` in this digit.
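
The construction of `S` and `t` can be sketched in code. The input encoding is an assumption of this example: a clause is a tuple of nonzero integers, with `i` standing for `x_i` and `-i` for `neg x_i`. The function name `three_sat_to_subset_sum` is likewise hypothetical.

```python
def three_sat_to_subset_sum(n, clauses):
    """Build the numbers v_i, v'_i, s_j, s'_j and the target t.

    n       -- number of variables x_1..x_n
    clauses -- list of k clauses, each a tuple of literals (+i or -i)
    Returns (v, v_neg, s, s_double, t); lists are 0-indexed.
    """
    k = len(clauses)

    def number(var, clause_digits):
        # n variable digits (most significant) followed by k clause digits.
        digits = [0] * (n + k)
        if var is not None:
            digits[var - 1] = 1
        for j, d in clause_digits.items():
            digits[n + j] = d
        return int("".join(map(str, digits)))

    v, v_neg = [], []
    for i in range(1, n + 1):
        # v_i marks the clauses containing x_i, v'_i those containing neg x_i.
        v.append(number(i, {j: 1 for j, c in enumerate(clauses) if i in c}))
        v_neg.append(number(i, {j: 1 for j, c in enumerate(clauses) if -i in c}))
    s = [number(None, {j: 1}) for j in range(k)]
    s_double = [number(None, {j: 2}) for j in range(k)]
    t = int("1" * n + "4" * k)
    return v, v_neg, s, s_double, t
```

For the four-clause formula `phi` above, calling the function with `n = 3` and `clauses = [(1, -2, -3), (-1, -2, -3), (-1, -2, 3), (1, 2, 3)]` gives `t = 1114444` and `v_1 = 1001001`. The assignment `x_1 = 0, x_2 = 0, x_3 = 1` satisfies `phi`, and the corresponding subset `v'_1, v'_2, v_3, s_1, s'_1, s'_2, s_3, s_4, s'_4` sums to `t`.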

NP-Completeness of SUBSET-SUM Cont'd Some More

The sum of the digits in any digit position is at most `6`: a variable digit receives at most `1 + 1 = 2` from `v_i` and `v'_i`, and a clause digit receives at most `3` from its three literals plus `1 + 2 = 3` from `s_j` and `s'_j`. So we don't have to worry about carries when we add numbers in `S`. `S` contains `2n + 2k` values, each with `n + k` digits, and each digit of a value or of the target can be computed in time polynomial in `n + k`, so the whole reduction from `phi` to the `langle S, t rangle` described above is `p`-time.

Suppose `phi` is satisfiable, and fix a satisfying assignment. If `x_i = 1` in this assignment include `v_i` in `S'`; otherwise include `v'_i` in `S'`. The sum of the `x_i` digit position over `S'` will be `1`, as we include exactly one of `v_i` and `v'_i`, and every other number in `S` has `0` in the `x_i` digit position.

If we sum a `C_j` digit position over the elements so far added to `S'` we get either `1`, `2`, or `3`, depending on how many literals of this clause are satisfied by the assignment. We can then add both `s_j` and `s'_j`, only `s'_j`, or only `s_j`, respectively, to `S'` to ensure we get a sum of `4`. Hence, we have shown there exists an `S'` which achieves the target.

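Conversely, suppose we have constructed `S` and `t` as above, and there is an `S' subseteq S` that achieves the target sum. To get a `1` in each `x_i` digit column, `S'` must contain exactly one of `v_i` or `v'_i`; set `x_i = 1` if `v_i in S'` and `x_i = 0` otherwise. Each `C_j` column sums to `4`, and `s_j` and `s'_j` together contribute at most `3`, so at least one `v_i` or `v'_i` in `S'` has a `1` in the `C_j` column. The corresponding literal is true under the assignment, so `C_j` is satisfied. Hence this is a satisfying assignment for `phi`.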

Quiz

Which of the following statements is true?

We showed in class CLIQUE is in `P`.
There exist `NP`-hard languages which are not in `P`.
Our proof that VERTEX-COVER is NP-complete relied on the four-color theorem.

Approximation Algorithms, Performance Ratios

Since it seems hard to find exact solutions to the optimization problems associated with a given `NP`-complete problem, it is natural to ask whether one can get approximate solutions in polynomial time.
We say an algorithm for a problem has an approximation ratio of `r(n)`, if for any input of size `n`, the cost `C` of the solution produced by the algorithm is within a factor of `r(n)` of the cost `C^star` of the optimal solution. That is, `max(C/C^star, C^star/C) le r(n)`.
We call an algorithm that achieves an `r(n)`-approximation ratio an `r(n)`-approximation algorithm.
Some `NP`-complete problems have a trade-off between the approximation ratio and the run time.
An approximation scheme for an optimization problem is an algorithm that takes as input both an instance of the problem and a value `epsilon > 0`, and then behaves as a `(1 + epsilon)`-approximation algorithm on the instance.
If for any fixed `epsilon > 0` the approximation scheme runs in time polynomial in the instance size `n`, then it is called a polynomial time approximation scheme.
We say that an approximation scheme is a fully `p`-time approximation scheme if its run time is polynomial in both `1/epsilon` and the instance size `n`. For example, the scheme might have a running time of `O((1/epsilon)^2n^3)`.

The Vertex Cover Problem

The optimization problem associated with VERTEX-COVER is to find a minimum-size vertex cover of an instance graph `G`.
The following algorithm takes a graph `G` and outputs a vertex cover whose size is at most twice the optimal.

APPROX-VERTEX-COVER(G)
1 C=∅
2 E'= E[G]
3 while E' ≠ ∅
4    let {u, v} be an arbitrary edge of E'
5    C = C ∪ {u, v}
6    Remove from E' every edge incident with either u or v
7 return C.
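
A minimal Python sketch of the same idea follows. Rather than physically deleting edges from `E'`, it scans the edge list once and skips any edge that already has an endpoint in `C`, which selects the same kind of edge set; the name `approx_vertex_cover` and the edge-list input are assumptions of this example.

```python
def approx_vertex_cover(edges):
    """Return a vertex cover of size at most twice the optimum.

    edges -- iterable of undirected edges (u, v)
    """
    cover = set()
    for u, v in edges:
        # An edge touching the cover has already been "removed" from E'.
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover
```

On the path with edges `(0, 1), (1, 2), (2, 3)` this returns all four vertices, while the optimal cover `{1, 2}` has size two, so the factor of `2` can be attained.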

Analysis of APPROX-VERTEX-COVER

Theorem. APPROX-VERTEX-COVER is a p-time 2-approximation algorithm.

Proof. First, the algorithm runs in time `O(|V| +|E|)` using adjacency lists, since each pass through the loop adds two vertices to `C` and removes at least one edge from `E'`, and each edge is removed only once.

The set `C` returned by the algorithm is a vertex cover, since each edge that is removed is covered by some vertex in `C`, and the loop continues until no edges are left.

To see that the cover returned is at most twice the optimal, let `A` denote the set of edges which were picked in line 4. In order to cover the edges in `A`, any vertex cover (including the optimal `C^star`) must include at least one endpoint of each edge in `A`. No two edges in `A` share an endpoint, so no two edges from `A` are covered by the same vertex from `C^star`. So `|C^star | ge |A|`. On the other hand `|C| = 2|A|`, and hence `|C| le 2|C^star|`.

Approximating the Traveling Salesman Problem

The optimization problem associated with TSP is to find a tour of least cost.
Here is a 2-approximation algorithm for this problem when the triangle inequality holds on the distances between cities.

APPROX-TSP-TOUR(G, c)
1. Select a vertex r to be a root vertex
2. Compute a minimum spanning tree T for G from root r using Prim's algorithm
3. Let L be the list of vertices visited in a pre-order tree walk of T
4. return the Hamiltonian cycle H that visits the vertices in order L.
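
The four steps above can be sketched as follows. This is an illustration under assumptions: cities are `0..n-1`, `cost` is a symmetric matrix satisfying the triangle inequality, and Prim's algorithm is done with a simple `O(n^2)` array scan instead of a priority queue.

```python
def approx_tsp_tour(cost, root=0):
    """Return the vertex order of a tour of cost at most twice the optimum."""
    n = len(cost)
    in_tree = [False] * n
    key = [float("inf")] * n          # cheapest edge linking v to the tree
    parent = [None] * n
    children = [[] for _ in range(n)]
    key[root] = 0
    for _ in range(n):
        # EXTRACT-MIN by linear scan over vertices not yet in the tree.
        u = min((x for x in range(n) if not in_tree[x]), key=lambda x: key[x])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and cost[u][v] < key[v]:
                key[v] = cost[u][v]
                parent[v] = u
    # Pre-order walk of the tree: visit a node, then its subtrees.
    tour, stack = [], [root]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour
```

The returned list is the order `L`; the tour `H` follows it and then returns to the root.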

Subroutines used by our algorithm

Recall in a pre-order traversal of a tree starting from some node, we visit the current node and then recursively visit each of its children.
Recall Prim's algorithm constructs a minimum spanning tree by growing a tree so far, denoted `A`, which at the start of the algorithm is the empty tree.
We maintain a priority queue of all the vertices not in `A`.
The priority, `v.key`, for a vertex `v` in the queue is the least weight of any edge connecting `v` with `A`. If no such edge exists then it is `infty`.
Let `v.pi` be the parent of `v` in the tree. Rather than explicitly have an `A` we use this parent structure to get the tree when the algorithm terminates.

Here is the pseudo-code:

MST-PRIM(G, w, r) // r is a starting node to grow the tree from
for each u in G.V
   u.key = infty
   u.pi = NIL
r.key = 0 // r.pi stays NIL since the root has no parent
Q = MAKE-QUEUE(G.V) // will have all vertices
while Q ≠ ∅
    u = EXTRACT-MIN(Q)
    for each v in G.adj[u]
       if v in Q and w(u, v) < v.key
           v.pi = u
           v.key = w(u, v) // call appropriate DECREASE-KEY
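
For comparison, here is a hedged Python sketch of Prim's algorithm. Python's `heapq` has no DECREASE-KEY, so this version pushes a fresh entry and skips stale ones when they are popped; that is a substitution made for this example, not the queue described above. The adjacency-dictionary input and the name `mst_prim` are also assumptions.

```python
import heapq


def mst_prim(adj, r):
    """Return the parent map pi of a minimum spanning tree rooted at r.

    adj -- dict mapping each vertex to a list of (neighbor, weight) pairs;
           vertices must be mutually comparable (e.g. all ints or all strings)
    """
    key = {u: float("inf") for u in adj}
    pi = {u: None for u in adj}
    key[r] = 0
    in_tree = set()
    heap = [(0, r)]
    while heap:
        _, u = heapq.heappop(heap)
        if u in in_tree:
            continue  # stale entry left over from an earlier, larger key
        in_tree.add(u)
        for v, w in adj[u]:
            # Compare the edge weight alone, not a path length from r.
            if v not in in_tree and w < key[v]:
                key[v] = w
                pi[v] = u
                heapq.heappush(heap, (w, v))
    return pi
```

On a triangle with weights `a-b = 2`, `b-c = 2`, `a-c = 3` and root `a`, the parent of `c` is `b`. A shortest-path update that added `u.key` to the edge weight would instead attach `c` to `a`, which is why the comparison uses `w(u, v)` alone.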

Analysis of APPROX-TSP-TOUR

Theorem. APPROX-TSP-TOUR is a p-time 2-approximation algorithm for TSP with triangle-inequality holding on the cost function.

Proof. The minimum spanning tree algorithm runs in time `O(|V|^2)` (for instance, with an array-based priority queue). The remaining steps take at most `O(|G|)` time.

Let `H^star` denote the optimal tour of the vertices. Since we can obtain a spanning tree from any tour by deleting an edge, and costs are nonnegative, we have `c(T) le c(H^star)` where `T` is our minimum spanning tree. A full walk `F` of `T` lists the vertices when they are first visited and also whenever they are returned to after a visit to a subtree. The full walk traverses every edge of `T` exactly twice, so `c(F) = 2c(T) le 2c(H^star)`. A full walk is typically not a tour since it lists some vertices more than once.

On the other hand, the `H` returned by the algorithm is a tour and satisfies `c(H) le c(F)`, since it is obtained by deleting repeated vertices from the full walk. This is where the triangle inequality is used: if the full walk contains the sequence `a b c` and we delete `b`, the new cost `c(a, c)` is at most `c(a, b) + c(b, c)`, so the cost does not rise. Hence `c(H) le 2c(H^star)`.

HAM-CYCLE, TSP and SUBSET-SUM are NPC, Approximation Algorithms
