Outline
- Finish Analysis of Subset Sum Trimming
- Statement of Method
- Finding Cuts in Graphs
- In-Class Exercise
- Maximum Satisfiability
Bound the Length of the `L_i`'s
- On Monday, we were trying to prove that APPROX-SUBSET-SUM is a fully polynomial-time approximation scheme for SUBSET-SUM:
APPROX-SUBSET-SUM(S, t, ε)
    n = |S|
    L[0] = (0)
    for i = 1 to n
        L[i] = MERGE-LISTS(L[i-1], L[i-1] + x[i])
        L[i] = TRIM(L[i], ε/(2n))
        remove from L[i] every element that is greater than t
    let z* be the largest value in L[n]
    return z*
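For concreteness, here is a small runnable Python sketch of the same procedure. The translation is ours: MERGE-LISTS and TRIM are inlined following their descriptions in CLRS, and the function name is our own.

def approx_subset_sum(S, t, eps):
    # Returns the largest achievable subset sum that is <= t,
    # up to a factor of 1 + eps.
    n = len(S)
    delta = eps / (2 * n)
    L = [0]
    for x in S:
        # MERGE-LISTS: merge L with L + x, keeping the result sorted.
        merged = sorted(set(L) | {y + x for y in L})
        # TRIM(L, delta), also removing every element greater than t.
        trimmed, last = [], None
        for z in merged:
            if z > t:
                break
            if last is None or z > last * (1 + delta):
                trimmed.append(z)
                last = z
        L = trimmed
    return max(L)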
- We had already shown that it achieves a `1 + epsilon` approximation ratio.
- What we had left to show was that the `L_i`'s stay polynomial in length, so that the above algorithm runs in time polynomial in `n` and `1/epsilon`.
- To see this, note that after trimming, successive elements `z` and `z'` of `L_i` must satisfy `(z')/z > 1 + epsilon/(2n)`. That is, they must
differ by a factor of more than `1 + epsilon/(2n)`. So each list contains the value 0, possibly the value 1, and up to `|__ log_(1+epsilon/(2n)) t __|` additional
values. So the number of elements in each list `L_i` is at most
`log_(1+epsilon/(2n)) t + 2 = (ln t)/(ln(1 + epsilon/(2n))) + 2`
`le (2n(1 + epsilon/(2n)) ln t)/epsilon + 2` (using `x/(1+x) le ln(1+x)`)
`< (3n ln t)/epsilon + 2` (since we are assuming `0 lt epsilon lt 1`)
which is polynomial in `n`, in `1/epsilon`, and in the size of the input, since `t` is provided in binary as part of the input and so `ln t` is less than the number of bits needed to encode the input.
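As a quick numerical sanity check (the parameter values below are arbitrary examples, and the function names are ours), the simplified bound indeed dominates the exact one:

import math

def exact_bound(n, eps, t):
    # log_{1 + eps/(2n)} t + 2
    return math.log(t, 1 + eps / (2 * n)) + 2

def simplified_bound(n, eps, t):
    # (3 n ln t) / eps + 2
    return 3 * n * math.log(t) / eps + 2

n, eps, t = 50, 0.1, 10**6
assert exact_bound(n, eps, t) <= simplified_bound(n, eps, t)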
The Probabilistic Method
- The probabilistic method is a technique for showing the existence of combinatorial objects,
which in turn can be used in algorithm construction.
- There are two common ways to assert the existence of something via knowing something about probability; using either can be said to be using the probabilistic method. As ideas, these are:
- Any random variable assumes at least one value that is no smaller than its expectation,
and at least one value that is no greater than its expectation. For example, if we know the average salary of a computer scientist
is `\$`20,000, then we know there must be at least one computer scientist with a salary less than or equal to `\$`20,000.
- If an object chosen randomly from a universe satisfies a property with positive probability, then there must be an object in the universe
that satisfies that property. For example, if we know that a ball randomly chosen from a bin is red with probability 1/3, then we know there
is at least one red ball.
Finding Cuts in Graphs
- Given a graph `G=(V, E)`, a cut is a partition of the vertex set into two disjoint sets `A` and `B`.
- The size of a cut is the number of edges `(a,b) in E` such that `a in A` and `b in B`. (If the graph has weighted edges, then we would
sum the weights of these edges instead.)
- The problem of finding a minimum cut (a cut of smallest size) is closely connected with finding flows in a network and can be solved in polynomial time.
- On the other hand, the optimization problem of finding a maximum cut, the max-cut problem, is known to be `NP`-hard.
Cut-size and the Probabilistic Method
Theorem. For any undirected graph `G=(V, E)` with `n` vertices and `m` edges there is a cut
`A`, `B` such that
`|{(u, v) in E | u in A and v in B}| ge m/2`
Proof. Consider the following experiment: For each vertex flip an unbiased coin and if it is heads put the vertex in `A`; otherwise, put the vertex in `B`.
For an edge `(u,v)`, the probability that its end-points are in different sets is 1/2. By linearity of expectation, the expected number of
edges with end-points in different sets is thus `m/2`. It follows by the probabilistic method that there must be a partition satisfying the theorem. QED.
Remark. The above experiment essentially gives us an algorithm to find a cut of expected size `m/2`. In general, the probabilistic method will be closely tied with randomized algorithms for constructing objects.
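A minimal Python sketch of this experiment (the function and argument names are ours):

import random

def random_cut(vertices, edges):
    # Flip an unbiased coin per vertex: heads puts it in A, tails in B.
    A = {v for v in vertices if random.random() < 0.5}
    # Count the edges crossing the cut; the expected count is m/2.
    size = sum((u in A) != (v in A) for u, v in edges)
    return A, size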
Maximum Satisfiability
- Recall we have already given a randomized approximation algorithm for MAX-3SAT.
- Let's now look at applying the probabilistic method to the general MAX-SAT where we don't have a restriction on the clause size.
Theorem. For any set of `m` clauses, there is a truth assignment for the variables that satisfies at least `m/2` clauses.
Proof. Suppose that each variable is set to TRUE or FALSE independently and equiprobably. Let `Z_i` be the random variable
which is `1` if the `i`th clause is satisfied and `0` otherwise. For any clause containing `k` literals, the probability that it is not
satisfied by a random assignment is `2^(-k)`. So the probability that a clause with `k` literals is satisfied is `1- 2^(-k) ge 1/2`, implying
that `E[Z_i] ge 1/2`. Let `Z= sum_(i=1)^m Z_i`. Then the expected number of satisfied clauses is
`E[Z] = E[sum_(i=1)^m Z_i] = sum_(i=1)^mE[Z_i] ge m/2`.
The result now follows by the probabilistic method.
Remark. The above gives a randomized 2-approximation algorithm for MAX-SAT and, if all of the clauses have at least `k` literals, a
randomized `1/(1 - 2^(-k))`-approximation algorithm.
Remark. The approximation ratios `r` as described in the Randomized Algorithms book are the reciprocals (`1/r`) of the approximation ratios in CLRS.
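A sketch of the random-assignment algorithm in Python (the clause encoding, signed 1-indexed integers as in the DIMACS format, is our choice):

import random

def random_max_sat(clauses, n_vars):
    # Each clause is a list of nonzero ints: +i for x_i, -i for NOT x_i.
    # Set each variable TRUE or FALSE independently and equiprobably.
    a = [random.random() < 0.5 for _ in range(n_vars)]
    # A clause is satisfied if at least one of its literals is true.
    satisfied = sum(
        any((lit > 0) == a[abs(lit) - 1] for lit in clause)
        for clause in clauses
    )
    return a, satisfied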
Randomized Rounding
- We are now going to work towards a randomized 4/3 approximation algorithm for MAX-SAT.
- The idea is to formulate the problem as an integer linear program, solve the linear programming relaxation in `p`-time,
and then, if a variable `x_i` in this program gets value `v_i` with `0 le v_i le 1`, assign it 1 with probability `v_i` and 0 otherwise.
- This last step is called randomized rounding.
A MAX-SAT instance as a Linear Program
- Let `phi = ^^_j C_j` be a MAX-SAT instance.
- For each clause `C_j`, let `z_j in {0,1}` be an indicator variable in the integer program to indicate whether or not that
clause is satisfied.
- For each variable `x_i` in `phi`, let `y_i` be an indicator variable to indicate the value assumed by that variable.
So `y_i =1` if the variable `x_i` is set TRUE and `y_i = 0` otherwise.
- Let `C_j^+` be the indices of variables that are not negated in clause `C_j`, and `C_j^-` be the indices of negated variables.
- Our integer program for MAX-SAT for `phi` is to maximize:
`sum_(j=1)^m z_j`
where
`y_i, z_j in {0,1}` for all `i` and `j`
subject to
`sum_(i in C_j^+) y_i + sum_(i in C_j^-) (1 - y_i) ge z_j` for all `j`
- The last constraint ensures that a clause is counted as satisfied only if at least one of the literals in it is assigned value 1. A sketch of the full pipeline follows.
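Here is a sketch of the relaxation-and-rounding pipeline in Python. It uses scipy's LP solver rather than the ellipsoid method, and the clause encoding and function name are ours:

import random
import numpy as np
from scipy.optimize import linprog

def lp_round_max_sat(clauses, n_vars):
    # Each clause is a list of nonzero ints: +i for x_i, -i for NOT x_i.
    # LP variables: y_1..y_n, then z_1..z_m; maximizing sum z_j is
    # minimizing -sum z_j.
    m = len(clauses)
    c = np.concatenate([np.zeros(n_vars), -np.ones(m)])
    # Rewrite each clause constraint as
    # z_j - sum_{i in C_j^+} y_i + sum_{i in C_j^-} y_i <= |C_j^-|.
    A = np.zeros((m, n_vars + m))
    b = np.zeros(m)
    for j, clause in enumerate(clauses):
        A[j, n_vars + j] = 1.0
        for lit in clause:
            i = abs(lit) - 1
            A[j, i] += -1.0 if lit > 0 else 1.0
            if lit < 0:
                b[j] += 1.0
    res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * (n_vars + m))
    y_hat = res.x[:n_vars]
    # Randomized rounding: set x_i TRUE with probability y_hat[i].
    return [random.random() < v for v in y_hat]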
How good is Randomized Rounding?
- Suppose we solve the linear programming relaxation of the above program in polynomial time using the ellipsoid method.
- Let `hat(y_i)` and `hat(z_j)` be the values of the variables in an optimal solution to the relaxation.
- Let `beta_k` denote the function `1 - (1 - 1/k)^k`.
- So `beta_k ge 1 - 1/e` for all `k`, since `(1 - 1/k)^k le 1/e`.
Lemma. Let `C_j` be a clause with `k` literals. The probability that it is satisfied by the randomized rounding is at least `beta_k hat(z)_j`.
Assuming the lemma, the expected number of clauses satisfied by our randomized rounding algorithm is at least `(1 - 1/e)sum_(j)hat(z)_j`.
Since the optimum value `sum_(j)hat(z)_j` of the relaxation is at least the maximum number of simultaneously satisfiable clauses, we have the following theorem:
Theorem. Given an instance of MAX-SAT, the expected number of clauses satisfied by linear programming and randomized rounding is at least `(1 - 1/e)` times
the maximum number of clauses that can be satisfied on that instance.
Proof of Lemma
Since we are focusing on a single clause `C_j`, we may assume without loss of generality that all its variables are un-negated and it has the
form `x_1 vv x_2 vv cdots vv x_k`. From our linear program, we have:
`hat(y_1) + cdots + hat(y_k) ge hat(z_j)`.
Clause `C_j` remains unsatisfied by randomized rounding if and only if every one of the variables `y_i` is rounded to `0`. Since each variable is rounded independently,
this occurs with probability `prod_(i=1)^k(1- hat(y_i))`. So we want to show
`1 - prod_(i=1)^k(1- hat(y_i)) ge beta_k hat(z)_j`.
The expression on the left is minimized, subject to `hat(y_1) + cdots + hat(y_k) ge hat(z_j)`, when `hat(y_i) = hat(z_j)/k` for all `i` (by the arithmetic-geometric mean inequality, equal values maximize the product). So it suffices to show
`1 - (1 - z/k)^k ge beta_k z` for all positive integers `k` and `0 le z le 1`. Since `f(x) = 1 - (1 - x/k)^k` is a concave function, to show that
it is never less than the linear function `g(x) = beta_k x` over the interval `[0,1]`, it suffices to check the inequality at the endpoints `x = 0` and `x = 1`, where it holds with equality. QED.
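A quick numeric spot-check of this inequality in Python (the grid and tolerance are arbitrary; this illustrates the bound rather than proving it):

def beta(k):
    return 1 - (1 - 1 / k) ** k

# Check 1 - (1 - z/k)^k >= beta_k * z on a grid of z in [0, 1].
for k in range(1, 11):
    for step in range(101):
        z = step / 100
        assert 1 - (1 - z / k) ** k >= beta(k) * z - 1e-12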
The 4/3-Approximation
- Randomized rounding gives us an approximation algorithm whose guarantee is governed by `beta_k`, which is largest when `k`, and hence the clause, is small.
- On the other hand, our original MAX-SAT algorithm works better as the clause sizes get larger.
- To get the best of both worlds, we can run both algorithms and then return whichever satisfies the larger number of clauses; see the sketch after this list.
- Let `n_1` be the expected number of clauses satisfied by our randomized assignment algorithm. Let `n_2` be the expected number of clauses satisfied by our randomized rounding approach.
- The book shows that `max{n_1, n_2} ge 3/4 sum_j hat(z_j)`, and since `sum_j hat(z_j)` is at least the maximum number of satisfiable clauses, this yields a randomized 4/3-approximation.
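A sketch of the combined algorithm in Python, reusing random_max_sat and lp_round_max_sat from the sketches above:

def best_of_both(clauses, n_vars):
    # Count how many clauses an assignment satisfies.
    def count(a):
        return sum(
            any((lit > 0) == a[abs(lit) - 1] for lit in clause)
            for clause in clauses
        )
    # Run both algorithms and keep whichever assignment does better.
    a1, _ = random_max_sat(clauses, n_vars)
    a2 = lp_round_max_sat(clauses, n_vars)
    return max(a1, a2, key=count)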