Finish Parallel MIS - Distributed Algorithms




CS255

Chris Pollett

Mar 4, 2019

Outline

Finish Parallel MIS

Distributed Algorithms: The Choice Coordination Problem

Parallel MIS

Input: G=(V,E)
Output: A maximal independent set I contained in V
1. I := emptyset
2. Repeat {
   S := emptyset
   a) For all v in V do in parallel
         If d(v) = 0 then add v to I and delete v from V.
         else mark v with probability 1/(2d(v)).
   b) For all (u,v) in E do in parallel
         if both u and v are marked
              then unmark the lower degree vertex (breaking ties arbitrarily).
   c) For all v in V do in parallel
         if v is marked then add v to S
   d) I := I union S
   e) Delete S union Gamma(S) from V and all incident edges from E 
   } Until V is empty.
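Although the algorithm is stated for a PRAM, its behavior is easy to explore with a sequential simulation. Below is a minimal Python sketch of one possible simulation; the names (`luby_mis`, `adj`), the dictionary representation, and the sequential loop structure are illustrative assumptions, not part of the slides.

```python
import random

def luby_mis(adj, rng=random):
    """Sequentially simulate the parallel MIS algorithm above.

    adj maps each vertex to the set of its neighbors; returns a
    maximal independent set of the graph.
    """
    adj = {v: set(ns) for v, ns in adj.items()}   # private mutable copy
    mis = set()
    while adj:
        # Step (a): isolated vertices join I; every other vertex is
        # marked independently with probability 1/(2 d(v)).
        marked = set()
        for v in list(adj):
            if not adj[v]:
                mis.add(v)
                del adj[v]
            elif rng.random() < 1.0 / (2 * len(adj[v])):
                marked.add(v)
        # Step (b): for every edge with both endpoints marked, unmark
        # the lower-degree endpoint (ties broken arbitrarily here).
        for u in list(marked):
            for w in adj.get(u, ()):
                if u in marked and w in marked:
                    loser = u if len(adj[u]) <= len(adj[w]) else w
                    marked.discard(loser)
        # Steps (c)-(e): S is the surviving marked set; add S to I,
        # then delete S and its neighborhood Gamma(S) from the graph.
        mis |= marked
        doomed = set(marked)
        for v in marked:
            doomed |= adj[v]
        for v in doomed:
            adj.pop(v, None)
        for ns in adj.values():
            ns -= doomed
    return mis

print(luby_mis({0: {1}, 1: {0, 2}, 2: {1}}))   # {0, 2} or {1}
```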

Analysis of Parallel MIS

Call a vertex `v in V` good if it has at least `(d(v))/3` neighbors of degree at most `d(v)`; otherwise `v` is bad. Call an edge good if at least one of its endpoints is a good vertex, and bad otherwise. The lemmas below show that good vertices are likely to be eliminated in each iteration and that good edges make up a constant fraction of `E`.

More Analysis of Parallel MIS

Lemma*. Let `v` in `V` be a good vertex with degree `d(v) > 0`. Then, the probability that some vertex `w in Gamma(v)` gets marked is at least `1 - exp(-1/6)`.

Proof. Each vertex `w in Gamma(v)` is marked independently with probability `1/(2d(w))`. Since `v` is good, there exist at least `(d(v))/3` vertices in `Gamma(v)` with degree at most `d(v)`. Each of these is marked with probability at least `1/(2d(v))`. Thus, the probability that none of these neighbors is marked is at most: `(1 - 1/(2d(v)))^((d(v))/3) le e^(-1/6)`.

Here we are using that `(1 + a/n)^n le e^a` and that the remaining neighbors of `v` can only increase the probability under consideration.
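As a quick numeric sanity check (my own addition, not from the slides): the quantity `(1 - 1/(2d))^(d/3)` increases toward `e^(-1/6) ~~ 0.8465` as `d` grows, so the bound holds for every degree.

```python
import math

# Verify (1 - 1/(2d))^(d/3) <= e^(-1/6) for a range of degrees d.
# The left-hand side increases monotonically toward e^(-1/6).
for d in [1, 2, 3, 10, 100, 10**6]:
    lhs = (1 - 1 / (2 * d)) ** (d / 3)
    print(f"d={d:>7}  lhs={lhs:.6f}  bound holds: {lhs <= math.exp(-1/6)}")
```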

Yet More Analysis of Parallel MIS

Lemma**. During any iteration, if a vertex `w` is marked then it is selected to be in `S` with probability at least `1/2`.

Proof. The only reason a marked vertex `w` becomes unmarked, and hence not selected for `S`, is that one of its neighbors of degree at least `d(w)` is also marked. Each such neighbor is marked with probability at most `1/(2d(w))`, and the number of such neighbors is at most `d(w)`. Hence, the probability that a marked vertex `w` is selected to be in `S` is at least:
`1 - Pr{exists x in Gamma(w) mbox( such that ) d(x) ge d(w) mbox( and x is marked )}`
`ge 1 - |{x in Gamma(w) | d(x) ge d(w)}| times 1/(2d(w))`
`ge 1 - sum_(x in Gamma(w)) 1/(2d(w))`
`= 1 - d(w) times 1/(2d(w))`
`= 1/2`
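To illustrate the bound (an example of my own, not from the slides), place `w` in a clique `K_(d+1)`: every neighbor has degree exactly `d(w) = d`, and if every tie is broken against `w`, any marked neighbor unmarks `w`. Even then `w` survives with probability `(1 - 1/(2d))^d ge 1/2`. A quick Monte Carlo check:

```python
import random

def survival_estimate(d=5, trials=200_000, seed=1):
    """Estimate Pr[w stays in S | w marked] when all d neighbors of w
    have degree d and every tie is broken against w (worst case for w)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Each neighbor is marked independently with probability 1/(2d);
        # here w survives only if no neighbor is marked.
        if all(rng.random() >= 1 / (2 * d) for _ in range(d)):
            wins += 1
    return wins / trials

print(survival_estimate())  # ~ (1 - 0.1)**5 = 0.59049, safely above 1/2
```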

Even More Analysis of Parallel MIS

Lemma#. The probability that a good vertex belongs to `S cup Gamma(S)` is at least `(1- exp(-1/6))/2`.

Proof. Let `v` be a good vertex with `d(v) > 0`, and consider the event `E` that some vertex in `Gamma(v)` gets marked. Conditioned on `E`, let `w` be the lowest numbered marked vertex in `Gamma(v)`. By Lemma **, `w` is in `S` with probability at least `1/2`. But if `w` is in `S`, then `v` belongs to `S cup Gamma(S)`, as `v` is a neighbor of `w`. By Lemma *, the event `E` happens with probability at least `1 - exp(-1/6)`. So the probability that `v` is in `S cup Gamma(S)` is at least `(1 - exp(-1/6))/2`.

Still More Analysis of Parallel MIS

Lemma##. In a graph `G=(V,E)`, the number of good edges is at least `|E|/2`.

Proof. Our original graph was undirected. Direct each edge in `E` from its lower-degree endpoint to its higher-degree endpoint, breaking ties arbitrarily. Let `d_i(v)` be the in-degree of `v` and `d_o(v)` be the out-degree. From the definition of goodness, we have for each bad vertex `v`:
`d_o(v) - d_i(v) ge (d(v))/3 = (d_o(v) + d_i(v) )/3`
For all `S`, `T` contained in `V`, define the subset of the edges `E(S,T)` as those edges directed from vertices in `S` to vertices in `T`; further, let `e(S,T) = |E(S,T)|`. Let `V_G` and `V_B` be the sets of good and bad vertices respectively. The total degree of the bad vertices is given by:
`2e(V_B, V_B) + e(V_B, V_G) + e(V_G, V_B)`
`= sum_(v in V_B) (d_o(v) + d_i(v))`
`le 3 sum_(v in V_B)(d_o(v) - d_i(v))`
`= 3 sum_(v in V_G)(d_i(v) - d_o(v))`
`= 3[(e(V_B, V_G) + e(V_G, V_G)) - (e(V_G, V_B) + e(V_G, V_G))]`
`= 3[e(V_B, V_G) - e(V_G, V_B)]`
`le 3[e(V_B, V_G) + e(V_G, V_B)]`
The first and last expressions in this sequence of inequalities imply that
`e(V_B,V_B) <= e(V_B,V_G) + e(V_G,V_B)`.
Since every bad edge contributes to the left-hand side, while only good edges contribute to the right-hand side, the number of bad edges is at most the number of good edges, and the result follows.
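This lemma is easy to test empirically. Below is a small Python check on a random graph; the helper name `good_edge_fraction` and the `G(n, p)` test graph are my own choices for illustration.

```python
import itertools
import random

def good_edge_fraction(n=50, p=0.2, seed=0):
    """Count good edges in a random G(n, p) graph. A vertex v is good if
    at least d(v)/3 of its neighbors have degree at most d(v); an edge
    is good if at least one of its endpoints is good."""
    rng = random.Random(seed)
    edges = [(u, v) for u, v in itertools.combinations(range(n), 2)
             if rng.random() < p]
    nbrs = {v: set() for v in range(n)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    deg = {v: len(ns) for v, ns in nbrs.items()}
    good = {v for v in nbrs
            if sum(1 for w in nbrs[v] if deg[w] <= deg[v]) >= deg[v] / 3}
    good_edges = sum(1 for u, v in edges if u in good or v in good)
    return good_edges, len(edges)

g, m = good_edge_fraction()
print(g, m, g >= m / 2)  # Lemma ## predicts the last value is always True
```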

Finishing up Parallel MIS

Theorem. The Parallel MIS algorithm has an EREW PRAM implementation running in expected time `O(log^2 n)` using `O(n+m)` processors.

Proof. Notice each iteration can be implemented in `O(log n)` time on `O(n+m)` processors. By Lemma ##, a constant fraction of the edges are incident on good vertices, and by Lemma #, each good vertex is eliminated with constant probability; it follows that the expected number of edges eliminated during an iteration is a constant fraction of the current edge set. So after `O(log n)` iterations, in expectation, the edge set is empty. QED

Remark. By using pairwise independence rather than full independence in the above analysis one can show only `O(log n)` random bits are needed for the algorithm. From this one can derandomize the above algorithm to get an NC algorithm.

Quiz

Which of the following statements is true?

  1. The processors in the CREW PRAM model can only read from global memory registers; they are not allowed to write to them.
  2. Our BoxSort always has the same runtime regardless of the random bit values used.
  3. Two maximal independent sets for the same graph might have different sizes.

Distributed Algorithms

The Choice Coordination Problem

Warm-Up Algorithm

Synch-CCP:
Input: Registers C[0] and C[1] initialized to 0. 
Output: Exactly one of the registers has the value #.
0 P[i] is initially scanning the register C[i] and 
  has its local variable B[i] initialized to 0.
1 Read the current register and obtain a bit R[i].
2 Select one of three cases: 
   (a) case [R[i] = #]: halt;
   (b) case [R[i] = 0, B[i] = 1]: write # into the current register and halt;
   (c) case [otherwise]: assign an unbiased random bit to B[i] 
       and write B[i] into the current register.
3 P[i] exchanges its current register with P[1-i] and returns to step 1.
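Because the two processors run in lockstep and always scan different registers, Synch-CCP can be simulated round by round. The sketch below is a minimal Python simulation under those assumptions; the function name and the encoding of register values as the strings '0', '1', and '#' are illustrative choices.

```python
import random

def synch_ccp(seed=None):
    """Lockstep simulation of Synch-CCP. Returns the final register
    contents (exactly one should hold '#') and the number of rounds."""
    rng = random.Random(seed)
    C = ['0', '0']                 # shared registers C[0] and C[1]
    B = ['0', '0']                 # local variables B[i], initialized to 0
    reg = [0, 1]                   # step 0: P[i] starts scanning C[i]
    halted = [False, False]
    rounds = 0
    while not all(halted):
        rounds += 1
        R = [C[reg[0]], C[reg[1]]]            # step 1: both read in lockstep
        for i in range(2):
            if halted[i]:
                continue
            if R[i] == '#':                   # case (a): done, halt
                halted[i] = True
            elif R[i] == '0' and B[i] == '1': # case (b): write # and halt
                C[reg[i]] = '#'
                halted[i] = True
            else:                             # case (c): fresh random bit
                B[i] = rng.choice(['0', '1'])
                C[reg[i]] = B[i]
        reg = [1 - reg[0], 1 - reg[1]]        # step 3: exchange registers
    return C, rounds

print(synch_ccp())   # exactly one register should end up holding '#'
```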

Analysis of Synch-CCP

If neither processor halts during a round, both executed case (c) and wrote fresh random bits. In the following round, one of them writes # via case (b) unless the two random bits happened to agree, which occurs with probability `1/2`. Hence the protocol runs for more than `t` rounds with probability at most `2^(-(t-1))`, and its expected cost is `O(1)`.

The Asynchronous Problem

Asynchronous-CCP

Input: Registers C[0] and C[1] initialized to (0,0).
Output: Exactly one of the two registers has value #.
0. P[i] is initially scanning a randomly chosen register. 
   Thereafter, it changes its current register at the end of each iteration. 
   The local variables T[i] and B[i] are initialized to 0.
1. P[i] obtains a lock on the current register and reads (t[i],R[i]).
2. P[i] selects one of five cases:
   (a) case [R[i] = #]: halt;
   (b) case [T[i] < t[i]]: set T[i] = t[i] and B[i] = R[i];
   (c) case [T[i] > t[i]]: write # into the current register and halt;
   (d) case [T[i] = t[i], R[i] = 0, B[i] = 1]: 
       write # into the current register and halt;
   (e) case [otherwise]: Set T[i] = T[i] + 1 and t[i] = t[i] + 1, 
       assign a random bit-value to B[i] , 
       and write (t[i] , B[i]) to the current register.
3. P[i] releases the lock on its current register, 
   moves to the other register, and returns to Step 1.
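Since each iteration holds the lock on its register from the read in step 1 through the write, interleaving whole iterations is enough to model the asynchrony. The following sketch simulates Asynchronous-CCP under a random schedule; the scheduler, the names, and the `(timestamp, value)` tuple encoding of a register are my own assumptions.

```python
import random

def async_ccp(seed=None):
    """Simulate Asynchronous-CCP by running whole (locked) iterations of
    the two processors in a random order. Exactly one register should
    end up holding '#'."""
    rng = random.Random(seed)
    C = [(0, '0'), (0, '0')]      # each register holds (timestamp t, value R)
    T = [0, 0]                    # local timestamps T[i]
    B = ['0', '0']                # local bits B[i]
    reg = [rng.randrange(2), rng.randrange(2)]  # step 0: random start register
    halted = [False, False]
    while not all(halted):
        i = rng.randrange(2)      # the scheduler picks who runs next
        if halted[i]:
            continue
        t, R = C[reg[i]]          # step 1: read under lock
        if R == '#':                       # case (a): done, halt
            halted[i] = True
        elif T[i] < t:                     # case (b): catch up to register
            T[i], B[i] = t, R
        elif T[i] > t:                     # case (c): claim this register
            C[reg[i]] = (t, '#')
            halted[i] = True
        elif R == '0' and B[i] == '1':     # case (d): synchronous-style win
            C[reg[i]] = (t, '#')
            halted[i] = True
        else:                              # case (e): advance both timestamps
            T[i] += 1
            B[i] = rng.choice(['0', '1'])
            C[reg[i]] = (T[i], B[i])
        reg[i] = 1 - reg[i]       # step 3: release lock, switch registers
    return C

print(async_ccp())   # one register holds '#', the other a (t, bit) pair
```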

Analysis of Asynchronous-CCP

Theorem. For any `c gt 0`, the total cost of Asynchronous-CCP, measured in operations executed, exceeds `c` with probability at most `2^(-Omega(c))`.

Proof. The main difference between this and the synchronous case lies in Steps 2(b) and 2(c), which compare the processor's timestamp with the register's timestamp.

To prove correctness of the protocol, we consider the two cases (2(c) and 2(d)) where a processor can write a # into its current register. At the end of an iteration, a processor's timestamp `T[i]` equals the timestamp `t[i]` of its current register. Further, # cannot be written by either processor in its first iteration. (Proof cont'd next slide.)

Proof cont'd

Suppose `P[i]` has just entered case 2(c), with timestamp `T^star[i]`, while its current register `C[i]` has timestamp `t^star[i] lt T^star[i]`. The only possible problem is that `P[1-i]` might also write `#`, into register `C[1-i]`. Suppose this error occurs, and let `t^star[1-i]` and `T^star[1-i]` be the register and processor timestamps during the iteration in which `P[1-i]` writes `#`.

As `P[i]` comes to `C[i]` with timestamp `T^star[i]`, it must have left `C[1-i]` with timestamp `T^star[i]` before `P[1-i]` could write `#` into it. Since timestamps never decrease, `t^star[1-i] ge T^star[i]`. Further, `P[1-i]` cannot have its timestamp `T^star[1-i]` exceed `t^star[i]`, since it must come to `C[1-i]` from `C[i]`, and the timestamp of that register never exceeds `t^star[i]`. So we have `T^star[1-i] le t^star[i] lt T^star[i] le t^star[1-i]`. This means `P[1-i]` must enter case 2(b), as `T^star[1-i] lt t^star[1-i]`, which contradicts its writing a #.

We can analyze the case where `P[i]` has just entered case 2(d) in a similar way, except that we reach the conclusion `T^star[1-i] le t^star[i] = T^star[i] le t^star[1-i]`, so it is possible that `T^star[1-i] = t^star[1-i]`. But if this event happens, we are in the synchronous situation, so our earlier correctness argument applies. Thus, we have established the correctness of the algorithm.

Runtime of Asynchronous-CCP