Finish Parallel MIS - Distributed Algorithms




CS255

Chris Pollett

Mar 4, 2019

Outline

Finish Parallel MIS

Distributed Algorithms: The Choice Coordination Problem

Parallel MIS

Input: G=(V,E)
Output: A maximal independent set I contained in V
1. I := emptyset
2. Repeat {
   S := emptyset
   a) For all v in V do in parallel
         If d(v) = 0 then add v to I and delete v from V.
         else mark v with probability 1/(2d(v)).
   b) For all (u,v) in E do in parallel
         if both u and v are marked
              then unmark the lower degree vertex (breaking ties arbitrarily).
   c) For all v in V do in parallel
         if v is marked then add v to S
   d) I := I union S
   e) Delete S union Gamma(S) from V and all incident edges from E 
   } Until V is empty.
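Although the algorithm is stated for a PRAM, its behavior is easy to explore with a sequential simulation. Below is a minimal Python sketch of one possible simulation; the names (`luby_mis`, `adj`), the dictionary representation, and the sequential loop structure are illustrative assumptions, not part of the slides.

```python
import random

def luby_mis(adj, rng=random):
    """Sequentially simulate the parallel MIS algorithm above.

    adj maps each vertex to the set of its neighbors; returns a
    maximal independent set of the graph.
    """
    adj = {v: set(ns) for v, ns in adj.items()}   # private mutable copy
    mis = set()
    while adj:
        # Step (a): isolated vertices join I; every other vertex is
        # marked independently with probability 1/(2 d(v)).
        marked = set()
        for v in list(adj):
            if not adj[v]:
                mis.add(v)
                del adj[v]
            elif rng.random() < 1.0 / (2 * len(adj[v])):
                marked.add(v)
        # Step (b): for every edge with both endpoints marked, unmark
        # the lower-degree endpoint (ties broken arbitrarily here).
        for u in list(marked):
            for w in adj.get(u, ()):
                if u in marked and w in marked:
                    loser = u if len(adj[u]) <= len(adj[w]) else w
                    marked.discard(loser)
        # Steps (c)-(e): S is the surviving marked set; add S to I,
        # then delete S and its neighborhood Gamma(S) from the graph.
        mis |= marked
        doomed = set(marked)
        for v in marked:
            doomed |= adj[v]
        for v in doomed:
            adj.pop(v, None)
        for ns in adj.values():
            ns -= doomed
    return mis

print(luby_mis({0: {1}, 1: {0, 2}, 2: {1}}))   # {0, 2} or {1}
```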

Analysis of Parallel MIS

Call a vertex `v in V` good if it has at least `(d(v))/3` neighbors of degree at most `d(v)`; otherwise `v` is bad. Call an edge good if at least one of its endpoints is a good vertex, and bad otherwise. The lemmas below show that good vertices are likely to be eliminated in each iteration and that good edges make up a constant fraction of `E`.

More Analysis of Parallel MIS

Lemma*. Let `v` in `V` be a good vertex with degree `d(v) > 0`. Then, the probability that some vertex `w in Gamma(v)` gets marked is at least `1 - exp(-1/6)`.

Proof. Each vertex `w in Gamma(v)` is marked independently with probability `1/(2d(w))`. Since `v` is good, there exist at least `(d(v))/3` vertices in `Gamma(v)` with degree at most `d(v)`. Each of these is marked with probability at least `1/(2d(v))`. Thus, the probability that none of these neighbors is marked is at most: `(1 - 1/(2d(v)))^((d(v))/3) le e^(-1/6)`.

Here we are using that `(1 + a/n)^n le e^a` and that the remaining neighbors of `v` can only increase the probability under consideration.
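As a quick numeric sanity check (my own addition, not from the slides): the quantity `(1 - 1/(2d))^(d/3)` increases toward `e^(-1/6) ~~ 0.8465` as `d` grows, so the bound holds for every degree.

```python
import math

# Verify (1 - 1/(2d))^(d/3) <= e^(-1/6) for a range of degrees d.
# The left-hand side increases monotonically toward e^(-1/6).
for d in [1, 2, 3, 10, 100, 10**6]:
    lhs = (1 - 1 / (2 * d)) ** (d / 3)
    print(f"d={d:>7}  lhs={lhs:.6f}  bound holds: {lhs <= math.exp(-1/6)}")
```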

Yet More Analysis of Parallel MIS

Lemma**. During any iteration, if a vertex `w` is marked then it is selected to be in `S` with probability at least `1/2`.

Proof. The only reason a marked vertex `w` becomes unmarked, and hence not selected for `S`, is that one of its neighbors of degree at least `d(w)` is also marked. Each such neighbor is marked with probability at most `1/(2d(w))`, and the number of such neighbors is at most `d(w)`. Hence, the probability that a marked vertex `w` is selected to be in `S` is at least:
`1 - Pr{exists x in Gamma(w) mbox( such that ) d(x) ge d(w) mbox( and x is marked )}`
`ge 1 - |{x in Gamma(w) | d(x) ge d(w)}| times 1/(2d(w))`
`ge 1 - sum_(x in Gamma(w)) 1/(2d(w))`
`= 1 - d(w) times 1/(2d(w))`
`= 1/2`
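To illustrate the bound (an example of my own, not from the slides), place `w` in a clique `K_(d+1)`: every neighbor has degree exactly `d(w) = d`, and if every tie is broken against `w`, any marked neighbor unmarks `w`. Even then `w` survives with probability `(1 - 1/(2d))^d ge 1/2`. A quick Monte Carlo check:

```python
import random

def survival_estimate(d=5, trials=200_000, seed=1):
    """Estimate Pr[w stays in S | w marked] when all d neighbors of w
    have degree d and every tie is broken against w (worst case for w)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Each neighbor is marked independently with probability 1/(2d);
        # here w survives only if no neighbor is marked.
        if all(rng.random() >= 1 / (2 * d) for _ in range(d)):
            wins += 1
    return wins / trials

print(survival_estimate())  # ~ (1 - 0.1)**5 = 0.59049, safely above 1/2
```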

Even More Analysis of Parallel MIS

Lemma#. The probability that a good vertex belongs to `S cup Gamma(S)` is at least `(1- exp(-1/6))/2`.

Proof. Let `v` be a good vertex with `d(v) > 0`, and consider the event `E` that some vertex in `Gamma(v)` gets marked. Conditioned on `E`, let `w` be the lowest numbered marked vertex in `Gamma(v)`. By Lemma **, `w` is in `S` with probability at least `1/2`. But if `w` is in `S`, then `v` belongs to `S cup Gamma(S)`, as `v` is a neighbor of `w`. By Lemma *, the event `E` happens with probability at least `1 - exp(-1/6)`. So the probability that `v` is in `S cup Gamma(S)` is at least `(1 - exp(-1/6))/2`.

Still More Analysis of Parallel MIS

Lemma##. In a graph `G=(V,E)`, the number of good edges is at least `|E|/2`.

Proof. Our original graph was undirected. Direct each edge in `E` from its lower-degree endpoint to its higher-degree endpoint, breaking ties arbitrarily. Let `d_i(v)` be the in-degree of `v` and `d_o(v)` be the out-degree. From the definition of goodness, we have for each bad vertex `v`:
`d_o(v) - d_i(v) ge (d(v))/3 = (d_o(v) + d_i(v) )/3`
For all `S`, `T` contained in `V`, define the subset of the edges `E(S,T)` as those edges directed from vertices in `S` to vertices in `T`; further, let `e(S,T) = |E(S,T)|`. Let `V_G` and `V_B` be the sets of good and bad vertices respectively. The total degree of the bad vertices is given by:
`2e(V_B, V_B) + e(V_B, V_G) + e(V_G, V_B)`
`= sum_(v in V_B) (d_o(v) + d_i(v))`
`le 3 sum_(v in V_B)(d_o(v) - d_i(v))`
`= 3 sum_(v in V_G)(d_i(v) - d_o(v))`
`= 3[(e(V_B, V_G) + e(V_G, V_G)) - (e(V_G, V_B) + e(V_G, V_G))]`
`= 3[e(V_B, V_G) - e(V_G, V_B)]`
`le 3[e(V_B, V_G) + e(V_G, V_B)]`
The first and last expressions in this sequence of inequalities imply that
`e(V_B,V_B) <= e(V_B,V_G) + e(V_G,V_B)`.
Since every bad edge contributes to the left-hand side, while only good edges contribute to the right-hand side, the number of bad edges is at most the number of good edges, and the result follows.
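This lemma is easy to test empirically. Below is a small Python check on a random graph; the helper name `good_edge_fraction` and the `G(n, p)` test graph are my own choices for illustration.

```python
import itertools
import random

def good_edge_fraction(n=50, p=0.2, seed=0):
    """Count good edges in a random G(n, p) graph. A vertex v is good if
    at least d(v)/3 of its neighbors have degree at most d(v); an edge
    is good if at least one of its endpoints is good."""
    rng = random.Random(seed)
    edges = [(u, v) for u, v in itertools.combinations(range(n), 2)
             if rng.random() < p]
    nbrs = {v: set() for v in range(n)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    deg = {v: len(ns) for v, ns in nbrs.items()}
    good = {v for v in nbrs
            if sum(1 for w in nbrs[v] if deg[w] <= deg[v]) >= deg[v] / 3}
    good_edges = sum(1 for u, v in edges if u in good or v in good)
    return good_edges, len(edges)

g, m = good_edge_fraction()
print(g, m, g >= m / 2)  # Lemma ## predicts the last value is always True
```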

Finishing up Parallel MIS

Theorem. The Parallel MIS algorithm has an EREW PRAM implementation running in expected time `O(log^2 n)` using `O(n+m)` processors.

Proof. Notice each iteration can be implemented in `O(log n)` time on `O(n+m)` processors. By Lemma ##, a constant fraction of the edges are incident on good vertices, and by Lemma #, each good vertex is eliminated with constant probability; it follows that the expected number of edges eliminated during an iteration is a constant fraction of the current edge set. So after `O(log n)` iterations, in expectation, the edge set is empty. QED

Remark. By using pairwise independence rather than full independence in the above analysis one can show only `O(log n)` random bits are needed for the algorithm. From this one can derandomize the above algorithm to get an NC algorithm.

Quiz

Which of the following statements is true?

  1. The processors in the CREW PRAM model can only read from global memory registers; they are not allowed to write to them.
  2. Our BoxSort always has the same runtime regardless of the random bit values used.
  3. Two maximal independent sets for the same graph might have different sizes.

Distributed Algorithms

The Choice Coordination Problem

Warm-Up Algorithm

Synch-CCP:
Input: Registers C[0] and C[1] initialized to 0. 
Output: Exactly one of the registers has the value #.
0 P[i] is initially scanning the register C[i] and 
  has its local variable B[i] initialized to 0.
1 Read the current register and obtain a bit R[i].
2 Select one of three cases: 
   (a) case [R[i] = #]: halt;
   (b) case [R[i] = 0, B[i] = 1]: write # into the current register and halt;
   (c) case [otherwise]: assign an unbiased random bit to B[i] 
       and write B[i] into the current register.
3 P[i] exchanges its current register with P[1-i] and returns to step 1.
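Because the two processors run in lockstep and always scan different registers, Synch-CCP can be simulated round by round. The sketch below is a minimal Python simulation under those assumptions; the function name and the encoding of register values as the strings '0', '1', and '#' are illustrative choices.

```python
import random

def synch_ccp(seed=None):
    """Lockstep simulation of Synch-CCP. Returns the final register
    contents (exactly one should hold '#') and the number of rounds."""
    rng = random.Random(seed)
    C = ['0', '0']                 # shared registers C[0] and C[1]
    B = ['0', '0']                 # local variables B[i], initialized to 0
    reg = [0, 1]                   # step 0: P[i] starts scanning C[i]
    halted = [False, False]
    rounds = 0
    while not all(halted):
        rounds += 1
        R = [C[reg[0]], C[reg[1]]]            # step 1: both read in lockstep
        for i in range(2):
            if halted[i]:
                continue
            if R[i] == '#':                   # case (a): done, halt
                halted[i] = True
            elif R[i] == '0' and B[i] == '1': # case (b): write # and halt
                C[reg[i]] = '#'
                halted[i] = True
            else:                             # case (c): fresh random bit
                B[i] = rng.choice(['0', '1'])
                C[reg[i]] = B[i]
        reg = [1 - reg[0], 1 - reg[1]]        # step 3: exchange registers
    return C, rounds

print(synch_ccp())   # exactly one register should end up holding '#'
```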

Analysis of Synch-CCP

If neither processor halts during a round, both executed case (c) and wrote fresh random bits. In the following round, one of them writes # via case (b) unless the two random bits happened to agree, which occurs with probability `1/2`. Hence the protocol runs for more than `t` rounds with probability at most `2^(-(t-1))`, and its expected cost is `O(1)`.

The Asynchronous Problem

Asynchronous-CCP

Input: Registers C[0] and C[1] initialized to (0,0).
Output: Exactly one of the two registers has value #.
0. P[i] is initially scanning a randomly chosen register. 
   Thereafter, it changes its current register at the end of each iteration. 
   The local variables T[i] and B[i] are initialized to 0.
1. P[i] obtains a lock on the current register and reads (t[i],R[i]).
2. P[i] selects one of five cases:
   (a) case [R[i] = #]: halt;
   (b) case [T[i] < t[i]]: set T[i] = t[i] and B[i] = R[i];
   (c) case [T[i] > t[i]]: write # into the current register and halt;
   (d) case [T[i] = t[i], R[i] = 0, B[i] = 1]: 
       write # into the current register and halt;
   (e) case [otherwise]: Set T[i] = T[i] + 1 and t[i] = t[i] + 1, 
       assign a random bit-value to B[i] , 
       and write (t[i] , B[i]) to the current register.
3. P[i] releases the lock on its current register, 
   moves to the other register, and returns to Step 1.
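Since each iteration holds the lock on its register from the read in step 1 through the write, interleaving whole iterations is enough to model the asynchrony. The following sketch simulates Asynchronous-CCP under a random schedule; the scheduler, the names, and the `(timestamp, value)` tuple encoding of a register are my own assumptions.

```python
import random

def async_ccp(seed=None):
    """Simulate Asynchronous-CCP by running whole (locked) iterations of
    the two processors in a random order. Exactly one register should
    end up holding '#'."""
    rng = random.Random(seed)
    C = [(0, '0'), (0, '0')]      # each register holds (timestamp t, value R)
    T = [0, 0]                    # local timestamps T[i]
    B = ['0', '0']                # local bits B[i]
    reg = [rng.randrange(2), rng.randrange(2)]  # step 0: random start register
    halted = [False, False]
    while not all(halted):
        i = rng.randrange(2)      # the scheduler picks who runs next
        if halted[i]:
            continue
        t, R = C[reg[i]]          # step 1: read under lock
        if R == '#':                       # case (a): done, halt
            halted[i] = True
        elif T[i] < t:                     # case (b): catch up to register
            T[i], B[i] = t, R
        elif T[i] > t:                     # case (c): claim this register
            C[reg[i]] = (t, '#')
            halted[i] = True
        elif R == '0' and B[i] == '1':     # case (d): synchronous-style win
            C[reg[i]] = (t, '#')
            halted[i] = True
        else:                              # case (e): advance both timestamps
            T[i] += 1
            B[i] = rng.choice(['0', '1'])
            C[reg[i]] = (T[i], B[i])
        reg[i] = 1 - reg[i]       # step 3: release lock, switch registers
    return C

print(async_ccp())   # one register holds '#', the other a (t, bit) pair
```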

Analysis of Asynchronous-CCP

Theorem. For any `c gt 0`, the total cost of Asynchronous-CCP, measured in operations executed, exceeds `c` with probability at most `2^(-Omega(c))`.

Proof. The main difference between this and the synchronous case lies in Steps 2(b) and 2(c), which compare the processor's timestamp with the register's timestamp.

To prove correctness of the protocol, we consider the two cases (2(c) and 2(d)) where a processor can write a # into its current register. At the end of an iteration, a processor's timestamp `T[i]` equals the timestamp `t[i]` of its current register. Further, # cannot be written by either processor in its first iteration. (Proof cont'd next slide.)

Proof cont'd

Suppose `P[i]` has just entered case 2(c), with timestamp `T^star[i]`, while its current register `C[i]` has timestamp `t^star[i] lt T^star[i]`. The only possible problem is that `P[1-i]` might also write `#`, into register `C[1-i]`. Suppose this error occurs, and let `t^star[1-i]` and `T^star[1-i]` be the register and processor timestamps during the iteration in which `P[1-i]` writes `#`.

As `P[i]` comes to `C[i]` with timestamp `T^star[i]`, it must have left `C[1-i]` with timestamp `T^star[i]` before `P[1-i]` could write `#` into it. Since timestamps never decrease, `t^star[1-i] ge T^star[i]`. Further, `P[1-i]` cannot have its timestamp `T^star[1-i]` exceed `t^star[i]`, since it must come to `C[1-i]` from `C[i]`, and the timestamp of that register never exceeds `t^star[i]`. So we have `T^star[1-i] le t^star[i] lt T^star[i] le t^star[1-i]`. This means `P[1-i]` must enter case 2(b), as `T^star[1-i] lt t^star[1-i]`, which contradicts its writing a #.

We can analyze the case where `P[i]` has just entered case 2(d) in a similar way, except that we reach the conclusion `T^star[1-i] le t^star[i] = T^star[i] le t^star[1-i]`, so it is possible that `T^star[1-i] = t^star[1-i]`. But if this event happens, we are in the synchronous situation, so our earlier correctness argument applies. Thus, we have established the correctness of the algorithm.

Runtime of Asynchronous-CCP