We were going over a sorting algorithm for parallel random access machines (PRAMs) due to Reischuk (1985) called BoxSort.
It runs using linearly many processors in time `O(log n)` with high probability.
We will briefly give a sketch of this.
There are links in last day's lecture notes to the original article, since I think it's pretty well written and contains some of the proofs we glossed over.
Above is a picture of the relationships between the various parallel and non-parallel complexity classes we've considered.
In the picture, `P` represents the languages that can be recognized in polynomial time on a single processor.
Let's start today by reviewing the BoxSort algorithm.
BoxSort - Review
Pick `n^(1/2)` elements at random from our array of `n` elements to sort. Then, using `n` processors, sort them in `O(log n)` steps.
Each random element requires `O(log n)` coin flips to generate (these can be done in parallel).
We can imagine having an array `R` giving the indices of the randomly selected elements.
After sorting, the indices in `R` are rearranged so that they give the selected elements in ascending order.
To do the sorting, imagine that each element `i` of the `sqrt(n)` elements of `R` gets `sqrt(n)-1` processors to compare it with the other `sqrt(n)-1` elements, and then `O(log n)` time to sum the outcomes of these comparisons, which gives the number of elements smaller than `i`, i.e. its position in sorted order.
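A sequential Python sketch of this comparison-counting step (sometimes called enumeration sort); the function name and list representation are mine, and the loops stand in for what the PRAM does in parallel:

```python
def enumeration_sort(splitters):
    """Sort by computing each element's rank, i.e. the number of
    elements smaller than it (ties broken by index).  On a PRAM the
    comparisons all run on separate processors and each rank is
    obtained by summing comparison bits in O(log n) steps; here both
    loops are simulated sequentially."""
    n = len(splitters)
    result = [None] * n
    for i in range(n):
        rank = sum(1 for j in range(n)
                   if (splitters[j], j) < (splitters[i], i))
        result[rank] = splitters[i]
    return result
```

For instance, `enumeration_sort([6, 2, 8, 7])` reproduces the exercise instance below.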
Then, using these sorted splitters, insert the remaining elements among them in `O(log n)` steps:
Notice that, using binary search, a processor `i` can determine in `O(log n)` time between which two splitters (as pointed to by `R`) the element `A_i` should go.
So it can output a bit `b_i` saying whether element `i` belongs below a given splitter's location (as pointed to by the index in `R`) or not.
We can then compute the sums `S_i` of these bits in parallel, as in the QuickSort case, and move the elements in the same fashion.
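The binary-search placement can be sketched sequentially as follows (the `assign_to_boxes` name and the box representation are assumptions; on the PRAM each element's search runs on its own processor):

```python
import bisect

def assign_to_boxes(A, sorted_splitters):
    """Place each element of A into the box between two consecutive
    splitters.  On the PRAM, each element's binary search runs on its
    own processor in O(log n) time; elements equal to a splitter land
    in the box just below it (bisect_left)."""
    boxes = [[] for _ in range(len(sorted_splitters) + 1)]
    for x in A:  # one processor per element, in parallel on the PRAM
        boxes[bisect.bisect_left(sorted_splitters, x)].append(x)
    return boxes
```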
Treat the remaining elements that are inserted between consecutive splitters as subproblems, and recur on each subproblem whose size exceeds `log n`;
otherwise, use LogSort:
Compare each element in parallel with its neighbor to the left, swapping if necessary; then with its neighbor to the right, swapping if necessary. Repeat this `O(log n)` times.
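A sequential sketch of LogSort as described, i.e. odd-even transposition sort (the function name is mine; each round of independent compare-and-swaps is one parallel step on the PRAM):

```python
def log_sort(box):
    """Odd-even transposition sort: rounds alternate between
    compare-and-swapping the pairs starting at even and at odd
    positions.  The swaps within a round are disjoint, so a round
    costs O(1) parallel PRAM steps; m rounds sort m elements, and
    BoxSort only calls this on boxes of size m <= log n."""
    a = list(box)
    m = len(a)
    for r in range(m):
        for i in range(r % 2, m - 1, 2):  # independent pairs
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```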
In-Class Exercise
How many different types of sorting subroutines does BoxSort involve?
For the first of these subroutines, used to sort the `sqrt(n)` splitter elements, show how it would sort `6,2,8,7`. (Show it step by step.)
Doing the splitting of a box should take `O(log s)` time, where `s` is the size of the box we need to split.
In a perfect world, at each level the expected size of the box goes down by a square root, so we get the sum
`O(log n + log n^(1/2) + log n^(1/4) + ...) = O(log n + 1/2 log n + 1/4 log n + ...) =`
`O((log n) cdot (1 + 1/2 + 1/4 + ...)) = O(log n)`
We will argue that, even in a non-perfect world, with high probability the sum of the logs of the sizes of the boxes along any root-leaf path is `O(log n)`, so the runtime will be `O(log n)`.
Analysis of Splitting
To see that the intuition of the last slide is true, partition the interval `[1,n]` into sub-intervals `I_0, I_1, ...`
We will then bound the probability that a box whose size is in `I_k` has a child whose size is also in `I_k`.
Fix `gamma` and `d` such that `1/2 lt gamma lt 1` and `1 lt d lt 1/gamma`. For positive integers `k`,
let `tau_k=d^k`, `rho_k= n^(gamma^k)`. Define `I_k=[rho_(k+1), rho_k]`.
Note `n = (log n)^(log n / log log n)` (taking logs base 2). So if `gamma^k lt (log log n)/(log n)`, then `rho_k = n^(gamma^k) lt log n`. This will happen for some `k lt c log log n`.
So we will only be interested in `O(log log n)` many intervals `I_k`.
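To get a feel for how quickly the `rho_k` shrink, here is a small sketch (the choice `gamma = 3/4` is an arbitrary value in the allowed range `1/2 lt gamma lt 1`, and the sample sizes of `n` are mine):

```python
import math

def interval_count(n, gamma=0.75):
    """Count how many k have rho_k = n**(gamma**k) >= log2(n).
    Since rho_k < log n once gamma**k < log(log n)/log n, this
    count is O(log log n)."""
    k = 0
    while n ** (gamma ** k) >= math.log2(n):
        k += 1
    return k
```

For `n = 2^20` this gives 6, and for `n = 2^40` only 8, so only a handful of intervals `I_k` ever matter.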
For a box `B` in the tree, we let `alpha(B)=k` if `|B|` is in `I_k`.
In terms of our notation, the time to split Box `B` is `O(log rho_(alpha(B)))` .
For a root-leaf path, `P = (B_1, ..., B_t)`, the runtime is given by `sum_(j=1)^t log rho_(alpha(B_j))`.
The total runtime of the algorithm will be big-O of this sum, plus `O(log n)` to sort the leaves.
Define the event `E_P` to be that the sequence `alpha(B_1), ..., alpha(B_t)` does not contain the value `k` more than `tau_k` times for
`1 le k le c log log n`.
If `E_P` holds, then the number of PRAM steps on path `P` will be:
`O(log n + sum_(k=1)^infty tau_k gamma^k log n)`,
since each value `k` contributes at most `tau_k` splits of cost `O(log rho_k) = O(gamma^k log n)` each.
The End of Sorting
Since `tau_k=d^k` and `d cdot gamma lt 1`, this sums to `O(log n)`.
So it suffices to show `E_P` happens with high probability.
Lemma. There is a constant `b gt 1` such that `E_P` holds with probability `1 - exp(-log^b n)`.
Proof. The proof is given as a sequence of exercises in the book, which we omit. It makes use of Exercise 12.6, which follows from Chernoff bounds.
(Chernoff Bounds: If `X` is the sum of `n` independent random variables, each of which is 0 or 1, the latter with probability `p`, then for `0 le theta le 1`, `Pr{X ge (1+theta)pn} lt e^(-(theta^2 p n)/3)`.)
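A quick sanity check of the stated Chernoff bound against the exact binomial tail (the parameter values are arbitrary, and `binom_tail`, `chernoff_bound` are my own helper names):

```python
from math import comb, exp

def binom_tail(n, p, t):
    """Exact Pr[X >= t] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(t, n + 1))

def chernoff_bound(n, p, theta):
    """The stated bound e^(-theta^2 * p * n / 3) on Pr[X >= (1+theta)pn]."""
    return exp(-theta**2 * p * n / 3)
```

With `n = 1000`, `p = 0.1`, `theta = 0.5`, the exact tail `Pr{X ge 150}` comes out well below the bound `exp(-25/3)`.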
From this we can conclude:
Theorem. There is a constant `b gt 1` such that, with probability at least `1 - exp(-log^b n)`, the algorithm BoxSort terminates in `O(log n)` steps.
So although in a bad case it might take longer, with high probability this is an `O(log n)` time algorithm.
Maximal Independent Set
Let `G = (V,E)` be an undirected graph with `n` vertices and `m = Omega(n)` edges. A subset `I` of `V` is said to be
independent in `G` if no edge in `E` has both ends in `I`.
Equivalently, if `Gamma(v)` is the set of vertices connected to `v`, then `I` is independent if for all `v in I`, `Gamma(v) cap I = emptyset`.
An independent set is maximal if it is not a proper subset of another independent set in `G`.
The red nodes and the blue nodes in the graph above are two different maximal independent sets in the same graph. Notice the blue set has more nodes.
The problem of finding a maximum independent set (the independent set with the most nodes) is NP-hard.
In contrast, finding a maximal independent set takes only `O(m)` time:
Greedy MIS:
Input: Graph G(V,E) with V = {1,..,n}
Output: A maximal independent set I contained in V.
1. I := emptyset
2. For v=1 to n do
3. If Gamma(v) intersect I = emptyset then I := I union {v}.
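A sequential Python sketch of the pseudocode above (the function name and edge-list input format are my own assumptions):

```python
def greedy_mis(n, edges):
    """Greedy-MIS: scan vertices 1..n in order, adding v to I whenever
    no neighbour of v is already in I.  Runs in O(n + m) time and
    returns the lexicographically first maximal independent set."""
    gamma = {v: set() for v in range(1, n + 1)}  # adjacency lists
    for u, v in edges:
        gamma[u].add(v)
        gamma[v].add(u)
    I = set()
    for v in range(1, n + 1):
        if not (gamma[v] & I):  # Gamma(v) intersect I = emptyset
            I.add(v)
    return sorted(I)
```

On the 5-cycle `1-2-3-4-5-1`, for example, this returns `[1, 3]`.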
Analysis of Greedy MIS
Greedy-MIS is very sequential in nature.
For the graph on the last slide the algorithm outputs the Maximal Independent Set (MIS) {1,3,6}.
Notice the two other independent sets we had previously drawn are {1,5} and {3, 4, 6}. In dictionary (lexicographical) order, {1, 3, 6} comes before {1,5}, which comes before {3, 4, 6}.
It turns out Greedy-MIS always outputs the lexicographically first MIS (LFMIS).
LFMIS is a P-complete problem (with respect to log-time poly- processor PRAM reductions) (Cook 1985).
So an NC algorithm for LFMIS would imply P=NC. (Whether P=NC is an open problem. In English, it asks: does every poly-time algorithm have a good parallel one?)
We will describe an RNC algorithm for MIS and later show how to derandomize it to an NC algorithm.
The maximal set we output won't typically be the lexicographically first one.