PRAM Sorting


Chris Pollett

Feb 21, 2018



Sorting on a PRAM

We now work towards a ZNC algorithm for sorting that runs in `O(log n)` time and performs a total of `O(n log n)` operations across all processors (Reischuk 1985). Let `P_i` denote the `i`th processor.

We begin by considering a PRAM variant of Quicksort:

  1. If `n=1` stop.
  2. Otherwise, pick a splitter uniformly at random from the `n` input elements.
  3. Each processor determines whether its element is bigger or smaller than the splitter.
  4. Let `j` denote the rank of the splitter. If `j` is not in `[n/4, (3n)/4]`, the step is declared a failure and we go back to step 1. Otherwise, we move the splitter to processor `P_j`. Each element that is smaller than the splitter is moved to a distinct processor `P_i` such that `i lt j`. Each element that is larger than the splitter is moved to a distinct processor `P_k` where `k gt j`.
  5. We sort recursively the data in the processors `P_1` through `P_(j-1)` and the data in `P_(j+1)` through `P_(n)`.

Notice that this algorithm uses randomness but has zero error: we can detect when we are in a bad case and rerun the step. It is in this sense that it is a ZNC algorithm for sorting.
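
The steps above can be sketched as a sequential simulation. This is only an illustration, not the PRAM implementation itself: the function name `pram_quicksort` and the `rng` parameter are made up for this sketch, the input values are assumed to be distinct, and each list comprehension or loop stands in for one parallel step across the processors. The prefix sums are what let every element find a distinct target processor.

```python
import random
from itertools import accumulate

def pram_quicksort(a, rng=None):
    """Sequentially simulate the PRAM Quicksort variant (assumes distinct values)."""
    rng = rng or random.Random()
    n = len(a)
    if n <= 1:                       # Step 1: nothing to sort
        return list(a)
    while True:
        splitter = rng.choice(a)     # Step 2: random splitter
        # Step 3: processor i records whether its element is smaller/larger.
        small = list(accumulate(1 if x < splitter else 0 for x in a))
        big = list(accumulate(1 if x > splitter else 0 for x in a))
        j = small[-1] + 1            # Step 4: 1-based rank of the splitter
        if n / 4 <= j <= 3 * n / 4:
            break                    # good splitter; otherwise retry (failure)
    out = [None] * n
    out[j - 1] = splitter            # splitter moves to P_j
    for i, x in enumerate(a):
        if x < splitter:
            out[small[i] - 1] = x    # distinct slot among P_1 .. P_(j-1)
        elif x > splitter:
            out[j - 1 + big[i]] = x  # distinct slot among P_(j+1) .. P_n
    # Step 5: recurse on both sides (independent, so parallel on a PRAM).
    left = pram_quicksort(out[:j - 1], rng)
    right = pram_quicksort(out[j:], rng)
    return left + [splitter] + right
```

For example, `pram_quicksort([3, -1, 5, 6, 2])` only accepts a splitter of rank 2 or 3 at the top level, since `[n/4, (3n)/4]` is `[1.25, 3.75]` for `n = 5`.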


In-Class Exercise

Suppose we want to sort the items {3, -1, 5, 6, 2} and have 5 processors. Show the computation steps involved for each processor for each step in the QuickSort algorithm.

Post your solutions to the Feb 21 In-Class Exercise Thread.

Using More Splitters

Suppose we have `n` processors and `n` elements. Suppose the first `r` processors have values in sorted order.

A Good Choice for `r`, a Complete Algorithm (called BoxSort)

  1. We could pick `n^(1/2)` elements at random as splitters and then, using all `n` processors, sort them in `O(log n)` steps.
    • We can imagine having an array `R` giving the indices of the randomly selected elements.
    • After sorting, the array `R` has its indices rearranged so that they give the selected elements in ascending order.
    • To do the sorting, we imagine, for each element `i` of the `sqrt(n)` elements of `R`, using `sqrt(n)-1` processors to compare it with the other `sqrt(n)-1` elements, then using `O(log n)` time to sum the results of these comparisons to determine the number of elements smaller than `i`.
  2. Then using these sorted elements insert the remaining elements among them in `O(log n)` steps:
    • Notice that, using binary search, processor `i` can determine in `O(log n)` time between which two of the splitters pointed to by `R` its element `A_i` should go.
    • So it can output a bit `b_i` saying whether location `i` is less than its splitter's location as pointed to by the index in `R` or not.
    • We can then compute sums `S_i` of these bits in parallel as with the QuickSort case and move in the same fashion.
  3. Treat the remaining elements that are inserted between each pair of adjacent splitters as a subproblem, and recur on each subproblem whose size exceeds `log n`; otherwise, use LogSort:
    • Compare each element in parallel with its neighbor, first to its left, swapping if necessary, then to its right, swapping if necessary; do this `O(log n)` times.
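
The three BoxSort steps can be sketched sequentially as follows. This is a rough illustration under stated assumptions rather than the parallel algorithm: the names `rank_sort`, `log_sort`, `box_sort`, and the `threshold` and `rng` parameters are invented here, values are assumed distinct, logarithms are taken base 2, and the splitter count at each level is taken to be the integer square root of the current subproblem size. Each loop stands in for work that the PRAM would do in parallel.

```python
import bisect
import math
import random

def rank_sort(values):
    """Step 1: place each splitter by counting how many others are smaller."""
    out = [None] * len(values)
    for v in values:
        rank = sum(1 for w in values if w < v)  # a parallel sum on a PRAM
        out[rank] = v
    return out

def log_sort(block):
    """LogSort: alternate compare-exchange phases with left/right neighbors."""
    b = list(block)
    for _ in range(len(b)):          # len(b) rounds of two phases is enough
        for start in (1, 0):
            for i in range(start, len(b) - 1, 2):
                if b[i] > b[i + 1]:
                    b[i], b[i + 1] = b[i + 1], b[i]
    return b

def box_sort(a, threshold=None, rng=None):
    """Sequential sketch of BoxSort (assumes distinct values)."""
    rng = rng or random.Random()
    m = len(a)
    if threshold is None:            # log n for the original input size n
        threshold = max(1, int(math.log2(m))) if m > 0 else 1
    if m <= threshold:
        return log_sort(a)           # small subproblem: use LogSort
    r = math.isqrt(m)
    chosen = set(rng.sample(range(m), r))
    splitters = rank_sort([a[i] for i in chosen])
    # Step 2: binary search tells each remaining element its bucket.
    buckets = [[] for _ in range(r + 1)]
    for i in range(m):
        if i not in chosen:
            buckets[bisect.bisect_left(splitters, a[i])].append(a[i])
    # Step 3: recurse on each bucket, interleaving the sorted splitters.
    result = []
    for k, bucket in enumerate(buckets):
        result.extend(box_sort(bucket, threshold, rng))
        if k < r:
            result.append(splitters[k])
    return result
```

Every recursive call works on strictly fewer elements than its parent, because at least one splitter is removed at each level, so the sketch always terminates.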

Intuition on Splitting

Analysis of Splitting

The End of Sorting