Introduction

Last week, we started talking about sorting in linear time.
We said that the sorting algorithms we had considered up to that point were all `O(n log n)` (for quicksort, in expectation).
For example, merge sort, heap sort, and quicksort.
Each of these is also a comparison sort; that is, the sorted order the algorithm determines is based only on comparisons between the input elements.
In class (not on the slides), I described radix sort, which determines the final sorted order by examining individual digits of the keys rather than by comparing whole elements. This and the other sorts we consider today all run in linear time.
To start today, though, we will look at why comparison sorts cannot run in linear time.

Lower Bounds for Comparison Sorting

Over the next couple of slides, we show an `Omega(n log n)` lower bound for comparison sorting.
To start, notice that when comparison sorting a sequence `langle a_1, a_2, ..., a_n rangle`, we determine the relative order of two elements `a_i` and `a_j` only by performing one of the tests `a_i < a_j`, `a_i le a_j`, `a_i = a_j`, `a_i ge a_j`, or `a_i > a_j`.
We may not inspect the values of the elements or gain information about them in any other way.
For our lower bound, we assume all input elements are distinct, so comparisons of the form `a_i = a_j` are not needed.
Further, under this assumption the tests `a_i < a_j`, `a_i le a_j`, `a_i ge a_j`, and `a_i > a_j` all yield the same information about the relative order of `a_i` and `a_j`, so it suffices to assume all comparisons have the form `a_i le a_j`.

Decision Trees

We frame comparison sorting in terms of decision trees.
A decision tree is a full binary tree (every non-leaf has two children) that represents the comparisons between elements performed by a particular sorting algorithm on inputs of a given size. That is, fix an array size, say `n = 100`; the comparison sort then yields a decision tree that records the comparisons made to sort any array of 100 elements. We would get a different (presumably smaller) tree for `n = 50`. Each path from the root to a leaf represents the comparisons made on a particular input sequence.
Control, data movement, and all other aspects of the algorithm are ignored.
The image above shows an example decision tree for insertion sort operating on three elements.
Let `a_1, ..., a_n` denote an input to the tree.
Internal nodes of a decision tree are labeled `i:j` for some `1 le i, j le n`. This indicates that we compared elements `a_i` and `a_j` of the input.
Leaves of the decision tree are labeled with permutations `langle pi(1), pi(2), ..., pi(n) rangle`, such that `a_(pi(1)) le a_(pi(2)) le ... le a_(pi(n))`.
For example, suppose the input to the decision tree above is 6, 8, 5, i.e., `a_1 = 6, a_2 = 8, a_3 = 5`. The darkened path indicates the comparisons insertion sort would make on this input. The leaf is labeled `langle 3, 1, 2 rangle`, which says `a_3 le a_1 le a_2`, that is, `5 le 6 le 8`.
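To make the correspondence between an algorithm and its decision tree concrete, here is a minimal Python sketch (the function name `insertion_sort_trace` is hypothetical) that runs insertion sort while recording each test `a_i le a_j` as the node label `i:j`, using 1-based input positions. Assuming the tree in the image was built from this standard variant of insertion sort, the recorded labels should trace a root-to-leaf path, and the final order of positions is the leaf permutation.

```python
from typing import List, Tuple

def insertion_sort_trace(a: List[int]) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Run insertion sort on a, recording the comparisons it makes.

    Returns (comparisons, leaf): each pair (i, j) means the test a_i <= a_j
    was made (1-based input positions), and leaf is the permutation
    <pi(1), ..., pi(n)> with a_pi(1) <= ... <= a_pi(n).
    """
    order = list(range(1, len(a) + 1))  # 1-based positions, kept in sorted order
    comparisons: List[Tuple[int, int]] = []
    for j in range(1, len(order)):
        key = order[j]
        i = j - 1
        while i >= 0:
            comparisons.append((order[i], key))  # internal node "order[i]:key"
            if a[order[i] - 1] <= a[key - 1]:
                break                     # test true: key stays after order[i]
            order[i + 1] = order[i]       # test false: shift the larger element right
            i -= 1
        order[i + 1] = key
    return comparisons, order

comps, leaf = insertion_sort_trace([6, 8, 5])
print(comps)  # [(1, 2), (2, 3), (1, 3)]
print(leaf)   # [3, 1, 2]
```

Only the comparisons are recorded; the index shuffling corresponds to the "data movement" the decision tree ignores.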

Worst case lower bound

Any correct sorting algorithm must be able to produce each permutation of its inputs.
So its decision tree must have at least `n!` leaves.
Furthermore, each of these leaves must be reachable from the root by a downward path corresponding to an actual execution of the comparisons.
The length of the longest simple path from the root of a decision tree to any of its reachable leaves represents the worst-case number of comparisons that the corresponding sorting algorithm performs.
Hence, the worst-case number of comparisons for a given comparison sort algorithm equals the height of its decision tree.
Consider a decision tree of height `h` with `l` reachable leaves corresponding to a comparison sort on `n` elements. We must have `n! le l`, as each of the `n!` permutations of the input appears as some leaf. Since a binary tree of height `h` has no more than `2^h` leaves, we have
`n! le l le 2^h`,
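which, taking logarithms (base 2), implies
`h ge log (n!) = sum_(i=1)^n log i ge sum_(i=|~n/2~|)^n log i ge (n/2) log(n/2) = Omega(n log n)`,
where we drop the nonnegative terms with `i < |~n/2~|` and then use the fact that the remaining sum has at least `n/2` terms, each at least `log(n/2)`.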
We have thus established:
Theorem. Any comparison sort requires `Omega(n log n)` comparisons in the worst case.
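As a small-case sanity check of the counting argument (not a proof), the Python sketch below uses insertion sort as one concrete comparison sort. It enumerates every ordering of `n` distinct keys, counts the distinct leaves reached and the worst-case number of comparisons, and compares the latter with `|~log_2(n!)~|`. The helper names `insertion_sort_count`, `ceil_log2`, and `check_lower_bound` are hypothetical.

```python
from itertools import permutations
from math import factorial
from typing import List, Tuple

def insertion_sort_count(a: List[int]) -> Tuple[int, Tuple[int, ...]]:
    """Return (comparisons made, leaf permutation) for insertion sort on a."""
    order = list(range(1, len(a) + 1))  # 1-based input positions
    count = 0
    for j in range(1, len(order)):
        key = order[j]
        i = j - 1
        while i >= 0:
            count += 1                          # one test a_order[i] <= a_key
            if a[order[i] - 1] <= a[key - 1]:
                break
            order[i + 1] = order[i]
            i -= 1
        order[i + 1] = key
    return count, tuple(order)

def ceil_log2(m: int) -> int:
    """Exact ceil(log2 m) for an integer m >= 1, avoiding floating point."""
    return (m - 1).bit_length()

def check_lower_bound(n: int) -> Tuple[int, int, int]:
    """Run insertion sort on all n! orderings of n distinct keys.

    Returns (distinct leaves reached, worst-case comparisons, ceil(log2 n!)).
    """
    leaves = set()
    worst = 0
    for perm in permutations(range(n)):
        count, leaf = insertion_sort_count(list(perm))
        leaves.add(leaf)
        worst = max(worst, count)
    return len(leaves), worst, ceil_log2(factorial(n))

for n in range(1, 7):
    print(n, *check_lower_bound(n))
# 1 1 0 0
# 2 2 1 1
# 3 6 3 3
# 4 24 6 5
# 5 120 10 7
# 6 720 15 10
```

Every ordering reaches a different leaf (so `l = n!` here), and the worst case never drops below the bound; insertion sort's worst case, `n(n-1)/2`, is far above it for larger `n`.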

Quiz (Sec 5)

Which of the following statements is true?

The worst-case recurrence for quicksort determined in class was `T(n) = T(n-1) + Theta(n)`.
The best-case recurrence for quicksort determined in class was `T(n) = 2T(n/2) + Theta(log n)`
For some inputs the random variable `X_(ij)` used in our quicksort analysis could take on values greater than `1`.

Quiz (Sec 6)

Which of the following statements is true?

It is possible that on some inputs quicksort and randomized-quicksort as presented in class give different outputs.
In our quicksort analysis
`E[X_(ij)] = 1 cdot Pr{z_i text( is compared to ) z_j} + 0 cdot Pr{z_i text( is not compared to ) z_j}`
`= Pr{z_i text( is first pivot chosen from ) Z_(ij)} + Pr{z_j text( is first pivot chosen from ) Z_(ij)}`
`= 1/(j - i + 1) + 1/(j - i + 1)`
`= 2/(j - i + 1)`.
The quicksort PARTITION procedure is `Omega(n log n)`.

Counting Sort

We now start looking at non-comparison sorts.
Counting sort assumes that each of the `n` input elements is an integer in the range `0` to `k` for some integer `k`.
When `k = O(n)`, the sort runs in `Theta(n)` time.
The idea is that, for each input element `x`, we compute the number of elements less than or equal to `x`, and then use this information to place `x` directly into its position in the output array.
For example, if 17 elements are less than `x`, then `x` belongs in position 18. (We have to be careful when the elements are not all distinct.)

Counting Sort - Pseudocode

Assume the input is `A[1..n]`. We use two other arrays: `B[1..n]` holds the sorted output and `C[0..k]` provides temporary storage. Recall that each `A[i]` is an integer in the range `0` to `k`.

COUNTING-SORT(A, B, k)
 1 let C[0..k] be a new array
 2 for i = 0 to k
 3     C[i] = 0
 4 for j = 1 to A.length
 5     C[A[j]] = C[A[j]] + 1
 6 // C[i] now contains the number of elements equal to i
 7 for i = 1 to k
 8     C[i] = C[i] + C[i - 1]
 9 // C[i] now contains the number of elements less than or equal to i
10 for j = A.length downto 1
11     B[C[A[j]]] = A[j]
12     C[A[j]] = C[A[j]] - 1

Line 12 handles the case where the items might not be distinct. Decrementing `C[A[j]]` causes the next input element with a value equal to `A[j]`, if one exists, to go to the position immediately before `A[j]` in the output array.
The for loop of lines 2-3 takes time `Theta(k)`, the for loop of lines 4-5 takes time `Theta(n)`, the for loop of lines 7-8 takes time `Theta(k)`, and finally the for loop of lines 10-12 takes time `Theta(n)`.
Thus the overall time is `Theta(k + n)`. In particular, if `k = O(n)`, then the running time is `Theta(n)`.
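A minimal Python transcription of COUNTING-SORT, assuming 0-based Python lists instead of `A[1..n]` and `B[1..n]`. Because output positions start at 0, the decrement of line 12 happens just before the element is placed rather than just after. The example input is arbitrary.

```python
from typing import List

def counting_sort(a: List[int], k: int) -> List[int]:
    """Stable counting sort of integers in the range 0..k; returns a new list."""
    c = [0] * (k + 1)              # lines 1-3: C[0..k] initialised to 0
    for x in a:                    # lines 4-5: C[i] = number of elements equal to i
        c[x] += 1
    for i in range(1, k + 1):      # lines 7-8: C[i] = number of elements <= i
        c[i] += c[i - 1]
    b = [0] * len(a)
    for x in reversed(a):          # lines 10-12: scan right to left
        c[x] -= 1                  # 0-based output, so decrement before placing
        b[c[x]] = x
    return b

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], k=5))  # [0, 0, 2, 2, 3, 3, 3, 5]
```

The four loops are the four `Theta(k)` / `Theta(n)` terms of the analysis; no element is ever compared with another.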

Properties of Counting Sort

Notice that counting sort makes no comparisons between input elements. Instead, it uses the actual values of the elements to index into an array, so it is not a comparison sort and the `Omega(n log n)` lower bound does not apply.
Counting sort is an example of a stable sort: numbers with the same value appear in the output array in the same order as they do in the input array. (This is why the loop of lines 10-12 runs from `A.length` down to 1.)
This is useful if you have satellite data that is carried around with the element being sorted.
It is also important for proving the correctness of radix sort, which often uses counting sort as a subroutine.
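To see stability with satellite data, the sketch below generalises counting sort to arbitrary records sorted by an integer key in `0..k`. The function name `counting_sort_by_key` and the `(key, label)` record layout are hypothetical, chosen only for illustration.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")

def counting_sort_by_key(items: List[T], key: Callable[[T], int], k: int) -> List[T]:
    """Stable counting sort of records by an integer key in the range 0..k."""
    c = [0] * (k + 1)
    for item in items:                 # count records per key value
        c[key(item)] += 1
    for i in range(1, k + 1):          # c[i] = number of records with key <= i
        c[i] += c[i - 1]
    out: List[Optional[T]] = [None] * len(items)
    for item in reversed(items):       # right to left: equal keys keep input order
        c[key(item)] -= 1
        out[c[key(item)]] = item
    return out  # type: ignore[return-value]

# Hypothetical records: (key, satellite data)
records = [(3, "a"), (1, "b"), (3, "c"), (0, "d"), (1, "e")]
print(counting_sort_by_key(records, key=lambda r: r[0], k=3))
# [(0, 'd'), (1, 'b'), (1, 'e'), (3, 'a'), (3, 'c')]
```

Records with key 1 come out as `b` before `e`, and those with key 3 as `a` before `c`, exactly their input order.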

Radix Sort

Radix sort is the algorithm typically used by old-fashioned card-sorting machines.
A classic punch card has 80 columns, and in each column a machine can punch a hole in one of 12 places.
The sorter can be mechanically "programmed" to examine a given column of each card in a deck and distribute the card into one of 12 bins, depending on which place has been punched (for decimal digits, as above, each column uses only 10 of the places).
An operator can then gather the cards bin by bin, so that cards with the first place punched are on top of cards with the second place punched, and so on.
In radix sort, we first sort on the least significant digit using this operation. We then take the combined output, set the mechanical sorter to sort on the next-to-least significant digit, and continue in this way through all 80 passes until we have sorted on the most significant digit, at which point the deck is sorted.
The same idea works for any number `d` of columns, not just 80.
Notice that sorting on a single column can be viewed as running counting sort. For the whole procedure to work, the sort used on each column must be stable, which counting sort is.

Example and Pseudo-code

The figure above shows a three-column example of using radix sort to sort 3-digit decimal numbers. Using counting sort, we sort first on the rightmost (least significant) column, then on the middle column, and finally on the leftmost (most significant) column.
Radix sort is often used to sort information keyed by multiple fields, for example, year, month, and day.
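The same least-significant-first idea applies to multi-field keys. In this Python sketch, dates are assumed to be `(year, month, day)` tuples and the helper `sort_dates_lsd` is hypothetical. It uses Python's built-in `sorted`, which is guaranteed stable but is a comparison sort, so it illustrates the order of the passes rather than the linear-time bound.

```python
from typing import List, Tuple

Date = Tuple[int, int, int]  # (year, month, day): an assumed record layout

def sort_dates_lsd(dates: List[Date]) -> List[Date]:
    """Sort dates one field per pass, least significant field (day) first."""
    result = list(dates)
    for field in (2, 1, 0):  # day, then month, then year
        # Each pass must be stable so ties keep the order from earlier passes.
        result = sorted(result, key=lambda d: d[field])
    return result

dates = [(2021, 3, 14), (2019, 12, 25), (2021, 1, 30), (2019, 12, 1), (2020, 7, 4)]
print(sort_dates_lsd(dates))
# [(2019, 12, 1), (2019, 12, 25), (2020, 7, 4), (2021, 1, 30), (2021, 3, 14)]
```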

Pseudo-code for radix sort is:

RADIX-SORT(A, d)
for i = 1 to d    // digit 1 is the lowest-order digit
    use a stable sort to sort array A on digit i
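One way to make RADIX-SORT concrete: a Python sketch for non-negative integers with at most `d` decimal digits, using counting sort (with `k = 9`) as the stable sort on each digit. The helper names are hypothetical, and the `base` parameter is an added generalisation; the sample numbers are arbitrary.

```python
from typing import List

def counting_sort_on_digit(a: List[int], exp: int, base: int = 10) -> List[int]:
    """Stable counting sort of non-negative ints by the digit (x // exp) % base."""
    c = [0] * base
    for x in a:
        c[(x // exp) % base] += 1
    for i in range(1, base):           # c[i] = number of elements with digit <= i
        c[i] += c[i - 1]
    b = [0] * len(a)
    for x in reversed(a):              # right to left keeps the pass stable
        digit = (x // exp) % base
        c[digit] -= 1
        b[c[digit]] = x
    return b

def radix_sort(a: List[int], d: int, base: int = 10) -> List[int]:
    """Sort non-negative ints with at most d base-`base` digits, LSD first."""
    exp = 1                            # exp = base^(i-1) selects digit i
    for _ in range(d):
        a = counting_sort_on_digit(a, exp, base)
        exp *= base
    return a

print(radix_sort([329, 457, 657, 839, 436, 720, 355], d=3))
# [329, 355, 436, 457, 657, 720, 839]
```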

Correctness and Runtime

Lemma. Given `n` d-digit numbers in which each digit can take on up to `k` possible values, RADIX-SORT correctly sorts these numbers in `Theta(d(n+k))` time if the stable sort it uses takes `Theta(n + k)` time.

Proof. The correctness of radix sort follows by induction on the number of columns sorted. The induction hypothesis is that after the `m`th stable sort, the numbers are sorted according to their `m` least significant digits. For the inductive step, consider two numbers after pass `m+1`: if they differ in digit `m+1`, that pass places them in the correct order; if they agree in digit `m+1`, the stable sort keeps them in the order produced by the first `m` passes, which is correct by the hypothesis. The analysis of the running time depends on the stable sort used as the intermediate sorting algorithm. When each digit is in the range `0` to `k-1` and `k` is not too large, counting sort is the obvious choice. Each pass over `n` `d`-digit numbers then takes time `Theta(n+k)`. There are `d` passes, so the total time for radix sort is `Theta(d(n + k))`. QED

When `d` is constant and `k = O(n)`, radix sort runs in linear time.

Making Things Binary

Typically, we store data in the computer in binary. How does this interact with the base (alphabet size), such as 10, 12, or 26, of the `d`-digit numbers or `d`-letter strings we are sorting?

Lemma. Given `n` `b`-bit numbers and any positive `r le b`, RADIX-SORT correctly sorts these numbers in `Theta((b/r)(n+2^r))` time if the stable sort it uses takes `Theta(n + k)` time for inputs in the range `0` to `k`.

Here we can view each group of `r` bits as one digit in base `2^r`, so each digit can take on `2^r` values.

Proof. For a value `r le b`, we view each key as having `d = |~b/r~|` digits of `r` bits each. Each digit is an integer in the range `0` to `2^(r) - 1`, so we can use counting sort with `k = 2^r -1`. Each pass of counting sort takes time `Theta(n + k) = Theta(n + 2^r)` and there are `d` passes, for a total running time of `Theta(d(n+2^r))= Theta((b/r)(n+2^r))`. QED
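A Python sketch of the construction in this proof, assuming non-negative integers that fit in `b` bits; the function name `radix_sort_bits` is hypothetical. Digit `i` is extracted with a shift and a mask of `r` ones, and each of the `d = |~b/r~|` passes is a counting sort with `k = 2^r - 1`.

```python
from typing import List

def radix_sort_bits(a: List[int], b: int, r: int) -> List[int]:
    """Sort non-negative b-bit ints with ceil(b/r) counting-sort passes on r-bit digits."""
    if not 1 <= r <= b:
        raise ValueError("need 1 <= r <= b")
    k = (1 << r) - 1                   # each digit lies in 0..2^r - 1 (also the mask)
    d = -(-b // r)                     # d = ceil(b / r) passes
    for pass_no in range(d):
        shift = pass_no * r            # lowest-order r-bit digit first
        c = [0] * (k + 1)
        for x in a:
            c[(x >> shift) & k] += 1
        for i in range(1, k + 1):      # c[i] = number of keys whose digit <= i
            c[i] += c[i - 1]
        out = [0] * len(a)
        for x in reversed(a):          # right to left keeps each pass stable
            digit = (x >> shift) & k
            c[digit] -= 1
            out[c[digit]] = x
        a = out
    return a

data = [0xBEEF, 0x0001, 0xCAFE, 0x1234, 0xFFFF, 0x0000]
print(radix_sort_bits(data, b=16, r=5))  # [0, 1, 4660, 48879, 51966, 65535]
```

With `b = 16` and `r = 5` there are 4 passes, the last digit holding only one bit; the ceiling in `d = |~b/r~|` handles an `r` that does not divide `b`. Larger `r` means fewer passes but a counting array of size `2^r`, which is the trade-off in `Theta((b/r)(n+2^r))`.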

Linear Time Search

Outline