Finish Online Algorithms




CS255

Chris Pollett

Mar 18, 2019

Outline

Introduction

LRU Competitiveness

Theorem. Let `k` be the size of our cache. Then for any request sequence `R`, `F_(LRU)(R) <= k cdot F_(MIN)(R) + k`.

Proof. After the first access, LRU and MIN always have at least the page just accessed in common. Consider a subsequence `T` of `R` not including the first access and during which LRU faults `k` times. Let `p` be the page accessed just before `T`. If LRU faults on the same page twice during `T`, then `T` must contain accesses to at least `k + 1` different pages. (Because `k` page evictions were needed before the same page is least recently used again.) This is also true if LRU faults on `p` during `T`. If neither of these cases occurs, then LRU faults on at least `k` different pages, none of them `p`, during `T`. In any case, MIN must fault at least once during `T`.

Partition `R` into `R_0, R_1, ...` such that `R_0` contains the first access and at most `k` faults by LRU, and each `R_i` for `i >= 1` contains exactly `k` faults by LRU. On each of the `R_i`'s with `i >= 1`, the argument above shows MIN faults at least once, so the ratio of LRU faults to MIN faults is at most `k` to `1`. During `R_0`, LRU faults at most `k` times; these faults are covered by the additive term `k`. This gives the theorem.
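As a concrete illustration (not from the lecture; names like `lru_faults` and `min_faults` are my own), the following Python sketch counts faults for LRU and for MIN, using Belady's rule of evicting the cached page whose next use is farthest in the future, and checks the theorem's bound on a random request sequence.

```python
import random

def lru_faults(requests, k):
    """LRU: on a miss with a full cache, evict the least recently used page."""
    cache, faults = [], 0          # most recently used page kept at the end
    for r in requests:
        if r in cache:
            cache.remove(r)        # hit: refresh recency
        else:
            faults += 1
            if len(cache) == k:
                cache.pop(0)       # evict the least recently used page
        cache.append(r)
    return faults

def min_faults(requests, k):
    """MIN (Belady): on a miss with a full cache, evict the cached page
    whose next use lies farthest in the future (or never comes)."""
    cache, faults = set(), 0
    for i, r in enumerate(requests):
        if r in cache:
            continue
        faults += 1
        if len(cache) == k:
            def next_use(p):
                for j in range(i + 1, len(requests)):
                    if requests[j] == p:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(r)
    return faults

if __name__ == "__main__":
    k = 4
    requests = [random.randrange(8) for _ in range(500)]
    f_lru, f_min = lru_faults(requests, k), min_faults(requests, k)
    print(f_lru, f_min, f_lru <= k * f_min + k)   # bound from the theorem
```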

Adversary Models

Definition. Let `adv` be one of `obl` (oblivious), `aof` (adaptive offline), or `aon` (adaptive online) adversaries. A randomized online paging algorithm `A` is `C`-competitive against `adv` adversaries if for every sequence of requests `R`,
`E[f_A(R)] - C times f_(adv)(R) <= b`
where `b` is a constant independent of `N`, the length of the request sequence `R`. We write `C_A^(obl)`, `C_A^(aof)`, and `C_A^(aon)` for, respectively, the oblivious, adaptive offline, and adaptive online competitiveness coefficients; each is the infimum of `C` such that one has `C`-competitiveness against the particular adversary.

Paging against an Oblivious Adversary

Theorem (*). Let `R` be a randomized algorithm for paging. Then `C_R^(obl) ge H_k`, where `H_k = sum_(j=1)^k 1/ j` is the `k`th Harmonic number.

Sketch of Proof. It is often easier to reason about deterministic algorithms on inputs chosen from a bad probability distribution than it is to reason about randomized algorithms directly. Fortunately, there is a result related to von Neumann's minimax theorem for mixed-strategy games, called Yao's Minimax Principle, which allows us to go from the randomized-algorithm setting to the setting of deterministic algorithms on randomly chosen inputs. We will use this result without proof.

Let `P` be a probability distribution for choosing the requests `r_i`. We allow the probability of `r_i` to depend on the earlier requests `r_1, r_2, ..., r_(i-1)`. For a deterministic online paging algorithm `A`, define its competitiveness under `P`, `C_A^P`, to be the infimum of `C` such that
`E[f_A(r_1, ..., r_n)] - C times E[f_O(r_1, ..., r_n)] le b`,
where `f_O` counts the faults of the optimal offline algorithm.

Yao's Minimax Principle implies that
`inf_R C_R^(obl) = sup_P inf_A C_A^P`.
So we can give a lower bound on `C_R^(obl)` by exhibiting a probability distribution `P` and lower-bounding `C_A^P` for every deterministic algorithm `A`... (more next slide)

More proof of Theorem (*)

Suppose we have `k+1` memory items `I = {I_1, ..., I_(k+1)}`. Let `N` be the sequence length and assume `N` is much larger than `k`. Since `k` of these items can be in the cache, only one item needs to be outside the cache at any time. So any paging algorithm only has to say which item it leaves out of the cache at each point in time.

Suppose we choose a request sequence as follows: the first request `r_1` is chosen uniformly at random from all items in `I`; for `i > 1`, request `r_i` is chosen uniformly at random from the `k` items in the set `I - {r_(i-1)}`.

First, let's consider the offline case. We split the request sequence into rounds. The first round begins with the first request and ends when, for the first time, every item in `I` has been requested at least once. For `m > 1`, the `m`th round ends just before the request to the `(k+1)`st distinct item since the start of the round. Using MIN (which we said was the optimal offline algorithm), the offline algorithm incurs at most one miss per round. The expected length of a round can be analyzed as in the coupon collector problem: when `j` items of the round remain unrequested, the next request (uniform over the `k` items other than the previous one) hits a new item with probability `j/k`, so the expected wait is `k/j`; summing over `j = 1, ..., k` gives an expected round length of `k H_k`.

Now consider the online algorithm `A`. At any point in time, `A` must leave one of the `k+1` items out of the cache. Whenever a request falls on this item, `A` incurs a miss. Every request goes to an item chosen uniformly at random from the `k` items other than the one just requested. So the probability that the item `A` leaves out is requested is `1/k`. Hence the expected number of misses per round is `(k H_k) cdot 1/k = H_k`. Since the offline algorithm incurs at most one miss per round, the ratio of expected online misses to offline misses is at least `H_k`, proving the theorem.
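These calculations are easy to check empirically. The following Python sketch (mine, not from the lecture; the function names are illustrative) draws requests from the distribution above, splits them into rounds as in the proof, and compares the average round length to `k H_k` and the misses per round of one concrete online algorithm (LRU, standing in for an arbitrary `A`) to `H_k`.

```python
import random

def harmonic(k):
    """H_k = 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / j for j in range(1, k + 1))

def adversarial_requests(k, n, rng):
    """r_1 uniform over the k+1 items; for i > 1, r_i uniform over
    the k items other than r_(i-1)."""
    items = list(range(k + 1))
    reqs = [rng.choice(items)]
    for _ in range(n - 1):
        reqs.append(rng.choice([x for x in items if x != reqs[-1]]))
    return reqs

def round_lengths(reqs, k):
    """A round ends just before the request to the (k+1)st distinct
    item since the round began; that request starts the next round."""
    lengths, seen, start = [], set(), 0
    for i, r in enumerate(reqs):
        if len(seen | {r}) == k + 1:
            lengths.append(i - start)
            seen, start = {r}, i
        else:
            seen.add(r)
    return lengths

def lru_misses(reqs, k):
    """LRU with a size-k cache, standing in for an arbitrary online A."""
    cache, misses = [], 0
    for r in reqs:
        if r in cache:
            cache.remove(r)      # hit: refresh recency
        else:
            misses += 1
            if len(cache) == k:
                cache.pop(0)     # evict the least recently used item
        cache.append(r)
    return misses

if __name__ == "__main__":
    k, rng = 5, random.Random(0)
    reqs = adversarial_requests(k, 200_000, rng)
    rounds = round_lengths(reqs, k)
    print("avg round length:", sum(rounds) / len(rounds), "~ k*H_k =", k * harmonic(k))
    print("misses per round:", lru_misses(reqs, k) / len(rounds), "~ H_k =", harmonic(k))
```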

Quiz

Which of the following statements is true?

  1. Our Byzantine Agreement procedure works even if 1/9 of the processors are faulty.
  2. Map Reduce jobs are never allowed to use random number generators.
  3. MIN is an online paging algorithm where on a cache miss we evict the least recently used item from memory.

The Marker Algorithm (Fiat et al., 1991)
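A minimal sketch of the algorithm: every cache slot carries a mark bit. On a hit, the requested page is marked. On a miss, if all cached pages are marked, every mark is cleared (a new round begins); then a page chosen uniformly at random from the unmarked pages is evicted, and the requested page is brought in and marked. The Python below is my own sketch of this rule, not code from the lecture.

```python
import random

def marker_misses(requests, k, rng=random):
    """Marker: every cached page has a mark bit.  A hit marks the page.
    On a miss, if all cached pages are marked, clear every mark (a new
    round begins); then evict a uniformly random unmarked page and
    bring in the requested page, marked."""
    cache, marked, misses = set(), set(), 0
    for r in requests:
        if r in cache:
            marked.add(r)
            continue
        misses += 1
        if len(cache) == k:
            if marked == cache:
                marked.clear()                    # start a new round
            victim = rng.choice(sorted(cache - marked))
            cache.remove(victim)
        cache.add(r)
        marked.add(r)
    return misses

# e.g. marker_misses([1, 2, 3, 1, 4, 2], k=3, rng=random.Random(0))
```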

The Marker Algorithm Competitiveness

Theorem. The Marker algorithm is `(2H_k)`-competitive.

Remark. Recall that `H_k` grows as `O(ln k)`, so this is better than what deterministic algorithms can achieve: no deterministic online paging algorithm can be better than `k`-competitive.

Proof of Theorem. We will compare the Marker algorithm on a sequence `r_1, r_2, ...` to an optimal offline algorithm on the same sequence. The total number of items might be significantly more than the cache size in our argument. Assume that both algorithms start with the same `k` items in the cache, and that `r_1` is not in the cache. The Marker algorithm implicitly divides the request sequence into rounds, the first of which begins with `r_1`. The round beginning with request `r_i` ends with `r_j`, where `j` is the smallest integer such that there are `k+1` distinct items in `r_i, ..., r_(j+1)`. I.e., all `k` cache locations are marked at the end of the round. The first request of each round is to an item not currently in the cache.

Call an item stale if it is unmarked but was marked in the previous round, and clean if it is neither stale nor marked. Let `m` be the number of requests to clean items in a round. To get our result we will show that the amortized number of misses by the offline algorithm during a round is at least `m/2`, whereas the expected number of misses by the Marker algorithm during a round is at most `m H_k`.

Let `S_O` denote the set of items in the offline algorithm's cache, and `S_M` denote the set of items in the Marker algorithm's cache. Let `d_I` be the value of `|S_O - S_M |` at the beginning of the round, and `d_F` be the value at the end of the round. Let `M_O` be the number of misses incurred by the offline algorithm during the round.

`M_O ge m - d_I`, since at least `m - d_I` of the `m` clean items requested in the round are not in the offline algorithm's cache at the beginning of the round. At the end of the round, all `k` items in `S_M` at that point are items that were requested during the round. Since `d_F` items in the offline algorithm's cache are not in `S_M`, the offline algorithm has incurred at least `d_F` misses during the round. Thus,
`M_O ge max(m - d_I, d_F) ge (m - d_I + d_F)/2`.
Summing over all rounds, the `d_I` and `d_F` terms telescope, because the `d_F` of one round is the `d_I` of the next; this gives the amortized bound of at least `m/2` offline misses per round.

Now consider the expected number of misses that Marker makes during a round. Each of the `m` requests to clean items costs Marker a miss. Of the `k-m` requests to stale items, the expected cost of each is the probability that the item is not in the cache. This is maximized when the `m` requests to clean items precede all the `k-m` requests to stale items. For `1 le i le k-m`, the probability that the `i`th request to a stale item is a miss is `m/(k - i + 1)`. Summing over `i` gives `sum_(i=1)^(k-m) m/(k - i + 1) = m(H_k - H_m)`, so the expected cost of Marker is bounded by
`m + m(H_k - H_m) le m H_k`,
where the last inequality holds because `H_m ge 1` for `m ge 1`.
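To see the theorem in action, one can run Marker against MIN on the same sequences and compare the miss ratio to `2 H_k`. Below is a small self-contained Python sketch (mine, not from the lecture); note that random sequences are far from worst case, so the observed ratio sits well below the bound.

```python
import random

def harmonic(k):
    return sum(1.0 / j for j in range(1, k + 1))

def min_faults(reqs, k):
    """Belady's offline rule: evict the page used farthest in the future."""
    cache, faults = set(), 0
    for i, r in enumerate(reqs):
        if r in cache:
            continue
        faults += 1
        if len(cache) == k:
            def next_use(p):
                for j in range(i + 1, len(reqs)):
                    if reqs[j] == p:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(r)
    return faults

def marker_faults(reqs, k, rng):
    """The Marker algorithm, as sketched on the earlier slide."""
    cache, marked, faults = set(), set(), 0
    for r in reqs:
        if r in cache:
            marked.add(r)
            continue
        faults += 1
        if len(cache) == k:
            if marked == cache:
                marked.clear()                 # new round
            victim = rng.choice(sorted(cache - marked))
            cache.remove(victim)
        cache.add(r)
        marked.add(r)
    return faults

if __name__ == "__main__":
    k, rng = 4, random.Random(1)
    reqs = [rng.randrange(2 * k) for _ in range(5000)]
    ratio = marker_faults(reqs, k, rng) / min_faults(reqs, k)
    print("observed ratio:", ratio, " bound 2*H_k =", 2 * harmonic(k))
```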