Finish Map Reduce and PRAMs, Online Algorithms




CS255

Chris Pollett

Mar 13, 2018

Outline

Introduction

Simulating PRAMs via MapReduce

Theorem. Any CREW PRAM algorithm using `O(n^(2 - 2 epsilon))` total memory, `O(n^(2 - 2 epsilon))` processors and `t(n)` time can be run in `O(t(n))` rounds in `DMRC`.

Proof. To simulate each processor used by the PRAM at time step `t`, we will use a reducer `rho_1^t`. To simulate each memory location used by the PRAM at time step `t`, we will have another reducer `rho_2^t`. Mappers will be used to route memory requests and ship the relevant memory bits to the reducer responsible for the particular processor. Each processor computing `rho_1^t` will then perform one step of computation for the PRAM processors assigned to it, write out memory updates, and request new memory positions. The process then repeats.

In a normal CREW PRAM algorithm, it is possible that in a given step not all memory locations are written to. We can, however, modify such an algorithm to have a dummy processor for each memory location that in a given time step reads that location and then writes the same value back. We assume these dummy processors have higher index values than any of the original processors, and that if there are ever two writes to the same location, the one from the processor of lower index wins. In our MapReduce program, we want to keep track of memory addresses and values as ordered pairs `(a, v)` that we pass around from one round to the next.

In more detail, at time `t` for the original PRAM algorithm let `b_i^t` denote the (address, value) pairs that processor `i` reads from. Write `b_i^t = emptyset` if processor `i` does not perform a read. Let `w_i^t` be the (address, value) pairs that `i` writes to at time `t`, and set `w_i^t = emptyset` if it does not write. By induction, we assume that at the start of time `t`, reducer `rho_1^t` has as its inputs tuples `(i; b_i^t)`. (We have an initialization mapper `mu_0^0` that sets this up for time `0`). On such a pair `rho_1^t` simulates one step of the computation for processor `i` and outputs `(i; r_i^(t+1), w_i^t)`. Here `r_i^(t+1)` is the read request for the next time step, and `w_i^t` is what `i` wrote to in time `t`.

The mapper `mu_1^t` takes as input `(i; r_i^(t+1), w_i^t)` and makes two kinds of output pairs: `(r_i^(t+1); i)`, a pair keyed by the address processor `i` wants to read; and `(a; w_i^t, i)`, a pair whose key `a` is the address in `w_i^t = (a, v)` and whose value records the update of `a`'s memory to `v`.

The reducer `rho_2^t` takes two kinds of pairs as input. The first is `(a_j; (a_j, v_j), i)`, representing that the new value for `a_j` is `v_j`. Since the PRAM is CREW, and since it was modified as described above, we get exactly one such pair per memory address `a_j`, namely the one from the writer of minimal index `i`. The second type of input is of the form `(a_j; i')`, representing that processor `i'` would like the value of address `a_j`. `rho_2^t` fulfills these requests by outputting `(a_j; (a_j, v_j), i')` for each requesting `i'`. Remember `rho_2^t` is not allowed to change keys.

Finally, mapper `mu_2^t` takes as input tuples of the form `(a_j; (a_j, v_j), i')` and outputs tuples `(i'; a_j, v_j)`, so we are ready for the next round.
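The round structure above can be sketched in Python. The following is a minimal illustration, not the lecture's code: it simulates a hypothetical toy PRAM in which processor `i` increments the value it just read, writes it back, and requests the next address mod `m` (so every address receives exactly one write per step, as the dummy-processor modification guarantees). All function and variable names are our own.

```python
from collections import defaultdict

def shuffle(pairs):
    """Group (key, value) pairs by key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return dict(groups)

# Hypothetical toy PRAM step: processor i increments the value it read,
# writes it back, and requests the next address mod m.
def pram_step(i, addr, val, m):
    return (addr + 1) % m, (addr, val + 1)   # (read request, write pair)

def rho1(i, b, m):
    """rho_1^t: simulate one step of processor i; b = (address, value) read."""
    addr, val = b
    r_next, w = pram_step(i, addr, val, m)
    return (i, (r_next, w))

def mu1(i, r_next, w):
    """mu_1^t: route i's read request and its write, keyed by address."""
    a, v = w
    return [(r_next, ('read', i)), (a, ('write', v, i))]

def rho2(a, msgs):
    """rho_2^t: apply the write to address a, then answer all reads of a."""
    writes = sorted((i, v) for tag, v, i in
                    (m for m in msgs if m[0] == 'write'))
    val = writes[0][1]   # lowest-index writer wins (CREW + dummy processors)
    readers = [m[1] for m in msgs if m[0] == 'read']
    return [(a, (val, i)) for i in readers]

def mu2(a, val, i):
    """mu_2^t: re-key the answered read by processor for the next round."""
    return (i, (a, val))

def simulate_round(proc_inputs, m):
    """One simulated PRAM step: rho1 -> mu1 -> shuffle -> rho2 -> mu2."""
    stepped = [rho1(i, b, m) for i, b in proc_inputs]
    routed = [p for i, (r, w) in stepped for p in mu1(i, r, w)]
    served = [p for a, msgs in shuffle(routed).items()
              for p in rho2(a, msgs)]
    return sorted(mu2(a, v, i) for a, (v, i) in served)
```

With `n = m = 3` and each processor `i` starting by reading address `i` with value `0`, one round returns the per-processor inputs for the next round, `[(0, (1, 1)), (1, (2, 1)), (2, (0, 1))]`.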

In-Class Exercise

Online Algorithms

The Online Paging Problem

Common Online Algorithms for the Paging Problem

An Offline Algorithm and Competitiveness

Definition. A deterministic online paging algorithm `A` is said to be `C`-competitive if there exists a constant `b` such that on every sequence of requests `R=(r_1, r_2, ...)`,
`f_A(R) - C times f_O(R) <= b`
where the constant `b` must be independent of `N` (total requests) but may depend on `k` (cache size). Let `C_A`, the competitiveness coefficient of `A`, be the infimum of `C` such that `A` is `C`-competitive.

`k+1`-Item example

A Lower Bound on Competitiveness

Definition. A deterministic online paging algorithm is an automaton `(S, I, s_0, c_0, F)`, where `S` is a finite set of states, `I` is a set of items, `s_0` is the start state, `c_0 in I^k` is the initial state of the cache, and `F: S times I^k times I -> S times I^k` maps a current state `s`, a tuple of cache contents `c in I^k`, and an item request `r in I` to a new state `s'` and tuple of cache contents `c'` such that for some `1 <= i <= k` the `i`th component of `c'` is `r`.

Remark. So after `m` requests in the request sequence `R=(r_1, ..., r_N)`, `F( ... F(F(s_0, c_0, r_1), r_2)..., r_m)` will be a pair `(s_m, c_m)` where `s_m` is the state and `c_m in I^k` is the cache output by the applications of `F`.
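As an illustration of the definition (our own example, not from the lecture), FIFO paging fits this automaton form: take `S = {0, ..., k-1}`, with the state `s` recording which cache slot is evicted next.

```python
class FIFOPager:
    """FIFO paging cast in the automaton form (S, I, s_0, c_0, F) above.
    The state s is the index of the next slot to evict; the class name
    and fields are our own illustration."""
    def __init__(self, init_cache):
        self.k = len(init_cache)
        self.s = 0                        # s_0: start state
        self.c = tuple(init_cache)        # c_0 in I^k

    def F(self, r):
        """On request r, return the new (state, cache); r ends up in cache."""
        if r not in self.c:
            c = list(self.c)
            c[self.s] = r                 # evict slot s, install r
            self.c = tuple(c)
            self.s = (self.s + 1) % self.k
        return self.s, self.c
```

Applying `F` repeatedly to a request sequence yields exactly the pairs `(s_m, c_m)` of the remark.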

Theorem. Let `A` be a deterministic online paging algorithm. Then `C_A >= k`.

Proof. Let our set of items `I` have `k+1` items. Consider the request sequence `R = (r_1, ..., r_N)` such that each `r_m` is the item in `I` not listed in `c_(m-1)`. Such an `r_m` always exists since `I` has `k+1` items, and this sequence `R` causes `A` to miss on every request. So `A` will have `N` cache misses; whereas, from the previous slide, we know the offline algorithm MIN has at most `N/k` misses. Hence, the result follows.
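The adversary in this proof is easy to run. Below is a small Python sketch (our own, using LRU as a stand-in for the deterministic algorithm `A`): with `k+1` items, always requesting the unique uncached item makes every one of the `N` requests a miss.

```python
def adversarial_misses(k, N):
    """Drive LRU with the adversary from the proof: |I| = k + 1 items,
    and each request is the single item missing from the current cache."""
    items = set(range(k + 1))
    cache = list(range(k))               # initial cache; least recent at front
    misses = 0
    for _ in range(N):
        r = (items - set(cache)).pop()   # the unique uncached item
        assert r not in cache            # by construction: a guaranteed miss
        misses += 1
        cache.pop(0)                     # LRU evicts the least recent page
        cache.append(r)
    return misses
```

Every request misses, so the online algorithm pays `N` faults on a sequence where MIN pays at most `N/k`.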