Finish Map Reduce and PRAMs, Online Algorithms




CS255

Chris Pollett

Mar 13, 2018

Outline

Introduction

Simulating PRAMs via MapReduce

Theorem. Any CREW PRAM algorithm using `O(n^(2 - 2 epsilon))` total memory, `O(n^(2 - 2 epsilon))` processors and `t(n)` time can be run in `O(t(n))` rounds in `DMRC`.

Proof. To simulate each processor used by the PRAM at time step `t`, we will use a reducer `rho_1^t`. To simulate each memory location used by the PRAM at time step `t`, we will have another reducer `rho_2^t`. Mappers will be used to route memory requests and ship the relevant memory bits to the reducer responsible for the particular processor. Each processor computing `rho_1^t` will then perform one step of computation for the PRAM processors assigned to it, write out memory updates, and request new memory positions. The process then repeats.

In a normal CREW PRAM algorithm, it is possible that in a given step not all memory locations are written to. We can, however, modify such an algorithm to have a dummy processor for each memory location that in a given time step reads that location and then writes the same value back. We assume these dummy processors have higher index values than any of the original processors, and that if there are ever two writes to the same location, the one from the processor of lower index wins. In our MapReduce program, we want to keep track of memory addresses and values as ordered pairs `(a, v)` that we pass around from one round to the next.

In more detail, at time `t` for the original PRAM algorithm let `b_i^t` denote the (address, value) pairs that processor `i` reads from. Write `b_i^t = emptyset` if processor `i` does not perform a read. Let `w_i^t` be the (address, value) pairs that `i` writes to at time `t`, and set `w_i^t = emptyset` if it does not write. By induction, we assume that at the start of time `t`, reducer `rho_1^t` has as its inputs tuples `(i; b_i^t)`. (We have an initialization mapper `mu_0^0` that sets this up for time `0`). On such a pair `rho_1^t` simulates one step of the computation for processor `i` and outputs `(i; r_i^(t+1), w_i^t)`. Here `r_i^(t+1)` is the read request for the next time step, and `w_i^t` is what `i` wrote to in time `t`.

The mapper `mu_1^t` takes as input `(i; r_i^(t+1), w_i^t)` and makes two kinds of output pairs: `(r_i^(t+1); i)`, a pair keyed by the address processor `i` wants to read; and `(a; w_i^t, i)`, a pair whose key `a` is the address in `w_i^t = (a, v)` and whose value records the update of `a`'s memory to `v`.

The reducer `rho_2^t` takes two kinds of pairs as input. The first is `(a_j; (a_j, v_j), i)`, representing that the new value for `a_j` is `v_j`. Since the PRAM is CREW, and since it was modified as described above, we get exactly one such pair per memory address `a_j`, namely the one from the writer of minimal index `i`. The second type of input is of the form `(a_j; i')`, representing that processor `i'` would like the value of address `a_j`. `rho_2^t` fulfills these requests by outputting `(a_j; (a_j, v_j), i')` for each requesting `i'`. Remember `rho_2^t` is not allowed to change keys.

Finally, mapper `mu_2^t` takes as input tuples of the form `(a_j; (a_j, v_j), i')` and outputs tuples `(i'; a_j, v_j)`, so we are ready for the next round.
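The round structure above can be sketched in Python. The following is a minimal illustration, not the lecture's code: it simulates a hypothetical toy PRAM in which processor `i` increments the value it just read, writes it back, and requests the next address mod `m` (so every address receives exactly one write per step, as the dummy-processor modification guarantees). All function and variable names are our own.

```python
from collections import defaultdict

def shuffle(pairs):
    """Group (key, value) pairs by key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return dict(groups)

# Hypothetical toy PRAM step: processor i increments the value it read,
# writes it back, and requests the next address mod m.
def pram_step(i, addr, val, m):
    return (addr + 1) % m, (addr, val + 1)   # (read request, write pair)

def rho1(i, b, m):
    """rho_1^t: simulate one step of processor i; b = (address, value) read."""
    addr, val = b
    r_next, w = pram_step(i, addr, val, m)
    return (i, (r_next, w))

def mu1(i, r_next, w):
    """mu_1^t: route i's read request and its write, keyed by address."""
    a, v = w
    return [(r_next, ('read', i)), (a, ('write', v, i))]

def rho2(a, msgs):
    """rho_2^t: apply the write to address a, then answer all reads of a."""
    writes = sorted((i, v) for tag, v, i in
                    (m for m in msgs if m[0] == 'write'))
    val = writes[0][1]   # lowest-index writer wins (CREW + dummy processors)
    readers = [m[1] for m in msgs if m[0] == 'read']
    return [(a, (val, i)) for i in readers]

def mu2(a, val, i):
    """mu_2^t: re-key the answered read by processor for the next round."""
    return (i, (a, val))

def simulate_round(proc_inputs, m):
    """One simulated PRAM step: rho1 -> mu1 -> shuffle -> rho2 -> mu2."""
    stepped = [rho1(i, b, m) for i, b in proc_inputs]
    routed = [p for i, (r, w) in stepped for p in mu1(i, r, w)]
    served = [p for a, msgs in shuffle(routed).items()
              for p in rho2(a, msgs)]
    return sorted(mu2(a, v, i) for a, (v, i) in served)
```

With `n = m = 3` and each processor `i` starting by reading address `i` with value `0`, one round returns the per-processor inputs for the next round, `[(0, (1, 1)), (1, (2, 1)), (2, (0, 1))]`.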

In-Class Exercise

Online Algorithms

The Online Paging Problem

Common Online Algorithms for the Paging Problem

An Offline Algorithm and Competitiveness

Definition. A deterministic online paging algorithm `A` is said to be `C`-competitive if there exists a constant `b` such that on every sequence of requests `R=(r_1, r_2, ...)`,
`f_A(R) - C times f_O(R) <= b`
where the constant `b` must be independent of `N` (total requests) but may depend on `k` (cache size). Let `C_A`, the competitiveness coefficient of `A`, be the infimum of `C` such that `A` is `C`-competitive.

`k+1`-Item example

A Lower Bound on Competitiveness

Definition. A deterministic online paging algorithm is an automaton `(S, I, s_0, c_0, F)`, where `S` is a finite set of states, `I` is a set of items, `s_0` is the start state, `c_0 in I^k` is the initial state of the cache, and `F: S times I^k times I -> S times I^k` maps a current state `s`, a tuple of cache contents `c in I^k`, and an item request `r in I` to a new state `s'` and tuple of cache contents `c'` such that for some `1 <= i <= k` the `i`th component of `c'` is `r`.

Remark. So after `m` requests in the request sequence `R=(r_1, ..., r_N)`, `F( ... F(F(s_0, c_0, r_1), r_2)..., r_m)` will be a pair `(s_m, c_m)` where `s_m` is the state and `c_m in I^k` is the cache output by the applications of `F`.
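As an illustration of the definition (our own example, not from the lecture), FIFO paging fits this automaton form: take `S = {0, ..., k-1}`, with the state `s` recording which cache slot is evicted next.

```python
class FIFOPager:
    """FIFO paging cast in the automaton form (S, I, s_0, c_0, F) above.
    The state s is the index of the next slot to evict; the class name
    and fields are our own illustration."""
    def __init__(self, init_cache):
        self.k = len(init_cache)
        self.s = 0                        # s_0: start state
        self.c = tuple(init_cache)        # c_0 in I^k

    def F(self, r):
        """On request r, return the new (state, cache); r ends up in cache."""
        if r not in self.c:
            c = list(self.c)
            c[self.s] = r                 # evict slot s, install r
            self.c = tuple(c)
            self.s = (self.s + 1) % self.k
        return self.s, self.c
```

Applying `F` repeatedly to a request sequence yields exactly the pairs `(s_m, c_m)` of the remark.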

Theorem. Let `A` be a deterministic online paging algorithm. Then `C_A >= k`.

Proof. Let our set of items `I` have `k+1` items. Consider the request sequence `R = (r_1, ..., r_N)` such that each `r_m` is the item in `I` not listed in `c_(m-1)`. Such an `r_m` always exists since `I` has `k+1` items, and this sequence `R` causes `A` to miss on every request. So `A` will have `N` cache misses; whereas, from the previous slide, we know the offline algorithm MIN has at most `N/k` misses. Hence, the result follows.
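The adversary in this proof is easy to run. Below is a small Python sketch (our own, using LRU as a stand-in for the deterministic algorithm `A`): with `k+1` items, always requesting the unique uncached item makes every one of the `N` requests a miss.

```python
def adversarial_misses(k, N):
    """Drive LRU with the adversary from the proof: |I| = k + 1 items,
    and each request is the single item missing from the current cache."""
    items = set(range(k + 1))
    cache = list(range(k))               # initial cache; least recent at front
    misses = 0
    for _ in range(N):
        r = (items - set(cache)).pop()   # the unique uncached item
        assert r not in cache            # by construction: a guaranteed miss
        misses += 1
        cache.pop(0)                     # LRU evicts the least recent page
        cache.append(r)
    return misses
```

Every request misses, so the online algorithm pays `N` faults on a sequence where MIN pays at most `N/k`.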