Introduction

Last week, we introduced the Map Reduce model, gave a couple of examples of how it might be used (video recoding, word counts), and said a little about the practicalities of deployment (combiners, fault tolerance during the map and reduce phases).
Today, we are going to start going over the results of Karloff, Suri, and Vassilvitskii (SODA 2010) on simulating PRAM algorithms in Map Reduce and, to some degree, vice versa.
To start, let's give formal definitions of a mapper and a reducer.

Mappers and Reducers

Recall a multiset is an unordered collection of objects where repeats are allowed.

Definition. A mapper is a (possibly randomized) function that takes as input one ordered `langle key; value rangle` pair of binary strings. As output the mapper produces a finite multiset of new `langle key; value rangle` pairs.

Definition. A reducer is a (possibly randomized) function that takes as input a binary string `k` which is the key, and a sequence of values `v_1, v_2, ...` which are also binary strings. As output, the reducer produces a multiset of pairs of binary strings `langle k; v_(k,1)rangle , langle k; v_(k,2)rangle, langle k; v_(k,3)rangle, ...` The key in the output tuples is identical to the key in the input tuple.

So we allow mappers to manipulate keys arbitrarily, but reducers cannot change the keys at all.
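To make the two definitions concrete, here is a minimal Python sketch of a word count mapper and reducer. The names `word_count_mapper` and `word_count_reducer` are hypothetical, and Python strings stand in for the binary strings of the formal definitions; this is an illustration of the interfaces, not part of the paper's model.

```python
def word_count_mapper(key, value):
    """Mapper: takes ONE (key; value) pair and returns a multiset of new pairs.

    Assumption for this sketch: key is a document id, value is the document text.
    The mapper is free to choose entirely new keys (here, the words).
    """
    return [(word, "1") for word in value.split()]


def word_count_reducer(key, values):
    """Reducer: takes a key and the sequence of all values for that key.

    Every output pair carries the same key that was passed in.
    """
    total = sum(int(v) for v in values)
    return [(key, str(total))]


pairs = word_count_mapper("doc1", "a b a")
reduced = word_count_reducer("a", ["1", "1"])
```

Note that the mapper discards the incoming key `doc1` and emits word keys instead, while the reducer returns its input key unchanged, as the definitions require.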

A Map Reduce Program

A map reduce program, `P`, consists of a sequence `langle m[1], r[1], m[2], r[2], ..., m[R], r[R] rangle` of mappers and reducers.
The program input is a multiset of `langle key; value rangle` pairs denoted by `U[0]`.

On input `U[0]`, `P` executes as follows:

For i = 1, 2, ..., R do:
1. EXECUTE MAP: Feed each pair (k;v) in U[i-1] to mapper m[i], and run it. 
   This generates a sequence (k[1]; v[1]), (k[2]; v[2]),... Let U'[i]
   be the multiset of (key; value) pairs output by m[i], that is
   U'[i] = union over (k;v) in U[i-1] of m[i]((k;v)).
2. SHUFFLE: For each k, let V[k][i] be the multiset of values v[j]
   such that (k;v[j]) is in U'[i]. I.e., in this step we group the
   values in U'[i] by key.
3. EXECUTE REDUCE: For each k, feed k and some arbitrary permutation
   of V[k][i] to a separate instance of reducer r[i] and run it.
   The reducer will generate a sequence of tuples (k; v'[1]), (k; v'[2]),...
   Let U[i] be the multiset of (key; value) pairs output by r[i]. That is,
   U[i] = union over k of r[i]((k; V[k][i]))

The computation halts after the last reducer halts.
The point of this setup is that it makes parallelism easy:
- Since each mapper `m[i]` only operates on one tuple at a time, the system can have many instances of `m[i]` operating on different tuples in `U[i-1]` in parallel.
- After mapping, the system partitions the output tuples by key.
- Since the reducer `r[i]` only operates on one part of this partition, the system can have many instances of `r[i]` running on different parts in parallel.
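The execution loop described above (map, shuffle, reduce, repeated for `R` rounds) can be sketched sequentially in Python. This is only a single-machine illustration under simplifying assumptions: `run_round`, `run_program`, `split_words`, and `sum_counts` are hypothetical names, keys and values are ordinary Python objects rather than binary strings, and nothing runs in parallel.

```python
from collections import defaultdict


def run_round(mapper, reducer, pairs):
    """Run one round (map, shuffle, reduce) on a list of (key, value) pairs."""
    # EXECUTE MAP: feed each pair in U[i-1] to the mapper and collect U'[i].
    mapped = []
    for key, value in pairs:
        mapped.extend(mapper(key, value))

    # SHUFFLE: group the values by key, giving V[k][i] for each key k.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # EXECUTE REDUCE: one reducer call per key; the union of outputs is U[i].
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def run_program(program, pairs):
    """program is a list of (mapper, reducer) tuples, one tuple per round."""
    for mapper, reducer in program:
        pairs = run_round(mapper, reducer, pairs)
    return pairs


# Hypothetical one-round word count program used to exercise the loop.
def split_words(key, value):
    return [(word, 1) for word in value.split()]


def sum_counts(key, values):
    return [(key, sum(values))]


counts = run_program([(split_words, sum_counts)], [("d1", "a b a"), ("d2", "b")])
```

In a real deployment, the two `for` loops that call `mapper` and `reducer` are exactly the places where the system can fan out across machines, since each call touches only one tuple or one key group.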

Quiz

Which of the following is true?

- Our algorithm for Byzantine agreement had expected runtime that depended on the number of servers.
- Our Asynchronous-CPP algorithm made use of timestamps.
- In map reduce, a combiner is another name for a process that executes a shuffle step.

Assumptions of the Model

Map Reduce jobs are used when we have severe restrictions on memory, processing, and time. Let's consider these briefly before we define the class MRC.

Memory
We assume the input of the whole map reduce program is too big to fit into memory on any single machine. Accordingly, we require that the input to any mapper or reducer be sublinear in the size of the data. This rules out the trivial solution in which a single mapper and a single reducer run on the same machine.

Machines
An algorithm that requires `n^3` machines, where `n` is the size of the web, would not be practical. So we assume the number of machines is sublinear in the data size.

Time
We do not restrict the power of an individual reducer, but we require that both the map and the reduce functions run in time polynomial in the original input length. We are also more interested in programs that require a small number of map reduce rounds, because shuffling is a time-consuming operation.

The Map Reduce Class

Given a program input, a sequence of pairs `(k[j], v[j])`, for `j=1,2,3,...` where `k[j]` and `v[j]` are binary strings, we define the length of this input to be `n = sum_j(|k[j]| + |v[j]|)`, where `|a|` denotes the length of the binary string `a`.
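As a small illustration of this definition, the following Python sketch computes `n` for an input whose keys and values are written as strings of `0` and `1` characters. The function name `input_length` and the sample pairs are made up for the example.

```python
def input_length(pairs):
    """Return n, the sum over j of |k[j]| + |v[j]|.

    Assumption for this sketch: each key and value is a Python str
    of '0'/'1' characters, so len() gives the binary string length.
    """
    return sum(len(k) + len(v) for k, v in pairs)


# Two pairs: (01; 1101) and (1; 0), so n = (2 + 4) + (1 + 1) = 8.
n = input_length([("01", "1101"), ("1", "0")])
```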

Definition. Fix an `epsilon > 0`. An algorithm in `MRC^k` consists of a sequence `langle m[1], r[1], m[2], r[2], ..., m[R], r[R] rangle` of operations which outputs the correct answer with probability at least `3/4` where:

- Each `m[i]` is a randomized mapper implemented by a RAM with `O(log n)`-length words, that uses `O(n^(1-epsilon))` space and time polynomial in `n`.
- Each `r[i]` is a randomized reducer implemented by a RAM with `O(log n)`-length words, that uses `O(n^(1-epsilon))` space and time polynomial in `n`.
- The total space `sum_((k;v) in U'[i])(|k| +|v|)` used by `(key; value)` pairs output by `m[i]` is `O(n^(2-2epsilon))`.
- The number of rounds `R = O(log^k n)`.

We define `MRC = cup_k MRC^k` and we define `DMRC` in an analogous fashion where we require our machines and the above operations to be deterministic.

One key thing to note is that we allow the mappers and reducers to run in time polynomial in `n`, not merely polynomial in the length of the input they receive.
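To get a feel for the scale of these bounds, here is a rough Python sketch that evaluates the three quantities in the definition for concrete `n`, `epsilon`, and `k`. The function `mrc_budgets` is hypothetical; it ignores the constants hidden in the `O(.)` notation and assumes base-2 logarithms, so the numbers are only orders of magnitude.

```python
import math


def mrc_budgets(n, epsilon, k):
    """Evaluate the MRC^k bounds with all hidden constants taken to be 1.

    Returns (space per mapper/reducer, total mapper output space, rounds).
    Assumption: log is base 2; the definition does not fix the base.
    """
    per_machine_space = n ** (1 - epsilon)        # O(n^(1-epsilon))
    total_shuffle_space = n ** (2 - 2 * epsilon)  # O(n^(2-2epsilon))
    rounds = math.log2(n) ** k                    # O(log^k n)
    return per_machine_space, total_shuffle_space, rounds


# Example: n = 2^20 bits of input, epsilon = 1/2, k = 1.
space, shuffle, rounds = mrc_budgets(2 ** 20, 0.5, 1)
```

With `epsilon = 1/2`, each mapper or reducer gets roughly `sqrt(n)` space while the total space for shuffled pairs is roughly `n`, which shows how the per-machine budget can be far smaller than the data as a whole.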

DMRC is in P

Theorem. Languages in DMRC can be decided by RAMs running in polynomial time and using at most `O(n^2 log n)` space.

Proof. The idea is to compute all of the map reduce steps on a single machine. By definition, each mapper or reducer runs in time polynomial in `n` and uses space sublinear in `n`. Note that the space used by these machines in round `i` must be sublinear in the original input, not in the output of round `i-1`. In any given round, the total output has size `O(n^(2-2epsilon))`, so it consists of sub-quadratically many key-value pairs. Let `p(n)` be a polynomial bounding the running time of any mapper or reducer. We can then run each mapper instance serially on a single machine, collect the total output, and use it to run each reducer instance serially, which generates the input for the next round. Simulating a single round takes time `O(n^2 cdot p(n))`, so simulating all rounds takes time `O(log^k n cdot n^2 cdot p(n))`. We only need to keep the previous round's output in memory at any given time, so we get the space bound.

Connections between NC and DMRC

The paper shows the following result using a padding argument on a version of the CIRCUIT VALUE PROBLEM. We skip the proof but state the result:

Theorem. If `P ne NC` then `DMRC` is not contained in `NC`.

Map Reduce and PRAMs

Outline