Byzantine Agreement - Map Reduce




CS255

Chris Pollett

Mar 11, 2019

Outline

Randomized Algorithm for Byzantine Agreement

What the ith Processor does (if it is good).

Input: A value for b[i], our current decision choice. 
Output: A decision d[i].
1. vote = b[i].
2. For each round, do
3.    Broadcast vote;
4.    Receive votes from all the other processors.
5.    Set maj = majority (0 or 1) value among the votes cast
6.    Set tally = the number of votes that maj received.
7.    If coin = heads then set threshold = L; else set threshold = H
8.    If tally >= threshold then set vote = maj; else vote = 0
9.    If tally >= G then set d[i] = maj permanently.
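Steps 5 through 9 above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the lecture's implementation: the function name `byzantine_round`, the convention that the received votes include the processor's own vote, the tie-breaking rule (ties go to 1), and the threshold values in the usage lines are all hypothetical choices made for illustration.

```python
from collections import Counter

def byzantine_round(votes, coin_is_heads, L, H, G):
    """One round of steps 5-9 for a good processor.

    votes: list of 0/1 votes seen this round (assumed to include our own).
    Returns (new_vote, decision); decision is None if no permanent
    decision is made in this round.
    """
    counts = Counter(votes)
    # Step 5: majority value among the votes cast (ties go to 1; an assumption).
    maj = 1 if counts[1] >= counts[0] else 0
    # Step 6: number of votes the majority value received.
    tally = counts[maj]
    # Step 7: the shared coin selects the low or the high threshold.
    threshold = L if coin_is_heads else H
    # Step 8: keep the majority value only if it clears the threshold.
    new_vote = maj if tally >= threshold else 0
    # Step 9: decide permanently once the tally reaches G.
    decision = maj if tally >= G else None
    return new_vote, decision

# Hypothetical thresholds for 8 processors, chosen only for illustration.
votes = [1, 1, 1, 1, 1, 1, 0, 0]
heads_result = byzantine_round(votes, True, L=6, H=7, G=7)
tails_result = byzantine_round(votes, False, L=6, H=7, G=7)
```

With six votes for 1, the processor keeps voting 1 when the coin is heads (tally 6 meets L = 6) but falls back to 0 when the coin is tails (tally 6 is below H = 7). In neither case does it decide, since the tally is below G.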

Analysis

Quiz

Which of the following statements is true?

  1. We showed in our analysis of the parallel MIS problem that if a vertex was good the odds that one of its neighbors was marked was at least `1 - exp(-1/6)`.
  2. Our synchronous CCP protocol made explicit use of timestamps.
  3. We showed it was impossible for case 2 (c) of the Asynchronous CCP protocol to ever occur.

Parallel and Distributed Algorithms so Far

Map Reduce

Example of How Map Reduce Might be Useful

The Basic Framework

Distinct Phases of a MapReduce Job

Example MapReduce Job for Counting

Parallelizing Map Reduce

Combiners

Fault Tolerance

Mappers and Reducers - (Formal Definition)

Recall a multiset is an unordered collection of objects where repeats are allowed.

Definition. A mapper is a (possibly randomized) function that takes as input one ordered `langle key; value rangle` pair of binary strings. As output the mapper produces a finite multiset of new `langle key; value rangle` pairs.

Definition. A reducer is a (possibly randomized) function that takes as input a binary string `k`, which is the key, and a sequence of values `v_1, v_2, ...`, which are also binary strings. As output, the reducer produces a multiset of pairs of binary strings `langle k; v_(k,1) rangle, langle k; v_(k,2) rangle, langle k; v_(k,3) rangle, ...`. The key in the output tuples is identical to the key in the input tuple.

So we allow mappers to manipulate keys arbitrarily, but reducers cannot change the keys at all.
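To make the two definitions concrete, here is a minimal sketch of a mapper and reducer pair for counting words. It is an illustration only: the names `count_mapper` and `count_reducer` are hypothetical, Python strings stand in for binary strings, and a list stands in for a multiset.

```python
def count_mapper(key, value):
    """Mapper: takes ONE (key, value) pair, returns a multiset of new pairs.

    The input key is ignored and the value is treated as a line of text.
    The mapper is free to choose entirely new keys (here, the words).
    """
    return [(word, "1") for word in value.split()]

def count_reducer(key, values):
    """Reducer: takes a key and the sequence of values for that key.

    Every output pair must carry the same key that was passed in.
    """
    total = sum(int(v) for v in values)
    return [(key, str(total))]

mapped = count_mapper("doc1", "a b a")
reduced = count_reducer("a", ["1", "1"])
```

Note that `count_mapper` replaces the key `"doc1"` with word keys, which the definition permits, while `count_reducer` returns its input key unchanged, as the definition requires.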

A Map Reduce Program

Assumptions of the Model

The Map Reduce Class.

Given a program input, a sequence of pairs `(k[j], v[j])`, for `j=1,2,3,...` where `k[j]` and `v[j]` are binary strings, we define the length of this input to be `n = sum_j(|k[j]| + |v[j]|)`, where `|a|` denotes the length of the binary string `a`.
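As a small worked example of this length measure, the sketch below computes `n` for a list of pairs. The function name `input_length` is hypothetical, and Python strings of 0s and 1s stand in for binary strings.

```python
def input_length(pairs):
    """Return n = sum over j of (|k[j]| + |v[j]|) for a list of (key, value) pairs."""
    return sum(len(k) + len(v) for k, v in pairs)

# Two pairs: (2 + 3) + (1 + 4) = 10
n = input_length([("01", "110"), ("1", "0000")])
```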

Definition. Fix an `epsilon > 0`. An algorithm in `MRC^k` consists of a sequence `langle m[1], r[1], m[2], r[2], ..., m[R], r[R] rangle` of operations which outputs the correct answer with probability at least `3/4` where:

We define `MRC = cup_k MRC^k` and we define `DMRC` in an analogous fashion where we require our machines and the above operations to be deterministic.

One key thing to note is that we allow the mappers and reducers to run in time polynomial in `n`, not polynomial in the length of the input they receive.

DMRC is in P

Theorem. Languages in DMRC can be decided by RAMs running in polynomial time and using at most `O(n^2 log n)` space.

Proof. The idea is to compute all of the map reduce steps on a single machine. Each mapper or reducer from the definition runs in at most polynomial time and uses space sublinear in `n`. Note that we require the space used by these machines in round `i` to be sublinear in the original input, not in the output of round `i-1`. In a given round the total output is sub-quadratically many key-value pairs. Let `p(n)` be a polynomial bounding the run time of any mapper or reducer. We can run each mapper on a single machine in a serial fashion, collect their total outputs, and then run each reducer serially on these outputs to generate the input for the next round. Simulating a single round takes time `O(n^2 cdot p(n))`, so simulating all `O(log^k n)` rounds takes time `O(log^k n cdot n^2 cdot p(n))`, which is polynomial in `n`. We only need to keep the previous round's output in memory at any given time, so we get the space bound.
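The serial simulation described in the proof can be sketched as follows. This is a minimal illustration, not code from the paper or the lecture: the names `simulate_rounds`, `split_words`, and `add_up` are hypothetical, Python strings stand in for binary strings, and the sketch makes no attempt to enforce the time or space limits of the model.

```python
from collections import defaultdict

def simulate_rounds(pairs, rounds):
    """Run a sequence of (mapper, reducer) rounds serially on one machine."""
    current = list(pairs)
    for mapper, reducer in rounds:
        # Map phase: apply the mapper to each pair, one at a time.
        mapped = []
        for k, v in current:
            mapped.extend(mapper(k, v))
        # Shuffle: group all values that share a key.
        groups = defaultdict(list)
        for k, v in mapped:
            groups[k].append(v)
        # Reduce phase: one reducer call per key, again one at a time.
        # Only this round's output is kept for the next round.
        current = []
        for k, vs in groups.items():
            current.extend(reducer(k, vs))
    return current

# A one-round word count used only to exercise the simulator.
def split_words(key, value):
    return [(w, "1") for w in value.split()]

def add_up(key, values):
    return [(key, str(sum(int(v) for v in values)))]

result = simulate_rounds([("d1", "a b a"), ("d2", "b c")], [(split_words, add_up)])
```

Because `current` is overwritten at the end of each round, the sketch mirrors the observation that only the previous round's output needs to be held in memory.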

Connections between NC and DMRC

The paper shows the following result using a padding argument on a version of the CIRCUIT VALUE PROBLEM. We skip the proof but state the result:

Theorem. If `P ne NC` then `DMRC` is not contained in `NC`.