Chris Pollett > Old Classes > CS255

HW3 Solutions Page

For this homework, nobody had completely satisfactory solutions for the written problems, so I took a stab at writing some up myself. I did choose one of the student coding submissions below, and its authors will receive 1 bonus point.

[ParallelMIS.java]

  1. Modify the Async-CCP algorithm so that it works when there are three processors and three registers.
    First, we modify the synchronous case to work with three processors. The idea is that the three processors work in rounds. In a given round, processor `i` (amongst 0, 1, 2) generates two random bits and writes their value into register `i` (which we assume can hold at least a two-bit number). Since we are in the synchronous case, all three processors do this before the next phase of the round. In the second phase of a round, processor `i` checks the values of registers `i-1 mod 3` and `i+1 mod 3`. If the value of its random bits is less than either, it writes 0 into register `i` and halts. If its value is larger than both, it writes `#` into its register and halts. Otherwise, it continues into the next round. In a given round, the odds that all three registers are distinct are at least `3/4 cdot 1/2 = 3/8`, so with high probability the number of rounds before everyone halts is small.
    To handle the asynchronous case we add timestamps. Roughly: if processor `j`'s clock is behind the timestamp it reads from a register, it adopts the bits in the register; if its clock is ahead of the timestamp, it writes its own bits into the register; otherwise, the timestamps agree and more care is needed. In detail, processor P[j] operates as follows:
     0. P[j] is initially scanning register `i = j`. The variable `i` is used to
        keep track of the register it is currently scanning.
        Thereafter, it changes its current register at the end of each iteration.
        The local variables T[j] (its clock) and B[j] (its random bits) are both initialized to 0.
        B[j] can hold at least two bits, so it could be one of 0, 1, 2, 3.
        There are three global registers, each of whose values is an ordered 4-tuple,
        initialized to (0, 0, 0, 0). The 4-tuple represents a timestamp for each
        of the three processors together with a bits value.
     1. P[j] obtains a lock on the current register and reads (t[0], t[1], t[2], R[i]).
     2. P[j] selects one of the cases:
        (a) case [R[i] = #]: halt;
        (b) case [T[j] < t[m] for some m]: pick the biggest t[m], set T[j] = t[m] and B[j] = R[i];
        (c) case [T[j] > t[m] for all m != j]: write B[j] into R[i] and write t[j] = T[j];
        (d) case [T[j] = t[m] for all m]:
           (i) If B[j] < R[i], halt.
           (ii) If B[j] = R[i], then if i = j and T[j] != 0, write # and halt.
           (iii) If B[j] = R[i], then if i != j or T[j] = 0, set T[j] = T[j] + 1 and t[j] = t[j] + 1,
           assign two random bit-values to B[j], and write R[i] = B[j].
           (iv) If B[j] > R[i], write R[i] = B[j].
     3. P[j] releases the lock on its current register,
        moves to register i+1 mod 3, and returns to Step 1.
    
    Suppose T[j] = t[m] for all `m`. This could happen at time 0, in which case (d)(iii) will apply. Suppose it is not round 0. Only case (c) and case (d)(iii) write the t[j] component of a tuple, so processor j must have visited this register once previously, when one of those two cases applied. To come back to this register again it must have visited the two other registers. Case (a) never applied while visiting the other registers, or j would have halted. Cases (b) and (c) didn't apply, or `j`'s timestamp would be different. Therefore, on each visit to the other two registers, some subcase of (d) applied. Since j hasn't halted yet, case (d)(i) could not have applied on those visits, so B[j] is at least as big as the values in the other registers. If case (d)(iii) didn't apply for those other registers, then B[j] is bigger than the values of the other two registers, so if i = j it is safe to write # and halt.
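As a quick sanity check on the `3/8` bound in the synchronous analysis above, the probability that three independent two-bit values are all distinct can be verified by exhaustive enumeration (a Python sketch; the names here are ours, not part of the solution):

```python
from itertools import product

def all_distinct(bits):
    """True when the three 2-bit values are pairwise distinct. In that case
    the round resolves: the unique maximum writes '#' and the others halt."""
    return len(set(bits)) == 3

# Enumerate all 4^3 equally likely triples of 2-bit values.
outcomes = list(product(range(4), repeat=3))
good = sum(1 for bits in outcomes if all_distinct(bits))
total = len(outcomes)
print(good, total)  # 24 of 64, i.e. probability exactly 3/8
```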
  2. Modify the ByzGen algorithm so as to make it work for as large a value `t` of faulty processors as possible. Determine `t` in terms of `n`.
    The original paper by Pease, Shostak, and Lamport on the problem shows that `t` can't equal `n/3` by giving a counter-example in the `n=3` case. If `t` is at most `floor(n/3) - 1`, then we can use the following algorithm:
    Input: A value for b[i], our current decision choice. 
    Output: A decision d[i].
    1. vote = b[i].
    2. For each round, do
    3.    Broadcast vote;
    4.    Receive votes from all the other processors.
    5.    Set tally[0] = the number of 0 votes.
    6.    Set tally[1] = the number of 1 votes.
    7.    If coin = heads then
    8.        If tally[1] > floor(n/3), set vote=1
    9.        Else vote=0
    10.   Else
    11.       If tally[1] >= ceil(2n/3), set vote=1
    12.       Else vote=0
    13.   If (tally[0] >= ceil(2n/3)), set d[i] = 0 and halt.
    14.   If (tally[1] >= ceil(2n/3)), set d[i] = 1 and halt.
    
    Lines 13 and 14 guarantee that if initially a super-majority of the processors agreed on a value, that value would be chosen as the decision. Lines 7-8 correspond to an L threshold and lines 10-12 to an H threshold. A faulty processor could potentially foil the L threshold if there were fewer than floor(n/3) votes of 1 from good processors and it voted 1, in which case its vote couldn't foil the H threshold. By the same reasoning, if it could foil H it couldn't foil L. So the expected number of rounds before we get a round with a non-foiled threshold is two. Then after at most one more round, one of lines 13 or 14 would apply. One quibble here is that we might want that if at the outset a majority voted one way (rather than a super-majority of 2/3), we could never get an outcome different from the original majority after running the above. If you want this, then you have to reduce `t` to at most floor(n/6) - 1, change the line 8 check to tally[1] > floor(n/2 + n/6), change the line 11 check to tally[1] > floor(n/2 + 2*n/6), and change the checks for lines 13 and 14 to >= ceil(n/2 + 2*n/6), but the same kind of reasoning applies.
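The key to the expected-two-rounds claim is that the two thresholds are more than `t` apart, so the at most `t` faulty votes cannot sit within striking distance of both in the same round. A small Python check of this gap, taking `L = floor(n/3)` and `H = ceil(2n/3)` as in the pseudocode above:

```python
import math

def thresholds(n):
    """The low and high thresholds used by the modified ByzGen above."""
    L = n // 3                 # heads: vote 1 iff tally[1] > floor(n/3)
    H = math.ceil(2 * n / 3)   # tails and decision lines: ceil(2n/3)
    return L, H

# With t = floor(n/3) - 1 faulty processors, faulty votes change any tally
# by at most t.  Since H - L > t, the good processors' 1-votes cannot be
# within t of both thresholds at once, so at most one threshold per round
# can be foiled by the adversary.
for n in range(4, 500):
    t = n // 3 - 1
    if t < 1:
        continue
    L, H = thresholds(n)
    assert H - L > t, (n, t, L, H)
print("threshold gap exceeds t for all n checked")
```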
  3. Carefully give a DMRC algorithm for computing `A^2` where `A` is an `n times n` matrix. Assume the input consists of (key; value) pairs of the form `((i,j); a_(i,j))` where the key is the pair `(i,j)` and the value is the `(i,j)` entry in `A`.
    Let `P = (mu_1, rho_1)` where the mapper, `mu_1`, is:
    On input ((i,j); a_(i,j)):
    for m = 1 to n
        output ((i,m); (i, j, a_(i,j)))
        output ((m,j); (i, j, a_(i,j)))
    
    Here each value is tagged with its source index (i,j) so the reducer can tell the entries apart. The reducer, `rho_1`, is:
    On input ((i,j); the tagged entries a_(i,1), ..., a_(i,n) of row i and a_(1,j), ..., a_(n,j) of column j)
    sum = 0
    for m = 1 to n
       find a_(i,m) and a_(m,j) among the tagged values
       sum += a_(i,m) * a_(m,j)
    output ((i,j); sum)
    
    Since sum = `sum_(m=1)^n a_(i,m) cdot a_(m,j)`, the above algorithm computes the desired product. We meet the DMRC criteria of being deterministic and using at most polylogarithmically many rounds, as we use only one round in the above and we don't use any randomness. Let `N` be the instance size of a particular matrix multiplication problem sent to our program. The mapper and reducer runtimes are bounded by their for loops, which are linear in the dimension of the matrix, so are `O(N^(1/2))`. The mapper, on a given `((i,j); a_(i,j))` input, outputs `O(N^(1/2))` tuples, so the space used by the mapper meets the DMRC criterion of being `O(N^(1-epsilon))`. There are `O(N) = O(n^2)` input tuples, so the total space requirement of the mapper's output is `O(N^(3/2))`, which is within the `O(N^(2-2epsilon))` required by DMRC. Given an input to the reducer of size `O(N^(1/2))`, the reducer uses only a constant amount of additional memory to compute sum, and the total output size of the reducer's tuples is `O(N)`. Hence, all the criteria for being in DMRC are met.
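The one-round algorithm above can be sketched concretely in Python, with a dictionary standing in for the shuffle phase. Tagging each mapped value with its source index is our assumption so the reducer can identify entries; everything else follows the pseudocode:

```python
from collections import defaultdict

n = 3
A = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]

def mu1(key, value):
    """Mapper: replicate a[i][j] to every output cell that needs it,
    tagged with its source index (i, j)."""
    i, j = key
    for m in range(n):
        yield ((i, m), (i, j, value))  # row i of A feeds C[i][m]
        yield ((m, j), (i, j, value))  # column j of A feeds C[m][j]

def rho1(key, values):
    """Reducer: C[i][j] = sum over m of a[i][m] * a[m][j]."""
    i, j = key
    entry = {(r, c): v for (r, c, v) in values}
    return sum(entry[(i, m)] * entry[(m, j)] for m in range(n))

# Shuffle phase: group the mapper outputs by key.
groups = defaultdict(list)
for i in range(n):
    for j in range(n):
        for k, v in mu1((i, j), A[i][j]):
            groups[k].append(v)

C = [[rho1((i, j), groups[(i, j)]) for j in range(n)] for i in range(n)]

# Sanity check against a direct computation of A^2.
direct = [[sum(A[i][m] * A[m][j] for m in range(n)) for j in range(n)]
          for i in range(n)]
assert C == direct
print(C)
```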
  4. Let `k` be the size of our cache. Let `R` be any request sequence. Prove `F_(FIFO)(R) leq k cdot F_(MIN)(R)+k`.
    The proof of this is similar to the LRU case handled in class; the observations below about repeated faults are what is specific to the FIFO case.
    After the first access, FIFO and MIN always have at least the page just accessed in common. Consider a subsequence `T` of `R` not including the first access and during which FIFO faults `k` times. Let `p` be the page accessed just before `T`. If FIFO faults on the same page twice during `T`, then `T` must contain accesses to at least `k + 1` different pages: after having been ejected, that page must have come back into the cache, and since it is a FIFO cache of size `k`, all `k - 1` items ahead of it must be ejected before it can be ejected again. The same is true if FIFO faults on `p` during `T`. If neither of these cases occurs, then FIFO faults on at least `k` different pages, none of them `p`, during `T`. In any case, MIN must fault at least once during `T`.

    Partition `R` into `R_0, R_1, ...` such that `R_0` contains the first access and at most `k` faults by FIFO, and each `R_i` for `i >= 1` contains exactly `k` faults by FIFO. On each `R_i` with `i >= 1`, the ratio of FIFO faults to MIN faults is at most `k` to `1`. During `R_0`, FIFO faults at most `k` times, which accounts for the additive `k` term. This gives the theorem.
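The bound can also be checked empirically. Below is a small Python simulation (our own sketch: a FIFO cache plus Belady's offline MIN rule of evicting the page whose next use is farthest in the future) that tests `F_FIFO(R) <= k * F_MIN(R) + k` on random request sequences:

```python
import random
from collections import deque

def fifo_faults(requests, k):
    """Count faults of a FIFO cache of size k on the request sequence."""
    cache, order, faults = set(), deque(), 0
    for p in requests:
        if p not in cache:
            faults += 1
            if len(cache) == k:
                cache.discard(order.popleft())  # evict oldest arrival
            cache.add(p)
            order.append(p)
    return faults

def min_faults(requests, k):
    """Belady's offline MIN: evict the page whose next use is farthest away."""
    cache, faults = set(), 0
    for t, p in enumerate(requests):
        if p not in cache:
            faults += 1
            if len(cache) == k:
                def next_use(q):
                    for s in range(t + 1, len(requests)):
                        if requests[s] == q:
                            return s
                    return float('inf')  # never used again
                cache.discard(max(cache, key=next_use))
            cache.add(p)
    return faults

random.seed(0)
k = 3
for _ in range(200):
    R = [random.randrange(6) for _ in range(50)]
    assert fifo_faults(R, k) <= k * min_faults(R, k) + k
print("bound F_FIFO <= k*F_MIN + k held on all trials")
```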

Return to homework page.