Chris Pollett >
Old Classes >
CS255
HW3 Solutions Page
For this homework, nobody had completely satisfactory solutions for the written problems, so I took a
stab at writing some up myself. I did choose one of the student coding submissions below, and its
authors will receive 1 bonus point.
[ParallelMIS.java]
- Modify the Async-CPP algorithm so that it works if there are three processors and three registers.
First, we describe the synchronous case with three processors. The idea is that the three processors
work in rounds. In the first phase of a round, processor `i` (among 0, 1, 2) generates two random bits and writes their value into register `i` (which we assume can hold at least a two-bit number). Since we are in the synchronous case, all three processors finish this phase before the next phase of the round begins. In the second phase of a round, processor `i` checks the values of registers `(i-1) % 3` and `(i+1) % 3`. If the value of its random bits is less than either, it writes 0 into register `i` and halts. If its value is larger than both, it writes `#` into its register and halts. Otherwise, it continues into the next round. In a given round, the probability that all three register values are distinct is `3/4 cdot 1/2 = 3/8`, so with high probability the number of rounds before everyone halts is small.
To handle the asynchronous case we add timestamps. Roughly, if processor `j`'s clock is behind a timestamp it reads from a register, it adopts the register's bits; if its clock is ahead of all the other timestamps it reads, it writes its own bits into the register; otherwise, it acts much as in the synchronous case. More precisely, processor P[j] does the following:
0. P[j] initially scans register `i = j`. The variable `i` keeps track of the register it is
currently scanning; thereafter, it changes its current register at the end of each iteration.
The local variables T[j] (its clock) and B[j] (its random bits) are both initialized to 0.
B[j] can hold at least two bits, so it can be one of 0, 1, 2, 3.
There are three global registers, each of whose values is an ordered 4-tuple, initialized
to (0, 0, 0, 0). The 4-tuple consists of a timestamp for each of the three processors
together with a bits value.
1. P[j] obtains a lock on the current register and reads (t[0], t[1], t[2], R[i]).
2. P[j] selects one of the cases:
(a) case [R[i] = #]: halt;
(b) case [T[j] < t[m] for some m]: pick the biggest such t[m], set T[j] = t[m] and B[j] = R[i];
(c) case [T[j] > t[m] for all m != j]: write B[j] into R[i] and write t[j] = T[j];
(d) case [T[j] = t[m] for all m]:
(i) If B[j] < R[i], halt.
(ii) If B[j] = R[i], i = j, and T[j] != 0, write # and halt.
(iii) If B[j] = R[i] and (i != j or T[j] = 0), set T[j] = T[j] + 1 and t[j] = t[j] + 1,
assign two fresh random bits to B[j], and write R[i] = B[j].
(iv) If B[j] > R[i], write R[i] = B[j].
3. P[j] releases the lock on its current register,
moves to register (i+1) % 3, and returns to Step 1.
To see that it is safe to write `#` in case (d)(ii), suppose T[j] = t[m] for all `m`. This could
happen at time 0, in which case (d)(iii) applies. So suppose it is not time 0. Only case (c) and case (d)(iii)
write the t[j] component of a tuple, so processor j must have visited this register once previously
when one of those two cases applied. To come back to this register it must have visited the
two other registers in between. Case (a) never applied while visiting the other registers, or j would have halted.
Cases (b) and (c) did not apply, or `j`'s timestamp would be different. Therefore, on each of the other two
registers some subcase of (d) applied. Since j has not halted yet, case (d)(i) could not have applied on the
visits to the other registers, so B[j] is at least as big as the values in the other registers. If case (d)(iii)
did not apply on those visits either, then B[j] is bigger than the values in the other two registers, so if i = j it is
safe to write `#` and halt.
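The case analysis in Step 2 can be sketched as a small dispatch function. The `selectCase` name and string labels are ours; the `"none"` branch should be unreachable from states the protocol actually produces:

```java
public class Step2 {
    static final int HASH = 4; // sentinel standing in for the '#' symbol

    // Returns which case of Step 2 processor j applies when it holds clock T
    // and bits B and reads the tuple (t[0], t[1], t[2], R) out of register i.
    public static String selectCase(int j, int i, int T, int B, int[] t, int R) {
        if (R == HASH) return "a";           // someone already halted with '#'
        for (int m = 0; m < 3; m++)
            if (T < t[m]) return "b";        // j's clock is behind: catch up
        boolean aheadOfOthers = true;
        for (int m = 0; m < 3; m++)
            if (m != j && T <= t[m]) aheadOfOthers = false;
        if (aheadOfOthers) return "c";       // j's clock is ahead: overwrite
        boolean equalToAll = true;
        for (int m = 0; m < 3; m++)
            if (T != t[m]) equalToAll = false;
        if (equalToAll) {
            if (B < R) return "d-i";                      // j loses: halt
            if (B == R && i == j && T != 0) return "d-ii"; // j wins: write '#'
            if (B == R) return "d-iii";       // tie: advance clock, redraw bits
            return "d-iv";                    // B > R: overwrite register bits
        }
        return "none"; // not expected for reachable protocol states
    }
}
```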
- Modify the ByzGen algorithm so as to make it work for as large a value `t` of faulty processors as possible. Determine
`t` in terms of `n`.
The original paper by Pease, Shostak, and Lamport on the problem shows that `t` cannot be as large as `n/3`, by giving a counterexample in the `n = 3` case. If
`t` is at most `|__ n/3 __| - 1`, then we can use the following algorithm:
Input: A value b[i], our current decision choice.
Output: A decision d[i].
1. Set vote = b[i].
2. For each round, do:
3. Broadcast vote.
4. Receive votes from all the other processors.
5. Set tally[0] = the number of 0 votes.
6. Set tally[1] = the number of 1 votes.
7. If coin = heads then
8. If tally[1] > floor(n/3), set vote = 1
9. Else set vote = 0
10. Else
11. If tally[1] >= ceil(2n/3), set vote = 1
12. Else set vote = 0
13. If tally[0] >= ceil(2n/3), set d[i] = 0 and halt.
14. If tally[1] >= ceil(2n/3), set d[i] = 1 and halt.
Lines 13 and 14 guarantee that if initially a super-majority of the processors agree on a value, it will be chosen
as the decision.
Lines 7-8 correspond to the L threshold and lines 10-12 to the H threshold. The faulty processors could potentially foil the L threshold
if there were fewer than floor(n/3) votes of 1 from good processors and the faulty processors voted 1; in that case their votes could not foil the H threshold. By the same reasoning, if the faulty processors could foil H, they could not foil L. So the expected number of rounds before
we get a round with a non-foiled threshold is two. Then after at most one more round, one of lines 13 or 14 will apply. One
quibble here is that if at the outset a majority voted one way (rather than a super-majority of roughly 66%), we
could get an outcome different from the original majority after running the above. If you want to rule this case out, then you
have to reduce `t` to at most floor(n/6) - 1, change the line 8 check to tally[1] > floor(n/2 + n/6), change
line 11 to tally[1] > floor(n/2 + 2*n/6), and change the checks on lines 13 and 14 to >= ceil(n/2 + 2*n/6); the same
kind of reasoning then applies.
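As a sanity check of the fault-free behavior, one can simulate the protocol with `n` good processors and a shared per-round coin (this simulation is ours, not part of the assignment). The thresholds below match lines 8 and 11, and with no faults all processors reach the same decision within a couple of rounds:

```java
import java.util.Random;

public class ByzGenSim {
    // Runs the modified protocol with n good processors and no faults,
    // using one shared coin flip per round; returns the decisions d[i].
    public static int[] run(int n, int[] b, Random rng) {
        int low = n / 3;             // floor(n/3), the L threshold (line 8)
        int high = (2 * n + 2) / 3;  // ceil(2n/3), the H threshold (line 11)
        int[] vote = b.clone();
        int[] d = new int[n];
        boolean[] decided = new boolean[n];
        int undecided = n;
        while (undecided > 0) {
            // Everyone broadcasts; with no faults all tallies agree.
            int tally1 = 0;
            for (int v : vote) if (v == 1) tally1++;
            int tally0 = n - tally1;
            boolean heads = rng.nextBoolean();
            for (int i = 0; i < n; i++) {
                if (decided[i]) continue;
                if (heads) vote[i] = (tally1 > low) ? 1 : 0;   // lines 7-9
                else vote[i] = (tally1 >= high) ? 1 : 0;       // lines 10-12
                if (tally0 >= high) {                          // line 13
                    d[i] = 0; decided[i] = true; undecided--;
                } else if (tally1 >= high) {                   // line 14
                    d[i] = 1; decided[i] = true; undecided--;
                }
            }
        }
        return d;
    }
}
```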
- Carefully give a DMRC algorithm for computing `A^2` where `A` is an `n times n` matrix. Assume the input consists
of (key; value) pairs of the form `((i,j); a_(i,j))`, where the key is the pair `(i,j)` and the value is the `(i,j)` entry of `A`.
Let `P = (mu_1, rho_1)` where the mapper, `mu_1` is:
On input ((i,j); a_(i,j)):
for m = 1 to n
output ((i,m); a_(i,j))
output ((m,j); a_(i,j))
and the reducer, `rho_1`, is:
On input ((i,j); a_(i,1), a_(i,2), ..., a_(i,n), a_(1,j), ..., a_(n,j))
sum = 0
for m = 1 to n
scan vector to find a_(i,m) and a_(m,j)
sum += a_(i,m) * a_(m,j)
output ((i,j); sum)
Since sum = `sum_(m=1)^n a_(i,m)cdot a_(m,j)` the above algorithm is computing the desired product.
We meet the DMRC criteria of being deterministic and using at most polylogarithmically many rounds, as we use only
one round in the above and we don't use any randomness. Let `N` be the instance size of
a particular matrix squaring problem sent to our program. The mapper and reducer runtimes are bounded
by their for loops, which are linear in the dimension of the matrix, so are `O(N^(1/2))`.
The mapper, on a given `((i,j); a_(i,j))` input, outputs `O(N^(1/2))` tuples. So the space used by
the mapper meets the DMRC criterion of being `O(N^(1-epsilon))`.
There are at most `O(N) = O(n^2)` input tuples. So the
total space requirements of the output of the mapper will be `O(N^(3/2))` which is within
the `O(N^(2-2epsilon))` required by DMRC. Given an input to the reducer of size `O(N^(1/2))`,
the reducer uses only a constant amount of additional memory to compute sum and
the total output size of the reducer's tuples is `O(N)`. Hence, all the criteria for being
DMRC are met.
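A toy single-machine run of this one-round job (ours, for checking correctness) is below. Unlike the pseudocode above, the values emitted here carry their original coordinates so the reducer can tell row entries `a_(i,m)` from column entries `a_(m,j)`:

```java
import java.util.*;

public class SquareMR {
    // Simulates one map/reduce round computing A^2 for an n x n matrix a.
    public static int[][] square(int[][] a) {
        int n = a.length;
        // shuffle: key "i,j" -> list of (origRow, origCol, value) records
        Map<String, List<int[]>> shuffle = new HashMap<>();
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int m = 0; m < n; m++) { // mapper on ((i,j); a[i][j])
                    shuffle.computeIfAbsent(i + "," + m, k -> new ArrayList<>())
                           .add(new int[]{i, j, a[i][j]});
                    shuffle.computeIfAbsent(m + "," + j, k -> new ArrayList<>())
                           .add(new int[]{i, j, a[i][j]});
                }
        int[][] out = new int[n][n];
        for (Map.Entry<String, List<int[]>> e : shuffle.entrySet()) {
            String[] p = e.getKey().split(",");
            int i = Integer.parseInt(p[0]), j = Integer.parseInt(p[1]);
            // reducer: scan the value list for a_(i,m) and a_(m,j)
            int[] row = new int[n], col = new int[n];
            for (int[] rec : e.getValue()) {
                if (rec[0] == i) row[rec[1]] = rec[2];
                if (rec[1] == j) col[rec[0]] = rec[2];
            }
            int sum = 0;
            for (int m = 0; m < n; m++) sum += row[m] * col[m];
            out[i][j] = sum;
        }
        return out;
    }
}
```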
- Let `k` be the size of our cache. Let `R` be any request sequence. Prove `F_(FIFO)(R) leq k cdot F_(MIN)(R)+k`.
The proof of this is similar to the LRU case handled in class; the parts below that differ
are specific to the FIFO case...
After the first access, FIFO and MIN always have at least
the page just accessed in common. Consider a subsequence `T` of `R` not including the
first access and during which FIFO faults `k` times. Let `p` be the page accessed
just before `T`. If FIFO faults on the same page twice during `T`,
then `T` must contain accesses to at least `k + 1` different pages. This is because the
page, after having been evicted, must have come back into the cache, and as it is a FIFO
cache of size `k`, all the `k - 1` items ahead of it must be evicted before it can be evicted
again. The same holds if FIFO faults on `p` during `T`. If neither of these cases
occurs, then FIFO faults on at least `k` different pages,
none of them `p`, during `T`. In any case, MIN must fault at least once during `T`.
Partition `R` into `R_0, R_1, ...` such that
`R_0` contains the first access and at most `k` faults by FIFO, and
`R_i` for `i >= 1` contains exactly `k` faults by FIFO. On each of
the `R_i`'s with `i >= 1`, the ratio of FIFO faults to MIN faults is at most `k` to `1`.
During `R_0`, FIFO faults at most `k` times, which accounts for the additive `k` in the bound.
This gives the theorem.
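The bound can also be checked empirically (a sketch of ours, not part of the proof) by counting FIFO faults and MIN faults, where MIN is simulated by Belady's rule of evicting the page whose next use is farthest in the future:

```java
import java.util.*;

public class PagingSim {
    // Counts faults of a FIFO cache of size k on the request sequence req.
    public static int fifoFaults(int[] req, int k) {
        Deque<Integer> cache = new ArrayDeque<>();
        int faults = 0;
        for (int p : req) {
            if (!cache.contains(p)) {
                faults++;
                if (cache.size() == k) cache.removeFirst(); // evict oldest
                cache.addLast(p);
            }
        }
        return faults;
    }

    // Counts faults of MIN: on a fault with a full cache, evict the page
    // whose next use lies farthest in the future (or is never used again).
    public static int minFaults(int[] req, int k) {
        Set<Integer> cache = new HashSet<>();
        int faults = 0;
        for (int idx = 0; idx < req.length; idx++) {
            int p = req[idx];
            if (cache.contains(p)) continue;
            faults++;
            if (cache.size() == k) {
                int evict = -1, far = -1;
                for (int q : cache) {
                    int next = req.length; // q never requested again
                    for (int s = idx + 1; s < req.length; s++)
                        if (req[s] == q) { next = s; break; }
                    if (next > far) { far = next; evict = q; }
                }
                cache.remove(evict);
            }
            cache.add(p);
        }
        return faults;
    }
}
```

On random request sequences, `fifoFaults(R, k) <= k * minFaults(R, k) + k` always holds, as the theorem requires.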
Return to homework page.