CS267
Chris Pollett
Nov. 14, 2012
Exercise 6.6. What is the expected number of bits per codeword when using a Rice code with parameter `M = 2^7` to compress a geometrically distributed postings list for a term `T` with `N_t/N approx 0.01`?
Answer. `E[mbox(bits/code word)] = sum_(k=1)^(infty) P(mbox(gap is k))cdot (mbox(length of codeword when the gap is k)).`
Since the gaps between postings are geometrically distributed with success probability `N_t/N approx 0.01`, we have:
`P(mbox(gap is k)) = (1 - 0.01)^(k-1)(0.01) = 0.01(0.99)^(k-1)`.
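As a quick sanity check on this distribution, the following minimal sketch sums the gap probabilities and the mean gap numerically. The function name `gap_probability` and the cutoff `K_MAX` are illustrative choices, not part of the exercise; the cutoff is assumed large enough that the truncated tail is negligible.

```python
def gap_probability(k: int, p: float = 0.01) -> float:
    """P(gap is k) for a geometric distribution with success probability p."""
    if k < 1:
        raise ValueError("gaps start at 1")
    return p * (1.0 - p) ** (k - 1)


# Illustrative cutoff: 0.99**5000 is far below floating-point precision.
K_MAX = 5000

# The probabilities should sum to (almost exactly) 1.
total = sum(gap_probability(k) for k in range(1, K_MAX + 1))

# The mean gap of a geometric distribution is 1/p, i.e. about 100 here.
mean_gap = sum(k * gap_probability(k) for k in range(1, K_MAX + 1))

print(total, mean_gap)
```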
The length of a codeword when the gap is `k` using a Rice code is `q(k) +1 + lfloor log_2(M) rfloor` where `q(k) = lfloor (k-1)/M rfloor`. Since we are given `M = 2^7`, the length is
`1 + 7 + lfloor (k-1)/(2^7) rfloor = 8 + lfloor (k-1)/(2^7) rfloor`.
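The length formula above can be sketched in code as follows. This is a minimal illustration assuming `M` is a power of two (as Rice codes require) and that the unary part costs `q(k) + 1` bits; the function name `rice_codeword_length` is hypothetical.

```python
def rice_codeword_length(k: int, M: int = 2**7) -> int:
    """Bits used by a Rice code with parameter M to encode the gap k >= 1."""
    if k < 1:
        raise ValueError("gaps start at 1")
    if M < 1 or M & (M - 1) != 0:
        raise ValueError("M must be a power of two")
    q = (k - 1) // M               # q(k) = floor((k - 1) / M)
    log_m = M.bit_length() - 1     # log_2(M), exact for powers of two
    return q + 1 + log_m           # unary part (q + 1 bits) plus remainder bits


# Gaps 1..128 all cost 8 bits; each further block of 128 gaps costs one more bit.
for gap in (1, 128, 129, 257):
    print(gap, rice_codeword_length(gap))
```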
So `E[mbox(bits/code word)]` is
`sum_(k=1)^(infty) 0.01(0.99)^(k-1) cdot (8 + lfloor (k-1)/(2^7) rfloor)`
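Dropping the floor, that is, approximating `lfloor (k-1)/(2^7) rfloor` by `(k-1)/(2^7)` (which can only overestimate the length), and using `0.01/2^7 approx 7.81 times 10^(-5)`, this is
`approx 0.08 sum_(k=1)^(infty) (0.99)^(k-1) + 7.81 times 10^(-5) sum _(k=1)^(infty) (k-1)(0.99)^(k-1)`
Splitting `(k-1)` into `k` and `-1`, and absorbing the `-1` part into the first sum (`0.08 - 7.81 times 10^(-5) approx 0.0799`), gives
` approx 0.0799 sum_(k=1)^(infty) (0.99)^(k-1) + 7.81 times 10^(-5) sum _(k=1)^(infty) k(0.99)^(k-1)`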
Temporarily, set `r= 0.99`, so we have
` = 7.81 times 10^(-5) d/(dr)(sum _(k= 0)^(infty) r^k) + 0.0799 sum_(k=0)^(infty) r^(k)`
Using the fact that the geometric series `sum_(k=0)^(infty)r^k` sums to `1/(1-r)` for `|r| < 1`, this becomes
` = 7.81 times 10^(-5) d/(dr)(1/(1 - r)) + 0.0799 cdot 1/(1-r)`
Taking the derivative, `d/(dr)(1/(1-r)) = 1/(1-r)^2`, gives
` = 7.81 times 10^(-5) cdot 1/(1-r)^2 + 0.0799 cdot 1/(1-r)`
Finally, substituting `r = 0.99`, so that `1 - r = 0.01`, we get
`= 0.781 + 7.99 = 8.77` bits.
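The derivation can be checked numerically. The sketch below evaluates the closed form obtained once the floor is replaced by `(k-1)/M`, and compares it with a direct summation that keeps the floor. The function names and the truncation point `terms` are illustrative assumptions, not part of the exercise. Under these assumptions the closed form reproduces roughly 8.77 bits, while the sum that keeps the floor comes out lower, at roughly 8.38 bits, since dropping the floor can only overestimate each codeword length.

```python
def approx_expected_bits(p: float = 0.01, M: int = 2**7) -> float:
    """Closed form from the derivation: the floor is replaced by (k - 1) / M."""
    log_m = M.bit_length() - 1          # log_2(M) for a power-of-two M
    r = 1.0 - p
    c = p / M                           # about 7.81e-5 for p = 0.01, M = 128
    a = p * (1 + log_m) - c             # about 0.0799
    return c / (1.0 - r) ** 2 + a / (1.0 - r)


def exact_expected_bits(p: float = 0.01, M: int = 2**7, terms: int = 20000) -> float:
    """Direct summation that keeps the floor, truncated after `terms` gaps."""
    log_m = M.bit_length() - 1
    total = 0.0
    for k in range(1, terms + 1):
        prob = p * (1.0 - p) ** (k - 1)       # P(gap is k)
        length = (k - 1) // M + 1 + log_m     # Rice codeword length for gap k
        total += prob * length
    return total


print(round(approx_expected_bits(), 2), round(exact_expected_bits(), 2))
```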