More Index Compression




CS267

Chris Pollett

Nov. 14, 2011

Outline

Huffman Coding

Huffman Tree Construction

  • Suppose we have a compression model `M` with `M(sigma_i) = Pr(sigma_i)`.
  • To construct a Huffman code, we:
    1. Start with a set of trees `T_i`, one for each symbol `sigma_i`. The probability of a tree is the sum of the probabilities of the symbols in it.
    2. Repeat until only one tree remains:
      Take the two trees of least probability, say `T_j` and `T_k`, and merge them into a single tree `T_i` with a new top node labeled with the sum of `T_j` and `T_k`'s probabilities, the more probable tree as the left child, and the less probable tree as the right child. The edge to the left child is labeled 0; the edge to the right child is labeled 1.
  • Once this tree is constructed, the codeword for a symbol is the sequence of edge labels on the path from the root down to that symbol's leaf.
  • Example
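The construction above can be sketched in Python using a min-heap of trees keyed by probability. This is an illustrative sketch, not code from the lecture; the function name `huffman_codes` and the dict-based interface are my own choices. Ties between equal-probability trees are broken by insertion order, so among equally probable trees the choice of "more probable" is arbitrary, as it is in the algorithm itself.

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build a Huffman code for a model M(sigma_i) = Pr(sigma_i).

    probs: dict mapping symbol -> probability.
    Returns a dict mapping each symbol to its bit string.
    The more probable subtree becomes the left (0) child,
    matching the construction on this slide.
    """
    tiebreak = count()  # breaks ties so trees are never compared directly
    # A tree is either a bare symbol (leaf) or a (left, right) pair.
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Take the two trees of least probability and merge them.
        p_k, _, t_k = heapq.heappop(heap)  # least probable
        p_j, _, t_j = heapq.heappop(heap)  # next least probable
        # New top node: sum of probabilities; more probable tree on the left.
        heapq.heappush(heap, (p_j + p_k, next(tiebreak), (t_j, t_k)))
    codes = {}
    def walk(tree, path):
        if isinstance(tree, tuple):
            walk(tree[0], path + "0")  # left edge labeled 0
            walk(tree[1], path + "1")  # right edge labeled 1
        else:
            codes[tree] = path or "0"  # single-symbol alphabet edge case
    _, _, root = heap[0]
    walk(root, "")
    return codes
```

For the dyadic distribution `{'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}` the resulting code lengths are 1, 2, 3, 3 bits, so the expected code length (1.75 bits/symbol) meets the entropy bound exactly.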

Huffman Tree Example

Facts About Huffman Codes

Canonical Huffman Codes

Quiz

Which of the following statements is true? (More than one may be.)

1. A `gamma`-code is a prefix-free code.
2. A proper binary tree can have an internal node with only one child.
3. The source coding theorem gives a lower bound on one's ability to compress a string based on the probability distribution of the symbols in it.

Motivating Arithmetic Coding

Arithmetic Coding

More Arithmetic Coding