Huffman and Arithmetic Coding, Posting List Compression




CS267

Chris Pollett

Oct 29, 2018

Outline

Huffman Coding

Huffman Tree Construction

Example

Huffman Tree Example

Facts About Huffman Codes

Canonical Huffman Codes

Motivating Arithmetic Coding

Arithmetic Coding

More Arithmetic Coding

Redux

Quiz

Which of the following is true?

  1. Given a list of intervals `S`, the function `G(S)` returns a sublist of `S` which is a GC-list.
  2. The trec_eval software is used to replace human relevance judgments.
  3. For any `n`-bit number, it takes `3n` bits to write it as a `gamma` code.

Finishing Up General Text Compression

Compressing Posting Lists: `Delta`-values

Nonparametric Gap Compression

Parametric Gap Compression

Geometric Distributions and Posting Lists

Golomb/Rice Codes

Finding the Modulus

Byte-Aligned Codes