Arithmetic Coding, Gap Compression




CS267

Chris Pollett

Nov. 6, 2023

Outline

Motivating Arithmetic Coding

Arithmetic Coding

More Arithmetic Coding

Redux

Finishing Up General Text Compression

Compressing Posting Lists: `Delta`-values

Nonparametric Gap Compression

Quiz

Which of the following is true?

    1. Given a symbol source `S`, emitting symbols from an alphabet `A` according to a probability distribution `P_A`, a sequence of symbols cannot be compressed to consume less than its entropy.
    2. To code a string using a `gamma`-code we first build a Huffman tree.
    3. The trec_eval program makes use of a sub-module to automatically judge relevance of documents without human involvement.

Parametric Gap Compression

Geometric Distributions and Posting Lists

Golomb/Rice Codes

Finding the Modulus

Byte-Aligned Codes
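The outline names `Delta`-value (gap) compression, Golomb/Rice codes, and finding the modulus, but does not spell them out. The following is a minimal Python sketch that makes several assumptions:

- Docids are positive integers.
- Each `gap - 1` follows a geometric distribution with parameter `p`, where `p` is roughly the fraction of documents that contain the term.
- The modulus is chosen with the Gallager-Van Voorhis rule `M = ceil(log(2 - p) / -log(1 - p))`.

The sketch turns a posting list into gaps and Golomb-codes each `gap - 1` as a unary quotient followed by a truncated-binary remainder. The function names, the modulus rule, and the bit-string representation are illustrative choices and may differ from what the lecture uses. When `M` is a power of two, the remainder becomes a plain fixed-width field, which is the Rice special case.

```python
import math


def choose_golomb_modulus(p: float) -> int:
    """Pick the Golomb modulus M for gap - 1 values distributed as
    P(k) = (1 - p)**k * p, via the Gallager-Van Voorhis rule
    M = ceil(log(2 - p) / -log(1 - p)).  Assumes p is roughly the
    fraction of documents containing the term."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return max(1, math.ceil(math.log(2.0 - p) / -math.log(1.0 - p)))


def golomb_encode(n: int, m: int) -> str:
    """Golomb-code a non-negative integer n with modulus m >= 1 as a bit string."""
    if n < 0 or m < 1:
        raise ValueError("need n >= 0 and m >= 1")
    q, r = divmod(n, m)
    bits = "1" * q + "0"            # quotient in unary: q ones, then a 0
    if m == 1:
        return bits                 # remainder is always 0, so it needs no bits
    b = (m - 1).bit_length()        # b = ceil(log2 m)
    t = (1 << b) - m                # remainders below t get a (b - 1)-bit code
    if r < t:
        return bits + format(r, f"0{b - 1}b")
    return bits + format(r + t, f"0{b}b")  # always used when m is a power of 2 (Rice code)


def golomb_decode(bits: str, m: int, i: int = 0) -> tuple[int, int]:
    """Decode one codeword starting at position i; return (value, next position)."""
    q = 0
    while bits[i] == "1":           # read the unary quotient
        q += 1
        i += 1
    i += 1                          # skip the terminating 0
    if m == 1:
        return q, i
    b = (m - 1).bit_length()
    t = (1 << b) - m
    x = int(bits[i:i + b - 1], 2) if b > 1 else 0
    if x < t:                       # short remainder stored in b - 1 bits
        return q * m + x, i + b - 1
    x = int(bits[i:i + b], 2)       # long remainder stored as r + t in b bits
    return q * m + x - t, i + b


def encode_postings(postings: list[int], m: int) -> str:
    """Replace sorted positive docids by gaps (Delta-values) and code gap - 1."""
    out, prev = [], 0
    for d in postings:
        if d <= prev:
            raise ValueError("docids must be positive and strictly increasing")
        out.append(golomb_encode(d - prev - 1, m))  # gaps are >= 1, so shift to >= 0
        prev = d
    return "".join(out)


def decode_postings(bits: str, count: int, m: int) -> list[int]:
    """Invert encode_postings: decode count gaps and re-accumulate the docids."""
    postings, prev, i = [], 0, 0
    for _ in range(count):
        gap_minus_one, i = golomb_decode(bits, m, i)
        prev += gap_minus_one + 1
        postings.append(prev)
    return postings


if __name__ == "__main__":
    postings = [3, 7, 11, 23, 29]                   # made-up posting list
    m = choose_golomb_modulus(len(postings) / 30)   # assumes a 30-document collection
    bits = encode_postings(postings, m)
    print(m, bits, decode_postings(bits, len(postings), m))
```

Strings of `0`/`1` characters keep the sketch easy to inspect. An actual index would pack the bits into bytes or machine words. Byte-aligned codes give up some of this compactness in exchange for faster decoding.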