Language Modeling, KL Divergence




CS267

Chris Pollett

Nov. 28, 2011

Outline

Introduction

Language Models

Smoothing

Quiz

Which of the following is true?

  1. The I/O complexity of NO MERGE is quadratic in the number of tokens.
  2. Hybrid Index Maintenance can have non-linear performance.
  3. The log-odds transformation does not preserve rankings.

Ranking with Language Models

Massaging our equations

Substituting in a particular model

Kullback-Leibler Divergence
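
As a minimal sketch of the standard definition, KL(p || q) = sum over x of p(x) log(p(x) / q(x)), the Python below computes the divergence between two discrete term distributions. The function name `kl_divergence` and the toy distributions `query_model` and `doc_model` are hypothetical, chosen only for illustration; they are not taken from the lecture.

```python
import math

def kl_divergence(p, q):
    """Return KL(p || q) in nats for discrete distributions given as dicts."""
    total = 0.0
    for term, p_x in p.items():
        if p_x == 0.0:
            continue  # by convention, 0 * log(0 / q) contributes 0
        q_x = q.get(term, 0.0)
        if q_x == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_x * math.log(p_x / q_x)
    return total

# Hypothetical toy distributions over terms (assumed values, illustration only).
query_model = {"language": 0.5, "model": 0.5}
doc_model = {"language": 0.25, "model": 0.25, "smoothing": 0.5}

divergence = kl_divergence(query_model, doc_model)
```

With these assumed values, each query term contributes 0.5 * ln(2), so the divergence equals ln 2. If the document model assigned probability zero to a query term, the result would be infinite, which is one reason a smoothed document model is normally used.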