Language Modeling, Test Collections, Open-Source IR Systems, Inverted Indexes




CS267

Chris Pollett

Feb. 8, 2016

Outline

Higher-order Models

Last Wednesday we were talking about language modeling. Let's briefly recall some of the things we learned...

Example

Smoothing

Markov Models

Markov Model for "to be or not to be"
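
The body of this slide did not survive extraction, so the following is only a minimal sketch of what a Markov model for the phrase in the title might look like. It assumes a first-order model whose transition probabilities are maximum likelihood estimates taken from the six tokens of the phrase; the function names `train_first_order` and `sequence_probability` are hypothetical and are not from the lecture.

```python
from collections import Counter, defaultdict


def train_first_order(tokens):
    """Estimate P(next | current) by maximum likelihood from adjacent token pairs."""
    pair_counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        pair_counts[current][following] += 1
    model = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        model[current] = {term: count / total for term, count in followers.items()}
    return model


def sequence_probability(model, tokens):
    """Probability of tokens[1:] given tokens[0] under the first-order model."""
    prob = 1.0
    for current, following in zip(tokens, tokens[1:]):
        # Transitions never seen in training get probability 0 (no smoothing).
        prob *= model.get(current, {}).get(following, 0.0)
    return prob


tokens = "to be or not to be".split()
model = train_first_order(tokens)
print(model["to"])                                      # transitions out of "to"
print(sequence_probability(model, ["not", "to", "be"]))
print(sequence_probability(model, ["be", "not"]))       # unseen transition
```

Because the training text is so short, every observed transition is deterministic and any unseen transition gets probability zero, which is the situation smoothing is meant to address.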

More Markov Models

Quiz

Which of the following is true?

  1. As we defined it last day, the novelty of a document measures the degree to which its contents are focused on the information need.
  2. Zipf's Law, as it relates to token/term frequencies, says that the frequency of the `i`th most common term, `F_i`, will be proportional to `1/i^{alpha}` for some constant `alpha`.
  3. The maximum likelihood estimate for the probability of a phrase can be bigger than 1.

Test Collections

TREC Tasks

Open-Source IR Systems

Inverted Indices

[Figure: dictionary and posting lists of an inverted index]
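
Since the figure itself is missing, here is a small sketch of the two parts it names: a dictionary of terms, each pointing to a posting list. It assumes document-level postings (sorted document ids) and whitespace tokenization; the toy documents and the function name `build_inverted_index` are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Return a dictionary mapping each term to a sorted list of document ids."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(documents, start=1):
        # set() so a term contributes one posting per document.
        for term in set(text.lower().split()):
            postings[term].append(doc_id)
    # Documents are visited in increasing id order, so each list is already sorted.
    return dict(postings)


# Hypothetical toy collection, not from the lecture.
docs = ["to be or not to be", "to sleep perchance to dream", "not a dream"]
index = build_inverted_index(docs)
for term in sorted(index):
    print(term, index[term])
```

The keys of `index` play the role of the dictionary, and each value is that term's posting list.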

ADT Example: Phrase Search
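
The slide body is not available, so what follows is a hedged sketch of phrase search written against an inverted index abstract data type. It assumes the ADT exposes `next(term, current)` and `prev(term, current)` over token positions, in the style of the Büttcher, Clarke, and Cormack textbook; the class `PositionalIndex` and the helpers `next_phrase` and `find_all` are hypothetical names, and the lecture's own ADT may differ.

```python
import bisect
from collections import defaultdict

INF = float("inf")


class PositionalIndex:
    """Toy index over one token sequence; postings are 1-based positions."""

    def __init__(self, tokens):
        self.postings = defaultdict(list)
        for position, term in enumerate(tokens, start=1):
            self.postings[term].append(position)

    def next(self, term, current):
        """First position of term strictly after current, or INF."""
        plist = self.postings.get(term, [])
        i = bisect.bisect_right(plist, current)
        return plist[i] if i < len(plist) else INF

    def prev(self, term, current):
        """Last position of term strictly before current, or -INF."""
        plist = self.postings.get(term, [])
        i = bisect.bisect_left(plist, current)
        return plist[i - 1] if i > 0 else -INF


def next_phrase(index, terms, position):
    """First occurrence (start, end) of the phrase starting after position."""
    while True:
        v = position
        for term in terms:                    # walk forward through the phrase
            v = index.next(term, v)
        if v == INF:
            return (INF, INF)
        u = v
        for term in reversed(terms[:-1]):     # walk backward to tighten the span
            u = index.prev(term, u)
        if v - u == len(terms) - 1:           # terms are adjacent: a real match
            return (u, v)
        position = u                          # otherwise retry from the candidate start


def find_all(index, terms):
    """All occurrences of the phrase as (start, end) position pairs."""
    matches, u = [], -INF
    while True:
        u, v = next_phrase(index, terms, u)
        if u == INF:
            return matches
        matches.append((u, v))


index = PositionalIndex("to be or not to be".split())
print(find_all(index, ["to", "be"]))
print(find_all(index, ["not", "to", "be"]))
print(find_all(index, ["be", "to"]))
```

The forward pass finds a window that contains the terms in order, and the backward pass shrinks it; only a window whose width equals the phrase length is reported as a match.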