Language Modeling, Test Collections, Open-Source IR Systems, Inverted Indexes




CS267

Chris Pollett

Aug. 29, 2011

Outline

Higher-order Models

Let's recall the notion of a higher-order language model that we introduced on Monday ...

Example

Smoothing

Markov Models

[Figure: Markov model for "to be or not to be"]

More Markov Models

HW1 -- Exercise 1.4

[Figure: Markov model for "to be or not to be"]

Exercise 1.4. Starting in an unknown state, the Markov model above generates "to be". What state or states could be the current state of the model after generating this text?

Answer. In the diagram above, only states 1 and 4 have a non-zero probability transition on the word "to". Each of these states has exactly one such transition, and it goes to state 2. From state 2, there is exactly one non-zero probability transition on the word "be", and it goes to state 3. Therefore, after generating "to be" from an unknown starting state, the model must be in state 3.
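
The reasoning in this answer can be checked mechanically by tracking the set of states the model could be in after each word. The following is a minimal sketch, not the course's reference solution. It assumes the model has four states numbered 1 to 4, and it encodes only the three non-zero transitions that the answer names; the diagram may contain further transitions on other words, which are left out. The names `TRANSITIONS`, `STATES`, and `possible_states` are hypothetical, chosen for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumption: states are numbered 1..4, as the answer suggests.
STATES: Set[int] = {1, 2, 3, 4}

# Only the non-zero transitions stated explicitly in the answer.
# Key: (current state, emitted word); value: set of possible next states.
# Transitions on other words are omitted because they are not given in the text.
TRANSITIONS: Dict[Tuple[int, str], Set[int]] = {
    (1, "to"): {2},
    (4, "to"): {2},
    (2, "be"): {3},
}


def possible_states(
    words: Iterable[str],
    states: Set[int] = STATES,
    transitions: Dict[Tuple[int, str], Set[int]] = TRANSITIONS,
) -> Set[int]:
    """Return the states the model could be in after generating `words`,
    when the starting state is unknown (any state is possible)."""
    current = set(states)
    for word in words:
        # Keep only the states reachable by a non-zero transition on `word`.
        current = {
            nxt
            for state in current
            for nxt in transitions.get((state, word), set())
        }
    return current


print(possible_states(["to", "be"]))  # {3}
```

After "to" the candidate set shrinks from all four states to {2}, and after "be" it becomes {3}, which matches the answer. An empty result would mean that, under the listed transitions, no starting state can generate the given text.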

Test Collections

TREC Tasks

Open-Source IR Systems

Inverted Indexes

[Figure: the dictionary and postings lists of an inverted index]

ADT Example: Phrase Search