Language Modeling, Test Collections, Open-Source IR Systems, Inverted Indexes




CS267

Chris Pollett

Aug 28, 2023

Outline

Introduction

Unknown Shakespeare

Higher-order Models

Example

Smoothing

Quiz

Which of the following is true?

  1. Effectiveness of an IR system is measured in terms of time (seconds per query) and space (bytes per document).
  2. The Probability Ranking Principle says if an IR system's response to each query is a ranking of the documents in the collection in order of decreasing probability of relevance, then the overall effectiveness of the system to its users will be maximized.
  3. Zipf's Law says the `i`th most common word in English occurs with probability `e^{-i}`.

Markov Models

Markov Model for "to be or not to be"
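
As a rough illustration of the slide title, the sketch below estimates a first-order Markov model from the six tokens of the phrase, using maximum-likelihood estimates over adjacent token pairs. The function name `train_first_order` and the whitespace tokenization are assumptions made for this example, not something taken from the lecture.

```python
from collections import Counter, defaultdict


def train_first_order(tokens):
    """Estimate P(next | current) by maximum likelihood from adjacent token pairs."""
    pair_counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        pair_counts[current][following] += 1

    model = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        # Normalize the counts for each state so its outgoing probabilities sum to 1.
        model[current] = {term: count / total for term, count in followers.items()}
    return model


# Assumption: tokens are obtained by splitting on whitespace.
tokens = "to be or not to be".split()
model = train_first_order(tokens)

for state, transitions in model.items():
    print(state, transitions)
```

With so little text, every state has exactly one observed successor, and the final "be" contributes no transition because nothing follows it. A longer training text would give states several outgoing transitions with fractional probabilities.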

More Markov Models

Test Collections

TREC Tasks

Open-Source IR Systems

Inverted Indices

[Figure: dictionary and posting lists of an inverted index]

ADT Example: Phrase Search
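
One way phrase search might be written against an inverted-index abstract data type is sketched below. It assumes a positional index that maps each term to a sorted list of token positions and exposes two methods, `next(term, current)` and `prev(term, current)`. The class name, the method names, and the helper `find_phrase` are assumptions made for this example and may differ from the ADT presented in the lecture.

```python
import math
from bisect import bisect_left, bisect_right
from collections import defaultdict


class InvertedIndex:
    """Toy positional index: term -> sorted list of 1-based token positions."""

    def __init__(self, tokens):
        self.postings = defaultdict(list)
        for position, term in enumerate(tokens, start=1):
            self.postings[term].append(position)

    def next(self, term, current):
        """First position of term strictly after current, or infinity."""
        plist = self.postings.get(term, [])
        i = bisect_right(plist, current)
        return plist[i] if i < len(plist) else math.inf

    def prev(self, term, current):
        """Last position of term strictly before current, or -infinity."""
        plist = self.postings.get(term, [])
        i = bisect_left(plist, current)
        return plist[i - 1] if i > 0 else -math.inf

    def next_phrase(self, terms, position):
        """First (start, end) occurrence of the phrase starting after position."""
        while True:
            # Walk forward: find each term in order, each after the previous one.
            v = position
            for term in terms:
                v = self.next(term, v)
            if v == math.inf:
                return None
            # Walk backward from the end to find the tightest start.
            u = v
            for term in reversed(terms[:-1]):
                u = self.prev(term, u)
            # The terms are adjacent exactly when the span has len(terms) positions.
            if v - u == len(terms) - 1:
                return (u, v)
            position = u


def find_phrase(index, terms):
    """Collect every occurrence of the phrase as (start, end) positions."""
    matches = []
    position = -math.inf
    while (hit := index.next_phrase(terms, position)) is not None:
        matches.append(hit)
        position = hit[0]
    return matches


index = InvertedIndex("to be or not to be".split())
print(find_phrase(index, ["to", "be"]))
print(find_phrase(index, ["not", "be"]))
```

The lookups use binary search over each posting list, so the cost of a query depends on the lengths of the lists for the query terms and not on the size of the whole collection.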