Divergence-from-randomness, Parallel Information Retrieval

CS267

Chris Pollett

Nov 26, 2018

Outline

Introduction

The Basic DFR Formula

More on the Basic DFR Formula

Quiz

Which of the following is true?

  1. Pseudo-relevance feedback is when we adjust BM25 to weight different components of a web document, such as the title, differently.
  2. In Jelinek-Mercer smoothing, to determine the frequency of a term in a document, we imagine increasing the document length by a factor `mu` according to the language model of the underlying corpus.
  3. Query expansion involves randomly picking a document that contains the query term, using that document's model to select a term from the document, and adding that term to the query. We then choose at random whether to stop or to continue expanding.

The Binomial Coefficient

Estimating `P_1`

Estimating `P_2` and Computing DFR

Document Length Normalization

Parallel Information Retrieval

Parallel Query Processing

Index Partitioning Schemes