More Parallel Information Retrieval




CS267

Chris Pollett

May 2, 2022

Outline

Introduction

Index Partitioning Schemes

Document Partitioning

How many documents should the servers return?

How many docs to return? (cont'd)

Graph of k versus m

Quiz

Which of the following is true?

  1. Dirichlet smoothing smooths a document using an underlying corpus by imagining we extend the length of the document by words chosen from the underlying corpus.
  2. Inter-query parallelism means splitting the processing of parts of a query across machines.
  3. We estimated self information in the DFR formula using Laplace's rule of succession.

Bottlenecks of Document Partitioning

Query Processing with Term Partitioning

Drawbacks of Term Partitioning / Hybrid Approach

MapReduce

The Basic Framework

Distinct Phases of a MapReduce Job

Example MapReduce Job for Counting

Example MapReduce Job for Counting (cont'd)
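
The slide text for this example is not reproduced here, so the following is only a minimal sketch of what a counting job could look like, assuming the task is to count term occurrences across a small collection. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) and the in-memory, single-process shuffle are illustrative assumptions, not the lecture's own code or any framework's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (term, 1) pair for every term occurrence in one document."""
    for term in text.lower().split():
        yield (term, 1)


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done in memory here)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum the values collected for one key."""
    return (term, sum(counts))


def run_job(docs: Dict[str, str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    emitted = (pair for doc_id, text in docs.items()
               for pair in map_phase(doc_id, text))
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(term, counts) for term, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical two-document collection, used only for illustration.
    print(run_job({"d1": "a b a", "d2": "b c"}))
```

In a real deployment the map and reduce calls would run on different machines and the shuffle would move data over the network; here everything runs sequentially so the three phases are easy to follow.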

Parallelizing MapReduce

Combiners

Fault Tolerance

Document Quality Measures

Traffic Rank