More Parallel Information Retrieval




CS267

Chris Pollett

May 3, 2021

Outline

Introduction

Index Partitioning Schemes

Document Partitioning
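Under document partitioning, every server indexes its own subset of the documents and returns its local best results, and a broker merges those lists into one global ranking. The sketch below shows only that merge step. It assumes that all servers score with the same global collection statistics, so that scores from different servers can be compared directly. `merge_top_m` and the toy data are hypothetical names chosen for illustration.

```python
import heapq
from typing import List, Tuple

# A result is a (score, doc_id) pair; higher scores are better.
Result = Tuple[float, str]

def merge_top_m(server_results: List[List[Result]], m: int) -> List[Result]:
    """Merge per-server result lists into the global top-m.

    Assumes every server scored its documents with the same (global)
    collection statistics, so scores from different servers are comparable.
    """
    # Flatten all candidates and keep the m highest-scoring ones.
    candidates = (r for results in server_results for r in results)
    return heapq.nlargest(m, candidates)

# Hypothetical results returned by three servers for one query.
servers = [
    [(9.1, "d3"), (4.0, "d7")],
    [(8.5, "d12"), (7.2, "d15")],
    [(6.0, "d21")],
]
print(merge_top_m(servers, 3))  # [(9.1, 'd3'), (8.5, 'd12'), (7.2, 'd15')]
```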

How many documents should the servers return?

How many docs to return? (cont'd)

Graph of k versus m
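One way to estimate how many results k each of n servers must return so that the broker still sees the global top m is sketched below. It assumes documents are assigned to servers uniformly at random, so the number of top-m documents on one server follows a Binomial(m, 1/n) distribution. A miss can only happen if some server holds more than k of the top m. The sketch bounds that probability with a union bound, which makes the chosen k slightly conservative. The function names and the tolerance `eps` are assumptions made for illustration.

```python
from math import exp, lgamma, log

def binom_tail(m: int, p: float, k: int) -> float:
    """P(X > k) for X ~ Binomial(m, p), with 0 < p < 1.

    Each term is computed in log space so large m does not overflow.
    """
    log_p, log_q = log(p), log(1 - p)
    total = 0.0
    for i in range(k + 1, m + 1):
        log_pmf = (lgamma(m + 1) - lgamma(i + 1) - lgamma(m - i + 1)
                   + i * log_p + (m - i) * log_q)
        total += exp(log_pmf)
    return total

def results_per_server(m: int, n: int, eps: float = 0.001) -> int:
    """Smallest k such that, by the union bound, the probability that some
    server holds more than k of the global top-m documents is at most eps."""
    if n == 1:
        return m  # a single server must return everything itself
    p = 1.0 / n
    for k in range(m + 1):
        if n * binom_tail(m, p, k) <= eps:
            return k
    return m

# Evaluating k for several values of m traces out a k-versus-m curve.
for m in (10, 50, 100):
    print(m, results_per_server(m, n=10))
```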

Quiz

Which of the following is true?

  1. Jelinek-Mercer smoothing smooths a language model based on a document using an underlying corpus by imagining we extend the length of the document by words chosen from the underlying corpus.
  2. We estimated eliteness in the DFR formula using Laplace's rule of succession.
  3. Inter-query parallelism means splitting the processing of parts of a query across machines.

Bottlenecks of Document Partitioning

Query Processing with Term Partitioning
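With term partitioning, each server holds the complete postings lists for a subset of the vocabulary, so a query has to visit every server that owns one of its terms. The sketch below models one common scheme: the query is routed through those servers in turn, and each server adds its terms' contributions to an accumulator set that is passed along. Several simplifications are assumptions for illustration: terms are assigned to servers with a CRC32 hash, the score is just the sum of term frequencies, and the "network hop" is an ordinary list lookup. All function names are hypothetical.

```python
import zlib
from typing import Dict, List

Postings = Dict[str, Dict[int, int]]  # term -> {doc_id: term frequency}

def server_for(term: str, n_servers: int) -> int:
    # Deterministic hash so every node agrees on which server owns a term.
    return zlib.crc32(term.encode("utf-8")) % n_servers

def partition_by_term(index: Postings, n_servers: int) -> List[Postings]:
    """Give each term's whole postings list to exactly one server."""
    shards: List[Postings] = [{} for _ in range(n_servers)]
    for term, postings in index.items():
        shards[server_for(term, n_servers)][term] = postings
    return shards

def process_query(shards: List[Postings], query: List[str]) -> Dict[int, float]:
    """Pass an accumulator set from server to server, visiting each owner once."""
    n = len(shards)
    owners: Dict[int, List[str]] = {}
    for term in query:
        owners.setdefault(server_for(term, n), []).append(term)
    accumulators: Dict[int, float] = {}
    for server_id, terms in owners.items():
        shard = shards[server_id]  # in a real system this is a network hop
        for term in terms:
            for doc_id, tf in shard.get(term, {}).items():
                # Toy scoring: add up raw term frequencies.
                accumulators[doc_id] = accumulators.get(doc_id, 0.0) + tf
    return accumulators

index = {"apple": {1: 2, 3: 1}, "banana": {1: 1, 2: 4}, "cherry": {3: 5}}
shards = partition_by_term(index, 2)
print(process_query(shards, ["apple", "banana"]))
```

Because every postings list lives on a single server, one query may hit only a few servers, but the accumulator set must travel between them.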

Drawbacks of Term Partitioning / Hybrid Approach

MapReduce

The Basic Framework

Distinct Phases of a MapReduce Job

Example MapReduce Job for Counting

Example MapReduce Job for Counting (cont'd)
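A minimal single-process sketch of a counting job follows. It assumes the job counts how often each term occurs across a small collection. `map_fn` emits a (term, 1) pair for every token, a shuffle step groups the values by key, and `reduce_fn` sums each group. The names `map_fn`, `reduce_fn`, and `run_job` are hypothetical, and everything runs in one process rather than on a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_fn(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    # Emit (term, 1) for every token in the document.
    for term in text.lower().split():
        yield term, 1

def reduce_fn(term: str, counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    # All values for one key arrive together; their sum is the term count.
    yield term, sum(counts)

def run_job(docs: Dict[str, str]) -> Dict[str, int]:
    # Map phase: apply map_fn to every input record.
    intermediate: List[Tuple[str, int]] = []
    for doc_id, text in docs.items():
        intermediate.extend(map_fn(doc_id, text))
    # Shuffle phase: group intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: process the keys in sorted order.
    output: Dict[str, int] = {}
    for key in sorted(groups):
        for out_key, out_value in reduce_fn(key, groups[key]):
            output[out_key] = out_value
    return output

print(run_job({"d1": "to be or not to be", "d2": "to do"}))
# {'be': 2, 'do': 1, 'not': 1, 'or': 1, 'to': 3}
```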

Parallelizing MapReduce
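To run a counting job in parallel, the input is split into M pieces for M map tasks. Each map task divides its output into R buckets using a partitioning function such as hash(key) mod R, so reduce task j only has to collect bucket j from every map task. In the sketch below, threads stand in for worker machines, which is an assumption made only to keep the example self-contained. CRC32 plays the role of the hash because Python's built-in `hash` for strings differs between processes. The function names are hypothetical.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

def partition(key: str, r: int) -> int:
    # hash(key) mod R; deterministic so every map task agrees on the bucket.
    return zlib.crc32(key.encode("utf-8")) % r

def map_task(split: List[str], r: int) -> List[List[Tuple[str, int]]]:
    """Run the map function on one input split, writing output into R buckets."""
    buckets: List[List[Tuple[str, int]]] = [[] for _ in range(r)]
    for line in split:
        for term in line.lower().split():
            buckets[partition(term, r)].append((term, 1))
    return buckets

def reduce_task(pairs: List[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the counts for every key routed to this reduce task."""
    totals: Dict[str, int] = defaultdict(int)
    for term, count in pairs:
        totals[term] += count
    return dict(totals)

def run_parallel(lines: List[str], m: int, r: int) -> Dict[str, int]:
    # Split the input into M roughly equal pieces, one per map task.
    splits = [lines[i::m] for i in range(m)]
    with ThreadPoolExecutor() as pool:
        map_outputs = list(pool.map(lambda s: map_task(s, r), splits))
        # Reduce task j pulls bucket j from every map task's output.
        reducer_inputs = [
            [pair for out in map_outputs for pair in out[j]] for j in range(r)
        ]
        reduce_outputs = list(pool.map(reduce_task, reducer_inputs))
    result: Dict[str, int] = {}
    for part in reduce_outputs:
        result.update(part)  # key sets are disjoint: each key has one reducer
    return result

print(run_parallel(["to be or not to be", "to do"], m=2, r=3))
```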

Combiners
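A combiner pre-aggregates a map task's output locally, before the shuffle, so fewer intermediate pairs have to cross the network. For counting, the combiner can be the same summation the reducer performs, because addition is associative and commutative. The sketch below compares the number of pairs a map task emits with and without a combiner. `map_only` and `combine` are hypothetical names.

```python
from collections import Counter
from typing import List, Tuple

def map_only(split: List[str]) -> List[Tuple[str, int]]:
    # Plain map output: one (term, 1) pair per token.
    return [(term, 1) for line in split for term in line.lower().split()]

def combine(pairs: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # Local pre-aggregation on the map side: one (term, partial_count) per term.
    totals: Counter = Counter()
    for term, count in pairs:
        totals[term] += count
    return sorted(totals.items())

split = ["to be or not to be", "to do"]
raw = map_only(split)
combined = combine(raw)
print(len(raw), len(combined))  # 8 5
print(combined)  # [('be', 2), ('do', 1), ('not', 1), ('or', 1), ('to', 3)]
```

The reducer still sums whatever partial counts it receives, so its result is the same whether or not the combiner ran.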

Fault Tolerance
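A common way for a MapReduce master to handle worker failures is simply to re-execute the failed task on another worker. This is safe because map and reduce functions are expected to be deterministic. The sequential simulation below assumes the master already knows which workers are dead (in practice it would detect this, e.g. through missed heartbeats). It reassigns each task round-robin until one attempt succeeds or `max_attempts` is exhausted. `run_tasks`, `dead_workers`, and the worker names are hypothetical.

```python
from typing import Callable, Dict, List, Set

def run_tasks(
    tasks: Dict[int, List[str]],
    task_fn: Callable[[List[str]], int],
    workers: List[str],
    dead_workers: Set[str],
    max_attempts: int = 3,
) -> Dict[int, int]:
    """Run every task, re-executing it on the next worker if the assigned
    worker has failed. Correct only because task_fn is deterministic."""
    results: Dict[int, int] = {}
    for task_id, split in tasks.items():
        for attempt in range(max_attempts):
            worker = workers[(task_id + attempt) % len(workers)]
            if worker in dead_workers:
                continue  # failure detected: reschedule on another worker
            results[task_id] = task_fn(split)
            break
        else:
            raise RuntimeError(f"task {task_id} failed {max_attempts} times")
    return results

def count_tokens(split: List[str]) -> int:
    return sum(len(line.split()) for line in split)

tasks = {0: ["a b"], 1: ["c"], 2: ["d e f"]}
print(run_tasks(tasks, count_tokens, ["w0", "w1", "w2"], dead_workers={"w1"}))
# {0: 2, 1: 1, 2: 3}
```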