Dynamic Inverted Indexes




CS267

Chris Pollett

Nov. 21, 2011

Outline

Dynamic Inverted Indexes

Batch Updates

REBUILD versus REMERGE

Quiz

Which of the following is true?

  1. One step in Huffman coding is to calculate the probability of a sequence as a subinterval of [0,1).
  2. `omega`-codes are an example of a parametric code.
  3. Simple-9 is an example of a word-aligned code.

Incremental Index Updates

In-memory Hash Index

NO MERGE Index Updates

  • Suppose that, while the search engine is building an index, say after creating `n` on-disk index partitions, we want it to process a keyword query composed of `m` query terms.
  • We could repeat the following procedure for each of the query terms:
    1. Fetch the term's postings list fragment from each of the `n` on-disk index partitions.
    2. Use the in-memory hash table to fetch the term's in-memory list fragment.
    3. Concatenate all `n+1` fragments to form the term's postings list.
  • This strategy is called the NO MERGE index update strategy.
  • It tends not to be a very attractive strategy because of the large number of disk seeks required to process a search query (one for every query term and index partition, so roughly `m * n` seeks in total).
  • It is often used as a baseline to which other strategies are compared.
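
The NO MERGE procedure above can be sketched in a few lines of Python. This is only a toy illustration, not the implementation of any particular search engine: the class name `NoMergeIndex`, the `memory_limit` parameter, and the use of in-memory dictionaries as stand-ins for on-disk partitions are all assumptions made for the example. It also assumes documents arrive in increasing doc-id order, so that plain concatenation of fragments yields a sorted postings list.

```python
from collections import defaultdict


class NoMergeIndex:
    """Toy NO MERGE index: frozen partitions plus one in-memory hash index."""

    def __init__(self, memory_limit=4):
        self.partitions = []                # stand-ins for on-disk partitions
        self.in_memory = defaultdict(list)  # term -> list of doc ids
        self.memory_limit = memory_limit    # max postings held in memory
        self.num_in_memory = 0

    def add_document(self, doc_id, terms):
        """Add one posting per distinct term; flush when memory is full."""
        for term in sorted(set(terms)):
            self.in_memory[term].append(doc_id)
            self.num_in_memory += 1
        if self.num_in_memory >= self.memory_limit:
            self.flush()

    def flush(self):
        """Freeze the in-memory index as a new partition; never merge."""
        if self.in_memory:
            self.partitions.append(dict(self.in_memory))
            self.in_memory = defaultdict(list)
            self.num_in_memory = 0

    def postings(self, term):
        """Concatenate the n partition fragments and the in-memory fragment."""
        result = []
        for partition in self.partitions:   # one disk seek each in a real system
            result.extend(partition.get(term, []))
        result.extend(self.in_memory.get(term, []))
        return result

    def seeks_for_query(self, query_terms):
        """Seek count under this model: one per query term and partition."""
        return len(query_terms) * len(self.partitions)


index = NoMergeIndex(memory_limit=4)
index.add_document(1, ["a", "b"])
index.add_document(2, ["a", "c"])   # fourth posting triggers a flush
index.add_document(3, ["a"])        # stays in the in-memory index
print(index.postings("a"))          # [1, 2, 3]
```

Because partitions are never merged, `seeks_for_query` grows linearly with the number of partitions, which is why this strategy mainly serves as a baseline.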
Contiguous Inverted Lists

REMERGE UPDATE

In-place Index Updates