Finish Gap Compression, Dynamic Inverted Indexes




CS267

Chris Pollett

Oct 31, 2018

Outline

Introduction

Finding the Modulus

Byte-Aligned Codes

Dynamic Inverted Indexes

Batch Updates

REBUILD versus REMERGE

In-Class Exercise

Consider the posting list [89, 101, 112, 122, 130, 145]. Suppose the corpus size is 200 documents. (a) Compute the `\Delta`-list for this list. Then encode the resulting `\Delta`-list using (b) a `\gamma`-code and (c) a Golomb code, using the formula from today to determine the best modulus.

Post your answer to the above to the Oct 31 In-Class Exercise Thread.
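
Parts (a) and (b) can be checked mechanically. The following is a minimal sketch, not the course's reference solution. It assumes the first posting is kept as-is in the `\Delta`-list and that a `\gamma`-codeword for `k` is written as (length of `k` in binary, minus one) zeros followed by the binary representation of `k`. The function names `delta_list` and `gamma_encode` are hypothetical.

```python
from typing import List


def delta_list(postings: List[int]) -> List[int]:
    """Return the gaps between consecutive postings (first posting kept as-is)."""
    deltas = []
    previous = 0
    for doc_id in postings:
        deltas.append(doc_id - previous)
        previous = doc_id
    return deltas


def gamma_encode(k: int) -> str:
    """Return the gamma codeword of a positive integer as a string of bits.

    Assumed convention: (len - 1) zeros, then the binary form of k,
    where len is the number of bits in k's binary representation.
    """
    if k < 1:
        raise ValueError("gamma codes are only defined for positive integers")
    binary = bin(k)[2:]  # strip the '0b' prefix
    return "0" * (len(binary) - 1) + binary


postings = [89, 101, 112, 122, 130, 145]
deltas = delta_list(postings)
codewords = [gamma_encode(d) for d in deltas]
print(deltas)
print(" ".join(codewords))
```

If the lecture writes the selector or body bits in a different order, only `gamma_encode` needs to change; the `\Delta`-list computation stays the same.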

Incremental Index Updates

In-memory Hash Index

NO MERGE Index Updates

  • Suppose that, while the search engine is building an index, say after creating `n` on-disk index partitions, we want it to process a keyword query composed of `m` query terms.
  • We could repeat the following procedure for each of the query terms:
    1. Fetch the term's postings list fragment from each of the `n` on-disk index partitions.
    2. Use the in-memory hash table to fetch the term's in-memory list fragment.
    3. Concatenate all `n+1` fragments to form the term's postings list.
  • This strategy is called the NO MERGE index update strategy.
  • It tends not to be a very attractive strategy because of the large number of disk seeks required to process a search query: one for every query term and index partition, so `m \cdot n` seeks in total.
  • It is often used as a baseline to which other strategies are compared.
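
The NO MERGE procedure above can be sketched as follows. This is an illustrative sketch only: each on-disk partition is stood in for by a Python dictionary, so no real disk access happens, and the names `no_merge_postings` and `fetch_query_lists` are hypothetical. It also assumes partitions are listed in the order they were written, so that concatenating fragments yields postings in increasing document order.

```python
from typing import Dict, List

Postings = List[int]
# Stand-in for one index partition: maps a term to its list fragment.
Partition = Dict[str, Postings]


def no_merge_postings(term: str,
                      disk_partitions: List[Partition],
                      in_memory_index: Partition) -> Postings:
    """Build a term's postings list by concatenating n+1 fragments."""
    postings: Postings = []
    # Step 1: one fragment per on-disk partition (one seek each on a real disk).
    for partition in disk_partitions:
        postings.extend(partition.get(term, []))
    # Step 2: the fragment held in the in-memory hash index.
    postings.extend(in_memory_index.get(term, []))
    # Step 3: the concatenation is the term's full postings list.
    return postings


def fetch_query_lists(query_terms: List[str],
                      disk_partitions: List[Partition],
                      in_memory_index: Partition) -> Dict[str, Postings]:
    """Repeat the procedure for each of the m query terms."""
    return {
        term: no_merge_postings(term, disk_partitions, in_memory_index)
        for term in query_terms
    }


partitions = [{"cat": [1, 4], "dog": [2]}, {"cat": [7]}]
memory = {"cat": [9], "bird": [10]}
print(fetch_query_lists(["cat", "dog"], partitions, memory))
```

With `m` query terms and `n` partitions, the outer and inner loops together perform `m \cdot n` partition lookups, which is where the seek cost of this strategy comes from.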

Contiguous Inverted Lists

REMERGE UPDATE

In-place Index Updates