Accumulator Pruning, Concordance Lists




CS267

Chris Pollett

Oct. 22, 2018

Outline

Introduction

Accumulator Pruning

More Accumulator Pruning

Assigning Accumulators

Precomputing Score Contributions

Quiz

Which of the following is true?

  1. In sort-based index construction, we built in-memory data structures called partitions which we periodically write to disk before merging them.
  2. IDF in the BM25 formula is computed in the same way as when we computed IDF for TF-IDF weights in the vector space model.
  3. Our term-at-a-time query processing algorithm only works for proximity ranking.

Light-Weight Structures

Concordance Lists

Properties of GC-lists

Operators

Examples

Implementation

More Implementation

  • The book defines four operations on a GC-list `S`: `tau(S, k)`, `rho(S, k)`, `tau'(S, k)`, and `rho'(S, k)`.
  • `tau(S, k)` returns the first interval in `S` starting at or after the position `k`; `tau'(S, k)` returns the last interval in `S` ending at or before the position `k`.
  • `rho(S, k)` returns the first interval in `S` ending at or after the position `k`; `rho'(S, k)` returns the last interval in `S` starting at or before the position `k`.
  • Using either `tau` or `rho`, one could enumerate forward through a GC-list starting at a given position. Similarly, using `tau'` or `rho'`, one could enumerate backwards through a GC-list from a position.
  • For the GC-list of a term `t`, we can define `tau` as follows (the book shows that `tau'`, `rho`, and `rho'` for terms are computed similarly):
    tau(t, k) :=
        if (k == infty) {
            u := infty;
        } else if (k == -infty) {
            u := -infty;
        } else {
            u := next(t, k - 1);
        }
        return [u, u];
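
The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the book's code. It assumes a term's postings are a sorted list of integer positions, and `next_pos` and `prev_pos` are hypothetical helpers standing in for the index's `next` and `prev` methods. Since every interval in a term's GC-list has the form `[u, u]`, its start and end coincide, so in this sketch `rho` is the same function as `tau` and `rho'` is the same as `tau'`. The handling of infinite `k` in `tau_prime` mirrors `tau` by symmetry and is an assumption.

```python
import bisect
import math

INF = math.inf

def next_pos(positions, k):
    """First position strictly greater than k, or INF if there is none."""
    i = bisect.bisect_right(positions, k)
    return positions[i] if i < len(positions) else INF

def prev_pos(positions, k):
    """Last position strictly less than k, or -INF if there is none."""
    i = bisect.bisect_left(positions, k)
    return positions[i - 1] if i > 0 else -INF

def tau(positions, k):
    """First interval [u, u] of the term starting at or after k."""
    if k == INF:
        u = INF
    elif k == -INF:
        u = -INF
    else:
        u = next_pos(positions, k - 1)
    return (u, u)

def tau_prime(positions, k):
    """Last interval [u, u] of the term ending at or before k."""
    if k == INF:
        u = INF          # assumption: mirrors tau by symmetry
    elif k == -INF:
        u = -INF
    else:
        u = prev_pos(positions, k + 1)
    return (u, u)

# For a single term, start and end of each interval coincide.
rho = tau
rho_prime = tau_prime

def enumerate_forward(positions, k):
    """Yield the term's intervals starting at or after a finite position k."""
    if math.isinf(k):
        raise ValueError("k must be finite")
    u, v = tau(positions, k)
    while u < INF:
        yield (u, v)
        u, v = tau(positions, u + 1)
```

For example, with `positions = [3, 8, 15]`, calling `tau(positions, 4)` gives `(8, 8)`, `tau_prime(positions, 7)` gives `(3, 3)`, and `list(enumerate_forward(positions, 4))` gives `[(8, 8), (15, 15)]`.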
    
  • Using `tau`, `rho`, `tau'`, and `rho'`, the book shows how to define each of our binary operators.
  • The book's definition of the four binary operators