Accumulator Pruning, Concordance Lists




CS267

Chris Pollett

Oct. 16, 2019

Outline

Introduction

Term-at-a-Time Algorithm

rankBM25_TermAtATime((t[1], t[2], ..., t[n]), k) {
    sort(t) in increasing order of N[t[i]];
    acc := {}, acc' := {}; //initialize accumulators.
      //acc used for previous round, acc' for next
    acc[0].docid := infty // end-of-list marker
    for i := 1 to n do {
        inPos := 0; //current pos in acc
        outPos := 0; // current position in acc'
        foreach document d in t[i]'s posting list do {
            while acc[inPos].docid < d do {
                acc'[outPos++] := acc[inPos++]; 
                //copy previous round to current for docs not containing t[i]
            }
            acc'[outPos].docid := d;
            acc'[outPos].score := log(N/N[t[i]]) * TFBM25(t[i], d);
            if(acc[inPos].docid == d) {
                // merge with the previous round's score and consume that entry
                acc'[outPos].score += acc[inPos++].score;
            }
            outPos++;
        }
        while acc[inPos].docid < infty do { // copy remaining acc to acc'
            acc'[outPos++] := acc[inPos++];
        }
        acc'[outPos].docid := infty; // end-of-list marker
        swap acc and acc'
    }
    return the top k items of acc; //select using heap
}
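
The pseudocode above can be sketched in Python as follows. This is only an illustrative sketch, not the book's implementation: the in-memory `postings` and `doc_lens` dictionaries, the function names, and the BM25 parameters `k1 = 1.2`, `b = 0.75` are assumptions made for the example. Python lists replace the explicit end-of-list marker, and the natural logarithm is used, which rescales scores without changing the ranking.

```python
import heapq
import math


def tf_bm25(f_td, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 term-frequency component; k1 and b are assumed default values."""
    return f_td * (k1 + 1) / (f_td + k1 * ((1 - b) + b * doc_len / avg_len))


def rank_bm25_term_at_a_time(terms, k, postings, doc_lens):
    """Return the top-k (docid, score) pairs.

    postings: hypothetical dict, term -> list of (docid, f_td) sorted by docid
    doc_lens: hypothetical dict, docid -> document length
    """
    n_docs = len(doc_lens)
    avg_len = sum(doc_lens.values()) / n_docs
    # Process terms in increasing order of posting-list length (N_t).
    terms = sorted((t for t in terms if t in postings),
                   key=lambda t: len(postings[t]))
    acc = []  # accumulators from the previous round, sorted by docid
    for t in terms:
        plist = postings[t]
        idf = math.log(n_docs / len(plist))
        new_acc = []  # accumulators for this round (acc' in the pseudocode)
        in_pos = 0
        for d, f_td in plist:
            # Copy accumulators for documents that do not contain t.
            while in_pos < len(acc) and acc[in_pos][0] < d:
                new_acc.append(acc[in_pos])
                in_pos += 1
            score = idf * tf_bm25(f_td, doc_lens[d], avg_len)
            if in_pos < len(acc) and acc[in_pos][0] == d:
                score += acc[in_pos][1]
                in_pos += 1  # consume the merged accumulator
            new_acc.append((d, score))
        new_acc.extend(acc[in_pos:])  # copy the remaining accumulators
        acc = new_acc
    return heapq.nlargest(k, acc, key=lambda e: e[1])


if __name__ == "__main__":
    postings = {"a": [(1, 2), (3, 1)],
                "b": [(1, 1), (2, 1), (3, 3), (4, 1)]}
    doc_lens = {1: 10, 2: 10, 3: 10, 4: 10, 5: 10}
    print(rank_bm25_term_at_a_time(["b", "a"], 2, postings, doc_lens))
```

Note that no pruning is done here: every document containing at least one query term gets an accumulator, which is the memory cost that accumulator pruning is meant to limit.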

Accumulator Pruning

More Accumulator Pruning

Assigning Accumulators

Precomputing Score Contributions

Light-Weight Structures

Concordance Lists

In-Class Exercise

Given a list of ordered pairs, `S`, suggest pseudo-code to compute `G(S)`. What is the runtime of your code?

Post your solutions to the Oct 16 In-Class Exercise.

Properties of GC-lists

Operators

Examples

Implementation

More Implementation

  • The book defines four access operations on a GC-list `S`: `tau(S, k)`, `rho(S, k)`, `tau'(S, k)`, and `rho'(S, k)`.
  • `tau(S, k)` returns the first interval in `S` starting at or after the position `k`; `tau'(S, k)` returns the last interval in `S` ending at or before the position `k`.
  • `rho(S,k)` returns the first interval in `S` ending at or after the position `k`; `rho'(S,k)` returns the last interval in `S` starting at or before the position `k`.
  • Using either `tau` or `rho` one could enumerate forward through a GC-list starting at a given position. Similarly, using `tau'` or `rho'` one could enumerate backwards through a GC-list from a position.
  • For the GC-list of a term `t`, we can define `tau` as follows (the book notes that `tau'`, `rho`, and `rho'` for terms are computed similarly):
    tau(t, k) :=
        if (k == infty) {
            u := infty;
        } else if (k == -infty) {
            u := -infty;
        } else {
            u := next(t, k-1);
        }
        return [u, u]
    
  • Using `tau`, `rho`, `tau'` and `rho'` the book shows how to define each of our binary operators.
  • The book's definition of the four binary operators
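
Returning to the single-term case, the `tau` definition given earlier in this list can be sketched in Python. This is a minimal sketch under stated assumptions, not the book's code: a term's postings are assumed to be a sorted Python list of integer positions, and `next_pos`, `prev_pos`, `tau_prime`, and `enumerate_forward` are hypothetical helper names. Only `tau` comes from the slide; `tau_prime` is written by symmetry using a `prev`-style lookup and mirrors the slide's handling of infinite `k`, which is an assumption. Because every interval for a term has the form `[u, u]`, its start and end coincide, so `rho` and `rho'` would return the same results as `tau` and `tau_prime` here.

```python
import bisect
import math

INF = math.inf


def next_pos(positions, k):
    """First position strictly after k, or INF if there is none."""
    i = bisect.bisect_right(positions, k)
    return positions[i] if i < len(positions) else INF


def prev_pos(positions, k):
    """Last position strictly before k, or -INF if there is none."""
    i = bisect.bisect_left(positions, k)
    return positions[i - 1] if i > 0 else -INF


def tau(positions, k):
    """First interval [u, u] of the term starting at or after k."""
    if k == INF:
        u = INF
    elif k == -INF:
        u = -INF
    else:
        u = next_pos(positions, k - 1)
    return (u, u)


def tau_prime(positions, k):
    """Last interval [u, u] ending at or before k (assumed symmetric form)."""
    if k == INF:
        u = INF
    elif k == -INF:
        u = -INF
    else:
        u = prev_pos(positions, k + 1)
    return (u, u)


def enumerate_forward(positions, start):
    """List all intervals starting at or after `start` by repeated tau calls."""
    result = []
    k = start
    while True:
        u, v = tau(positions, k)
        if u == INF:
            break
        result.append((u, v))
        k = u + 1  # resume just past the start of the interval found
    return result


if __name__ == "__main__":
    positions = [3, 7, 12]  # hypothetical positions of one term
    print(tau(positions, 4))
    print(tau_prime(positions, 6))
    print(enumerate_forward(positions, 4))
```

Binary search is used for the lookups only to keep the sketch short; any `next`/`prev` implementation over the postings list would serve the same role.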