 
   
   
   
   
   
   
   
   
   
   
CS267
Chris Pollett
Oct. 14, 2019
We can overcome the two limitations of our first algorithm for ranked retrieval by using two heaps: one to manage the query terms and, for each term t, keep track of the next document that contains t; the other one to maintain the set of the top `k` search results seen so far:
rankBM25_DocumentAtATime_WithHeaps((t[1], .. t[n]), k) {
    // create a min-heap for the top k search results
    for (i := 1 to k) {
        results[i].score := 0;
    }
    // create a min-heap for the n query terms
    for (i := 1 to n) {
        terms[i].term := t[i];
        terms[i].nextDoc := nextDoc(t[i], -infty);
    }
    sort terms in increasing order of nextDoc; // establish heap property for terms
    // terms[1] and results[1] are the heap roots
    while (terms[1].nextDoc < infty) {
        d := terms[1].nextDoc;
        score := 0;
        while (terms[1].nextDoc == d) {
            t := terms[1].term;
            score += log(N/N_t)*TF_(BM25)(t,d);
            terms[1].nextDoc := nextDoc(t,d);
            REHEAP(terms); // restore the heap property for terms
        }
        // results[1] is the lowest-scoring of the top k results so far
        if (score > results[1].score) {
            results[1].docid := d;
            results[1].score := score;
            REHEAP(results); // restore the heap property for results
        }
    }
    remove from results all items with score = 0;
    sort results in decreasing order of score;
    return results;
}
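The complexity of this algorithm is `Theta(N_q cdot log(n) + N_q cdot log(k))`, where `N_q = N_(t_1) + ... + N_(t_n)` is the total number of postings for the query terms. Each posting triggers one REHEAP on terms, costing `O(log(n))`, and each scored document triggers at most one REHEAP on results, costing `O(log(k))`.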
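The two-heap idea can be sketched in Python with the standard `heapq` module. This is a minimal illustration under stated assumptions, not the course's reference implementation. The index layout (`postings` maps a term to a docid-sorted list of `(docid, term frequency)` pairs), the `doc_lengths` dictionary, the natural logarithm, and the defaults `k1 = 1.2`, `b = 0.75` in the usual BM25 term-frequency formula are all choices made for the example. It also departs from the pseudocode in two small ways: an exhausted term is dropped from the heap rather than kept with nextDoc = infty, and the results heap grows up to size `k` rather than being pre-filled with zero scores.

```python
import heapq
import math


def rank_bm25_daat_heaps(query_terms, k, postings, doc_lengths, k1=1.2, b=0.75):
    """Document-at-a-time BM25 ranking with two heaps (illustrative sketch).

    postings:    dict term -> list of (docid, term_frequency), sorted by docid
    doc_lengths: dict docid -> document length in tokens
    Returns up to k (docid, score) pairs in decreasing order of score.
    """
    if k <= 0:
        return []
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs

    # Min-heap of (next docid, term, position in that term's postings list).
    terms = [(postings[t][0][0], t, 0) for t in query_terms if postings.get(t)]
    heapq.heapify(terms)

    # Min-heap of (score, docid); the root is the weakest of the top k so far.
    results = []

    while terms:
        d = terms[0][0]  # smallest docid not yet scored
        score = 0.0
        # Consume every query term whose next posting is for document d.
        while terms and terms[0][0] == d:
            _, t, pos = heapq.heappop(terms)
            plist = postings[t]
            f_td = plist[pos][1]
            norm = (1 - b) + b * doc_lengths[d] / avg_len
            tf = f_td * (k1 + 1) / (f_td + k1 * norm)
            score += math.log(n_docs / len(plist)) * tf
            # Advance this term to its next posting, if there is one.
            if pos + 1 < len(plist):
                heapq.heappush(terms, (plist[pos + 1][0], t, pos + 1))
        if len(results) < k:
            heapq.heappush(results, (score, d))
        elif score > results[0][0]:
            heapq.heapreplace(results, (score, d))

    # Drop zero scores and report the best document first.
    ranked = sorted(results, key=lambda r: -r[0])
    return [(d, s) for s, d in ranked if s > 0]
```

Each posting costs one pop and at most one push on a heap of at most n entries, and each document costs at most one operation on a heap of at most `k` entries, which is where the two logarithmic factors in the bound come from.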