CS267
Chris Pollett
Apr 5, 2021
Which of the following is true?
We can overcome the two limitations of our first algorithm for ranked retrieval by using two heaps: one to manage the query terms and, for each term t, keep track of the next document that contains t; the other one to maintain the set of the top `k` search results seen so far:
rankBM25_DocumentAtATime_WithHeaps((t[1], .. t[n]), k)
{
    // create a min-heap for the top k results
    for (i = 1 to k) {
        results[i].score := 0;
    }
    // create a min-heap for the n query terms
    for (i = 1 to n) {
        terms[i].term := t[i];
        terms[i].nextDoc := nextDoc(t[i], -infty);
    }
    sort terms in increasing order of nextDoc; // establish the heap property for terms
    while (terms[1].nextDoc < infty) {
        d := terms[1].nextDoc;
        score := 0;
        while (terms[1].nextDoc == d) {
            t := terms[1].term;
            score += log(N/N_t)*TF_(BM25)(t,d);
            terms[1].nextDoc := nextDoc(t,d);
            REHEAP(terms); // restore the heap property for terms
        }
        if (score > results[1].score) {
            results[1].docid := d;
            results[1].score := score;
            REHEAP(results); // restore the heap property for results
        }
    }
    remove from results all items with score = 0;
    sort results in decreasing order of score;
    return results;
}
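The complexity of this algorithm is `Theta(N_q cdot log(n) + N_q cdot log(k))`, where `N_q` is the total number of postings for the query terms, `n` is the number of query terms, and `k` is the number of results requested. Each of the `N_q` postings causes one REHEAP on the terms heap, costing `log(n)`, and each scored document causes at most one REHEAP on the results heap, costing `log(k)`.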
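The two-heap algorithm above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the course's reference implementation: the in-memory `index` layout (term to sorted list of `(docid, term frequency)` postings), the `doc_lens` dictionary, and the helper `tf_bm25` with `k1 = 1.2` and `b = 0.75` are all hypothetical choices made for the example. Python's `heapq` stands in for REHEAP. Instead of an `infty` sentinel, exhausted terms are simply dropped from the terms heap, and the results heap grows up to `k` entries rather than being pre-filled with zero scores.

```python
import heapq
import math


def tf_bm25(f_td, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 term-frequency component for one (term, document) pair.
    k1 and b are assumed default values, not taken from the slides."""
    return f_td * (k1 + 1) / (f_td + k1 * ((1 - b) + b * doc_len / avg_len))


def rank_bm25_daat_heaps(query_terms, k, index, doc_lens):
    """Document-at-a-time BM25 ranking with two heaps.

    index:    term -> list of (docid, term frequency), sorted by docid
              (hypothetical in-memory layout)
    doc_lens: docid -> document length
    Returns up to k (docid, score) pairs in decreasing order of score.
    """
    if k <= 0:
        return []
    N = len(doc_lens)
    avg_len = sum(doc_lens.values()) / N

    # Min-heap over the query terms, keyed by each term's next docid.
    # Entries are (next docid, term, position in that term's posting list).
    terms = []
    for t in query_terms:
        postings = index.get(t, [])
        if postings:
            terms.append((postings[0][0], t, 0))
    heapq.heapify(terms)

    # Min-heap of (score, docid): the root is the weakest of the top k so far.
    results = []

    while terms:
        d = terms[0][0]
        score = 0.0
        # Consume every term whose next posting is for document d.
        while terms and terms[0][0] == d:
            _, t, pos = heapq.heappop(terms)
            postings = index[t]
            idf = math.log(N / len(postings))  # log(N / N_t)
            score += idf * tf_bm25(postings[pos][1], doc_lens[d], avg_len)
            if pos + 1 < len(postings):
                # Advance this term to its next document and re-insert it.
                heapq.heappush(terms, (postings[pos + 1][0], t, pos + 1))
        if score > 0:
            if len(results) < k:
                heapq.heappush(results, (score, d))
            elif score > results[0][0]:
                # Replace the current weakest result in O(log k).
                heapq.heapreplace(results, (score, d))

    return [(d, s) for s, d in sorted(results, reverse=True)]
```

A call such as `rank_bm25_daat_heaps(["quick", "fox"], 2, index, doc_lens)` returns at most two `(docid, score)` pairs. Each posting is pushed and popped at most once on the terms heap, and each scored document touches the results heap at most once, which matches the `log(n)` and `log(k)` factors in the bound above.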