Proximity Ranking, Boolean Retrieval, Evaluating Results




CS267

Chris Pollett

Sep. 19, 2011

Outline

Introduction

Proximity Ranking

Algorithm for Finding Covers

nextCover(t[1],.., t[n], position) 
{
    v:= max_(1≤ i ≤ n)(next(t[i], position));
    if(v == infty)
        return [ infty, infty];
    u := min_(1≤ i ≤ n)(prev(t[i], v+1));
    if(docid(u) == docid(v)) then
        // covers need to be in the same document
        return [u,v];
    else
        return nextCover(t[1],.., t[n], u);
}
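
A minimal Python sketch of nextCover follows, to make the pseudocode concrete. The `TinyIndex` class is a hypothetical stand-in for the inverted index ADT: it assigns one global position per token and answers `next`, `prev`, and `docid` by binary search. Its name and layout are assumptions for illustration, not part of the slides.

```python
import bisect
from math import inf

class TinyIndex:
    """Toy index (assumed layout): one global position per token."""
    def __init__(self, docs):
        self.postings = {}   # term -> sorted list of global positions
        self.starts = []     # global position at which each document starts
        pos = 0
        for doc in docs:
            self.starts.append(pos)
            for term in doc.split():
                self.postings.setdefault(term, []).append(pos)
                pos += 1

    def next(self, term, position):
        """First occurrence of term strictly after position, or inf."""
        plist = self.postings.get(term, [])
        i = bisect.bisect_right(plist, position)
        return plist[i] if i < len(plist) else inf

    def prev(self, term, position):
        """Last occurrence of term strictly before position, or -inf."""
        plist = self.postings.get(term, [])
        i = bisect.bisect_left(plist, position)
        return plist[i - 1] if i > 0 else -inf

    def docid(self, position):
        """Number of the document that contains a finite global position."""
        return bisect.bisect_right(self.starts, position) - 1

def next_cover(index, terms, position):
    """Return the first cover [u, v] with u > position, or (inf, inf)."""
    v = max(index.next(t, position) for t in terms)
    if v == inf:
        return (inf, inf)
    u = min(index.prev(t, v + 1) for t in terms)
    if index.docid(u) == index.docid(v):
        return (u, v)            # both ends lie in the same document
    return next_cover(index, terms, u)

# Enumerate every cover of the query terms "a" and "b".
index = TinyIndex(["a b c", "b a x a b"])
covers, position = [], -inf
while True:
    u, v = next_cover(index, ["a", "b"], position)
    if u == inf:
        break
    covers.append((u, v))
    position = u
```

On this two-document toy collection the loop collects the covers (0, 1), (3, 4), and (6, 7). The interval from position 4 to position 7 also contains both terms, but it is not reported because it contains the smaller cover (6, 7).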

Ranking Covers

Ranking Algorithm with Proximity Scores

rankProximity(t[1],.., t[n], k)
// t[] term vector
// k number of results to return 
{
    u := - infty;
    [u,v] := nextCover(t[1],.., t[n], u);
    d := docid(u);
    score := 0;
    j := 0;
    while( u < infty) do
        if(d < docid(u) ) then
        // if docid changes record info about last docid
            j := j + 1;
            Result[j].docid := d;
            Result[j].score := score;
            d := docid(u);
            score := 0;
        score := score + 1/(v - u +1);
        [u, v] := nextCover(t[1],.., t[n], u);
    if(d < infty) then
        // record last score if not recorded
        j := j + 1;
        Result[j].docid := d;
        Result[j].score := score;
    sort Result[1..j] by score in decreasing order;
    return Result[1..k];   
}
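
The scoring loop above can be sketched in Python as follows. To keep the example short, `rank_proximity` takes the cover finder and the docid lookup as callables. The toy versions below, which read from a hard-coded list of covers and assume every document spans 100 positions, are hypothetical and exist only to exercise the loop.

```python
from math import inf

def rank_proximity(next_cover, docid, k):
    """Score each document by summing 1/(v - u + 1) over its covers.

    next_cover(position) returns the first cover (u, v) with u > position,
    or (inf, inf) when none remain; docid(inf) must return inf.
    Both callables are assumptions of this sketch.
    """
    u, v = next_cover(-inf)
    d = docid(u)
    score = 0.0
    results = []
    while u < inf:
        if d < docid(u):
            # The docid changed: record the finished document.
            results.append((d, score))
            d = docid(u)
            score = 0.0
        score += 1.0 / (v - u + 1)
        u, v = next_cover(u)
    if d < inf:
        # Record the last document, which the loop never flushed.
        results.append((d, score))
    results.sort(key=lambda r: r[1], reverse=True)   # best score first
    return results[:k]

# Hypothetical covers for some query, in increasing position order.
COVERS = [(3, 4), (10, 19), (120, 121), (250, 253)]

def toy_next_cover(position):
    for u, v in COVERS:
        if u > position:
            return (u, v)
    return (inf, inf)

def toy_docid(position):
    # Assumed layout: document i occupies positions 100*i .. 100*i + 99.
    return inf if position == inf else position // 100

top = rank_proximity(toy_next_cover, toy_docid, 2)
```

Here document 0 has two covers of lengths 2 and 10, document 1 has one cover of length 2, and document 2 has one cover of length 4. Document 0 therefore ranks first and document 1 second. Short covers dominate the score, which is the intent of the 1/(v - u + 1) weighting.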
  

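Using an analysis similar to the one the book gives for galloping search, one can show that this algorithm runs in time
`O(n^2 l cdot log(L/l))`, where `n` is the number of query terms, `l` is the length of the shortest postings list among them, and `L` is the length of the longest.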
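
For reference, here is a minimal sketch of a galloping `next` over one sorted postings list. It follows the usual description of the technique (double the jump from a cached index, then binary search inside the bracketed range) and is not necessarily identical to the book's version. The class name `GallopingList` and its fields are hypothetical.

```python
import bisect
from math import inf

class GallopingList:
    """Sorted postings list whose next() gallops from a cached index."""
    def __init__(self, positions):
        self.p = positions
        self.cache = 0      # index of the most recently returned posting

    def next(self, current):
        """Smallest posting strictly greater than current, or inf."""
        p, n = self.p, len(self.p)
        if n == 0 or p[-1] <= current:
            return inf
        if p[0] > current:
            self.cache = 0
            return p[0]
        # Resume just left of the cached answer when that is still valid.
        if self.cache > 0 and p[self.cache - 1] <= current:
            low = self.cache - 1
        else:
            low = 0
        jump = 1
        high = low + jump
        # Gallop: double the step until p[high] passes current.
        while high < n - 1 and p[high] <= current:
            low = high
            jump *= 2
            high = low + jump
        high = min(high, n - 1)
        # Now p[low] <= current < p[high], so binary search in between.
        self.cache = bisect.bisect_right(p, current, low, high + 1)
        return p[self.cache]

plist = GallopingList([2, 5, 9, 14, 20, 33, 41])
first = plist.next(-inf)     # first posting in the list
later = plist.next(10)       # gallops forward from the cached index
```

When successive calls move forward through the list, each call costs time logarithmic in the distance skipped rather than in the full list length. This is the source of the `log(L/l)` factor in bounds of this kind.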

Quiz

Which of the following is true?

  1. One possible weighting scheme for the components of our query and document vectors when cosine ranking is being used is to use TF-IDF.
  2. Determining whether a phrase occurs within a pair of HTML tags could not be implemented using our basic inverted index ADT.
  3. Returning all exact phrase matches using a binary search implementation of next and prev is always faster than a sequential implementation.

Boolean Retrieval

Extending Our ADT for Boolean Retrieval

Algorithm to Return the Next Solution to a Positive Boolean Query (No NOTs)

nextSolution(Q, position)
{
    v := docRight(Q, position);
    if(v == infty) then
        return infty;
    u := docLeft(Q, v+1);
    if(u == v) then
        return u;
    else
        return nextSolution(Q, v);
}

Algorithm to Return All Solutions to a Positive Boolean Query

u := -infty;
while u < infty do
    u := nextSolution(Q, u);
    if(u < infty) then
        report docid(u);

If nextDoc and prevDoc are implemented with galloping search, the complexity of this algorithm is `O(n cdot l cdot log(L/l))`.
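
The two algorithms above can be sketched in Python as follows. The sketch assumes the standard definitions of docRight and docLeft: for a single term they are nextDoc and prevDoc; for A AND B they are the max and min of the operands' values; for A OR B they are the min and max. The `POSTINGS` table and the tuple encoding of queries are hypothetical. Binary search stands in for galloping search to keep the example short.

```python
import bisect
from math import inf

# Hypothetical docid index: term -> sorted list of document ids.
POSTINGS = {
    "quarrel": [1, 2, 5, 9],
    "sir": [1, 2, 3, 5, 8],
    "you": [2, 4, 5, 7],
}

def next_doc(term, current):
    """First docid containing term that is greater than current, or inf."""
    plist = POSTINGS.get(term, [])
    i = bisect.bisect_right(plist, current)
    return plist[i] if i < len(plist) else inf

def prev_doc(term, current):
    """Last docid containing term that is less than current, or -inf."""
    plist = POSTINGS.get(term, [])
    i = bisect.bisect_left(plist, current)
    return plist[i - 1] if i > 0 else -inf

# A query is a term string, ("AND", q1, q2), or ("OR", q1, q2).
def doc_right(q, u):
    if isinstance(q, str):
        return next_doc(q, u)
    op, a, b = q
    pick = max if op == "AND" else min
    return pick(doc_right(a, u), doc_right(b, u))

def doc_left(q, v):
    if isinstance(q, str):
        return prev_doc(q, v)
    op, a, b = q
    pick = min if op == "AND" else max
    return pick(doc_left(a, v), doc_left(b, v))

def next_solution(q, position):
    """First docid greater than position that satisfies q, or inf."""
    v = doc_right(q, position)
    if v == inf:
        return inf
    u = doc_left(q, v + 1)
    if u == v:
        return u
    return next_solution(q, v)

def all_solutions(q):
    found = []
    u = -inf
    while u < inf:
        u = next_solution(q, u)
        if u < inf:
            found.append(u)
    return found

query = ("AND", "quarrel", ("OR", "sir", "you"))
matches = all_solutions(query)
```

For this toy index the query matches documents 1, 2, and 5. Document 9 contains "quarrel" but neither "sir" nor "you", so docLeft and docRight disagree there and the search moves on.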

Handling Queries with NOT

Measuring Effectiveness

Recall and Precision
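
As a small illustration, the sketch below computes both measures for one query, assuming the usual set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The function name and the sample docids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)          # relevant documents returned
    # Report 0.0 when a set is empty instead of dividing by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run: 4 documents returned, 3 documents judged relevant.
p, r = precision_recall(retrieved=[1, 2, 5, 7], relevant=[2, 5, 9])
```

In this run two of the four returned documents are relevant, so precision is 1/2. Two of the three relevant documents were found, so recall is 2/3.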