Implementing the Inverted Index ADT




CS267

Chris Pollett

Sep. 12, 2011

Outline

ADT Example: Phrase Search

Generating all Occurrences

Implementing our ADT

More on Implementing next and prev

Quiz

Which of the following is true?

  1. Zipf's Law could be used to estimate the frequency of the `i`th most common term in a corpus.
  2. The `n`th-order language model presented in class typically can't be directly expressed in terms of the `0`th-order model we gave.
  3. Nutch cannot be configured to use breadth-first search of a website.

Galloping Search

Using Galloping Search to Implement Next and Prev.

function next(t, current)
{
   // P[t][] = posting list of term t (sorted array of positions, indexed from 1)
   // l[t] = length of the posting list of term t
   static c = array(); // c[t] = cached index of the last result returned for term t

   if(l[t] == 0 || P[t][l[t]] <= current) then
       return infty;
   if( P[t][1] > current) then
       c[t] := 1;
       return P[t][c[t]];

   if( c[t] > 1 && P[t][c[t] - 1] <= current ) then
      low := c[t] -1;
   else
      low := 1;

   jump := 1;

   high := low + jump;

   while (high < l[t] && P[t][high] <= current) do
      low := high;
      jump := 2*jump;
      high := low + jump;
   if(high > l[t]) then
      high := l[t];
   // binarySearch returns the smallest index i in (low, high] with P[t][i] > current
   c[t] := binarySearch(t, low, high, current);
   return P[t][c[t]];
}
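
The pseudocode above can be turned into something runnable. What follows is a minimal Python sketch, not the book's code. The class name `PostingLists` and the constant `INFTY` are made up for illustration, indices are 0-based rather than 1-based, and the standard library's `bisect_right` stands in for `binarySearch`.

```python
import math
from bisect import bisect_right

INFTY = math.inf  # hypothetical stand-in for "infty" in the pseudocode


class PostingLists:
    """Illustrative container: term -> sorted list of positions."""

    def __init__(self, postings):
        self.P = postings  # dict mapping each term to its sorted posting list
        self.c = {}        # cached index of the last result, per term

    def next(self, t, current):
        """Return the first position of t strictly after current, or INFTY."""
        plist = self.P.get(t, [])
        last = len(plist) - 1
        if last < 0 or plist[last] <= current:
            return INFTY
        if plist[0] > current:
            self.c[t] = 0
            return plist[0]

        # Start from the cached index if it is still at or before the answer.
        cached = self.c.get(t, 0)
        if cached > 0 and plist[cached - 1] <= current:
            low = cached - 1
        else:
            low = 0

        # Gallop: double the jump until plist[high] passes current.
        jump = 1
        high = low + jump
        while high < last and plist[high] <= current:
            low = high
            jump *= 2
            high = low + jump
        if high > last:
            high = last

        # Here plist[low] <= current < plist[high]; finish with binary search.
        self.c[t] = bisect_right(plist, current, low, high + 1)
        return plist[self.c[t]]
```

For example, with `idx = PostingLists({"a": [2, 5, 9, 14, 20, 31]})`, the call `idx.next("a", 9)` returns `14`. A later call with a smaller `current` still works, because the cache is ignored when it points past the answer.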

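The book gives a nice analysis of the runtime of returning all exact phrase matches when using this algorithm and shows it to be `O(n cdot l cdot log (L/l))`, where `n` is the number of terms in the phrase, `l` is the length of the shortest posting list, and `L` is the length of the longest.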
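
A rough sketch of why a bound of this shape is plausible follows; it is not the book's proof. It assumes that each posting list of length at most L receives k = O(l) calls to next, that successive calls move forward, and that call i advances the cached position by d_i postings.

```latex
% Assumptions (for illustration): k = O(l) forward-moving calls per list,
% call i advances the cached index by d_i >= 1, and the advances sum to at most L.
\begin{align*}
\text{cost of call } i &= O(1 + \log d_i), \qquad \sum_{i=1}^{k} d_i \le L \\
\sum_{i=1}^{k} \log d_i &\le k \log\!\Big(\frac{1}{k}\sum_{i=1}^{k} d_i\Big)
  \le k \log\frac{L}{k} \quad \text{(concavity of } \log\text{)} \\
\text{total over } n \text{ lists} &= O\big(n \cdot l \cdot (1 + \log(L/l))\big)
\end{align*}
```

The galloping phase is what makes a call cost roughly the logarithm of the distance moved rather than the logarithm of the whole list length.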

Documents and Other Elements