Posting Lists, Index Construction




CS267

Chris Pollett

Oct. 8, 2012

Outline

Posting Lists

Random Accesses of Posting Lists

Prefix Queries

Interleaving Dictionary and Posting Lists

An example of the dictionary interleaving strategy

Quiz

Which of the following is true?

  1. Using character n-grams makes sense for languages where words are not separated by spaces.
  2. UTF-8 encodes all characters with the same number of bytes.
  3. An index dictionary is only used at query processing time -- not during index construction.

Dropping the distinction between terms and postings

Index Construction

In-memory Index Construction

buildIndex

buildIndex (inputTokenizer)
{
   position := 0;
   while (inputTokenizer.hasNext()) {
      T := inputTokenizer.getNext();
      obtain dictionary entry for T; create new entry, if necessary;
      append new posting position to T's postings list;
      position++;
   }
   sort all dictionary entries in lexicographical order;
   for each term T in the dictionary {
      write T's postings list to disk;
   }
   write the dictionary to disk;
   return;
}
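
As a rough illustration, the buildIndex pseudocode can be sketched in Python. This is a minimal sketch, not the course's reference implementation. The names build_index and read_postings, the 4-byte big-endian posting format, and the JSON dictionary file are all assumptions made for this example. The input is assumed to be any iterable of already-tokenized terms.

```python
import json
from collections import defaultdict


def build_index(tokens, postings_path, dictionary_path):
    """Build a positional index in memory, then write it to disk.

    Hypothetical on-disk layout (an assumption of this sketch):
    each posting is a 4-byte big-endian integer, and the dictionary
    maps each term to (byte offset, number of postings).
    """
    postings = defaultdict(list)  # term -> list of token positions
    for position, term in enumerate(tokens):
        # defaultdict creates the dictionary entry if necessary
        postings[term].append(position)

    dictionary = {}
    with open(postings_path, "wb") as out:
        # Visit the terms in lexicographical order, as in the pseudocode
        for term in sorted(postings):
            plist = postings[term]
            dictionary[term] = (out.tell(), len(plist))
            for p in plist:
                out.write(p.to_bytes(4, "big"))

    # Write the dictionary to disk last
    with open(dictionary_path, "w", encoding="utf-8") as out:
        json.dump(dictionary, out)
    return dictionary


def read_postings(postings_path, offset, count):
    """Read back one postings list using its dictionary entry."""
    with open(postings_path, "rb") as f:
        f.seek(offset)
        data = f.read(4 * count)
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]
```

For the token sequence "to be or not to be", the terms are written in the order be, not, or, to, so the entry for "be" is (0, 2) and its postings list is [1, 5]. Keeping the byte offset in the dictionary is one simple way to locate a term's postings list on disk without scanning the whole postings file.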

Next class we will continue our discussion of index construction.