Stopping, Character n-grams, Non-English Languages




CS267

Chris Pollett

Sep. 26, 2011

Outline

Finishing up Stemming

Example of stemmed text

Stopping

Quiz

Which of the following is true?

  1. To describe our Proximity Ranking algorithm we extended our Inverted Index API to include docRight and docLeft.
  2. The worst-case runtime of our Boolean Retrieval algorithm was better than the worst-case runtime of our Proximity Ranking algorithm.
  3. The calculation of MAP depends on only one precision value.

Characters

Understanding Unicode

Character n-grams

European Languages

CJK(V) Languages