Evaluating Results, Token and Term Processing

CS267

Chris Pollett

Sep 25, 2023

Outline

Measuring Retrieval Effectiveness

Recall and Precision

F-measures

Measures for the first `k` results

Example Precision Recall Plot from the Book

Average Precision

Precision at k and MAP scores for different ranking schemes

Quiz

Which of the following is true?

  1. The IDF score for a term as presented in class depends on the number of documents that contain the term.
  2. If we are using VSM for ranking and retrieval, we can use a docid index for our inverted index.
  3. The proximity ranking algorithm incorporates the number of documents that contain the query terms directly in the proximity score of a document.

Building a Test Collection

Efficiency Measures

Tokens and Terms

Punctuation and Capitalization

Stemming

Finishing up Stemming

Example of stemmed text

Stopping