Char-gramming, Language Processing, Static Inverted Indices

CS267

Chris Pollett

Oct 8, 2018

Outline

Finishing up Stemming

Example of stemmed text

Stopping
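Stopping means dropping very frequent function words before indexing or query processing. The following is a minimal sketch, assuming whitespace tokenization and a tiny hypothetical stopword list (`STOPWORDS`), not the list used in the lecture.

```python
# Hypothetical, deliberately tiny stopword list for illustration only;
# real systems use longer curated lists or frequency-based cutoffs.
STOPWORDS = frozenset({"a", "an", "and", "in", "of", "the", "to"})


def remove_stopwords(tokens):
    """Return the tokens that are not stopwords (case-insensitive check)."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


if __name__ == "__main__":
    text = "To be or not to be in the land of the free"
    print(remove_stopwords(text.split()))
```

Running it prints `['be', 'or', 'not', 'be', 'land', 'free']`, which also shows the usual downside of stopping: a phrase query such as "to be or not to be" loses words it needs.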

Characters

Understanding Unicode
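Unicode assigns each character a code point; UTF-8 then stores each code point in one to four bytes. As a sketch of how that byte layout works, the hypothetical helper `utf8_encode` below encodes a single code point by hand and is checked against Python's built-in codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (1 to 4 bytes)."""
    if cp < 0:
        raise ValueError("negative code point")
    if cp < 0x80:                      # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code points are not encodable")
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                 # 11110xxx then three continuation bytes
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of Unicode range")


if __name__ == "__main__":
    for ch in "a\u00e9\u6771\U0001F600":   # a, e-acute, a CJK ideograph, an emoji
        encoded = utf8_encode(ord(ch))
        assert encoded == ch.encode("utf-8")  # agrees with Python's codec
        print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```

The loop prints one line per character, showing that ASCII stays a single byte while the CJK ideograph takes three bytes and the emoji four.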

Character n-grams
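Character n-gram indexing replaces each word with the overlapping substrings of length n it contains. Below is a minimal sketch, assuming a single `_` marks each word boundary (some formulations pad with n - 1 characters instead); `char_ngrams` is a hypothetical name.

```python
def char_ngrams(word: str, n: int = 3, pad: str = "_") -> list[str]:
    """Overlapping character n-grams of a word, with one boundary marker per side."""
    if n < 1:
        raise ValueError("n must be positive")
    padded = f"{pad}{word}{pad}"
    if len(padded) < n:
        return [padded]  # word too short: index the padded word itself
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("orange", 5))
```

For `"orange"` with n = 5 this yields `_oran`, `orang`, `range`, `ange_`. Because no word segmentation or stemming is needed, the same code applies unchanged across languages.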

European Languages

CJK(V) Languages
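Chinese and Japanese text is written without spaces between words, so one widely used workaround is to index overlapping character bigrams rather than segmented words. This is a sketch, assuming the input is a single run of CJK characters that has already been separated from any Latin text; `overlapping_bigrams` is a hypothetical helper.

```python
def overlapping_bigrams(run: str) -> list[str]:
    """Overlapping character bigrams of one run of characters (no padding)."""
    if len(run) < 2:
        return [run] if run else []  # a lone character is kept as a unigram
    return [run[i:i + 2] for i in range(len(run) - 1)]


if __name__ == "__main__":
    # "Tokyo University" written as four characters.
    print(overlapping_bigrams("\u6771\u4eac\u5927\u5b66"))
```

A query is bigrammed the same way, so a document matches whenever it contains the query's bigrams, without any dictionary-based word segmenter.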

Quiz

Which of the following is true?

  1. To use a namespace in PHP we use the using keyword.
  2. The `F_1`-measure from class is the harmonic mean of the recall and precision scores.
  3. To define cosine ranking we made use of the notion of a cover for a query.

Inverted Index Intro

The Dictionary

Dictionary Types

Storing Dictionary Terms

Dictionary As a String Example
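Storing the dictionary as a string means concatenating the sorted terms into one string and keeping only an array of start offsets, so no space is wasted on fixed-width term slots. The class below is a hypothetical sketch of that layout: term i occupies `text[offsets[i]:offsets[i + 1]]`, with a sentinel offset at the end, and lookup is a binary search over term indices.

```python
class StringDictionary:
    """Sorted terms packed into one string plus an array of start offsets."""

    def __init__(self, terms):
        sorted_terms = sorted(set(terms))
        self.text = "".join(sorted_terms)
        self.offsets = [0]
        for t in sorted_terms:
            self.offsets.append(self.offsets[-1] + len(t))  # sentinel at end

    def __len__(self):
        return len(self.offsets) - 1

    def term(self, i: int) -> str:
        """Recover the i-th term by slicing the packed string."""
        return self.text[self.offsets[i]:self.offsets[i + 1]]

    def lookup(self, query: str) -> int:
        """Binary search; return the term's index, or -1 if absent."""
        lo, hi = 0, len(self) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            t = self.term(mid)
            if t == query:
                return mid
            if t < query:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1


if __name__ == "__main__":
    d = StringDictionary(["shakespeare", "hamlet", "witch", "macbeth"])
    print(d.text, d.offsets)
    print(d.lookup("shakespeare"), d.lookup("lear"))
```

Here the packed string is `hamletmacbethshakespearewitch` with offsets `[0, 6, 13, 24, 29]`; a looked-up index would then select the entry in a parallel array of posting-list pointers.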

Sort-based versus Hash-based dictionaries
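The trade-off can be sketched with Python's built-in `dict` standing in for a hash-based dictionary and a sorted list with `bisect` standing in for a sort-based one. A hash table typically gives faster exact-match lookups; a sorted array keeps terms that share a prefix contiguous, so prefix (wildcard) queries need only one binary search plus a scan. The term list and the helper `prefix_matches` are hypothetical.

```python
from bisect import bisect_left

terms = ["hamlet", "macbeth", "shakespeare", "shaken", "witch", "shall"]

# Hash-based: expected O(1) exact-match lookup, but keys have no useful order.
hash_dict = {t: i for i, t in enumerate(terms)}

# Sort-based: O(log n) exact lookup via binary search, and all terms with
# a given prefix sit next to each other.
sorted_terms = sorted(terms)


def prefix_matches(sorted_terms, prefix):
    """Return every term in sorted_terms that starts with prefix."""
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    end = start
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


if __name__ == "__main__":
    print("witch" in hash_dict)                 # exact match via hashing
    print(prefix_matches(sorted_terms, "shak"))  # prefix query via sorting
```

The prefix query `shak` returns `['shaken', 'shakespeare']`; answering it with the hash table alone would require scanning every key.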

Posting Lists
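A posting list records, for one dictionary term, where that term occurs. The sketch below builds a small positional inverted index in which each posting is a `(doc_id, position)` pair; lowercase whitespace tokenization and 1-based document IDs and positions are assumptions made for illustration, and `build_index` is a hypothetical name.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to its posting list of (doc_id, position) pairs.

    Postings come out sorted by doc_id, then position, because documents
    and tokens are visited in order.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs, start=1):
        for pos, token in enumerate(text.lower().split(), start=1):
            index[token].append((doc_id, pos))
    return dict(index)


if __name__ == "__main__":
    docs = ["Do you quarrel sir", "Quarrel sir no sir"]
    index = build_index(docs)
    print(index["sir"])      # postings for "sir"
    print(index["quarrel"])  # postings for "quarrel"
```

Here `sir` gets `[(1, 4), (2, 2), (2, 4)]`. Keeping postings sorted is what later lets query processing merge or skip through lists efficiently.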