Learning to Crawl

CS267

Chris Pollett

Feb 7, 2022

Outline

Yioop

Yioop Prerequisites if Running Under an Existing Web Server

Configuration

Logging in

Activities

Scripts for Crawling

Crawling Nice

Performing a Crawl

Seeing Search Results

Quiz

Which of the following is true?

  1. Zipf's law says that the frequency of the ith most common term, `F_i`, will be proportional to `1/{i^\alpha}` for some constant `\alpha`.
  2. The zeroth order language model considered in class, `M(\sigma) = \frac{\mbox{frequency}(\sigma)}{\sum_{\sigma' \in V}\mbox{frequency}(\sigma')}`, cannot be easily extended to allow for calculating the probability of phrases.
  3. If a phrase like "fourth witch" had non-zero probability in a non-smoothed, first order language model, there is a setting of `\gamma` under which it could have zero probability in a smoothed model.
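
As a reminder of the notation used in the quiz, the zeroth order model `M(\sigma)` can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `zeroth_order_model` and the toy corpus are assumptions made for this example and do not come from the lecture. It computes each term's frequency divided by the total frequency over the vocabulary `V`.

```python
from collections import Counter


def zeroth_order_model(tokens):
    """Map each term sigma to M(sigma) = frequency(sigma) / sum of all frequencies.

    `tokens` is a list of terms; the vocabulary V is the set of distinct terms.
    (Hypothetical helper for illustration only.)
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


# Toy corpus, made up for this example.
tokens = "the witch and the fourth witch saw the cat".split()
model = zeroth_order_model(tokens)

# Print terms from most to least probable.
for term, prob in sorted(model.items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{prob:.3f}")
```

Because every term's count is divided by the same total, the values of `M` sum to 1 over `V`, so `M` is a probability distribution on single terms.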