Learning to Crawl

CS267

Chris Pollett

Feb 8, 2021

Outline

Yioop

Yioop Prerequisites if Running Under an Existing Web Server

Configuration

Logging in

Activities

Scripts for Crawling

Crawling Nice

Performing a Crawl

Seeing Search Results

Quiz

Which of the following is true?

  1. PDF documents are collections of compressed XML files.
  2. Smoothing a first-order language model with a zeroth-order model allows us to gracefully handle plausible phrases that do not occur in a corpus but might have occurred if the corpus were larger.
  3. nextPhrase is a primitive method of our Inverted Index ADT.

Getting Help Within the App

Media Updater

Controlling Jobs of the Media Updater

Search Sources