How do Search Engines Work?




CS174

Chris Pollett

Dec. 1, 2010

Outline

Introduction

What search engines do not do...

Steps a search engine typically performs

  1. Downloads as much of the web as it can before serving the results of any search query.
  2. Extracts words from the text on the pages it has downloaded.
  3. Creates a big index associating each word found with a list of documents containing that word.
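Steps 2 and 3 above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine does it: the `pages` dictionary is a hypothetical stand-in for downloaded pages, and the names `extract_words` and `build_index` are made up for this example. A real engine would also strip markup, handle many languages, and store the index on disk.

```python
import re
from collections import defaultdict


def extract_words(text):
    """Lowercase the text and return its runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in pages.items():
        for word in extract_words(text):
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


# Hypothetical stand-ins for the text of three downloaded pages.
pages = {
    1: "Search engines crawl the web",
    2: "The web is large",
    3: "Engines index words",
}

index = build_index(pages)
```

Here `index["web"]` is the list of documents containing the word "web". Keeping each list sorted by document id makes the lists easy to intersect at query time.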

At this point, the search engine is ready to handle queries. To handle a query, it might:

  1. Look up each word of the query in the word-document index.
  2. Intersect the lists of documents found for the words to produce a list of documents that each contain all of the words.
  3. Group related documents.
  4. Try to order the documents by how relevant they seem to be to the query.
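The query steps above can be sketched as follows. This is a toy example under stated assumptions: the documents in `docs` are invented, `search` is a hypothetical helper, and relevance is approximated by simply counting how often the query words occur in each document, which is much cruder than what a real engine uses. Step 3 (grouping related documents) is left out.

```python
from collections import Counter, defaultdict

# Hypothetical toy documents standing in for crawled pages.
docs = {
    1: "web search engines index the web",
    2: "the web is large",
    3: "search engines rank pages",
}

# Per-document word counts, used below as a crude relevance signal.
term_counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}

# Word-document index: each word maps to the set of documents containing it.
index = defaultdict(set)
for doc_id, counts in term_counts.items():
    for word in counts:
        index[word].add(doc_id)


def search(query):
    """Return ids of documents containing every query word, best match first."""
    words = query.lower().split()
    if not words:
        return []
    # Step 1: look up each query word in the index.
    postings = [index.get(word, set()) for word in words]
    # Step 2: keep only the documents that appear in every list.
    matches = set.intersection(*postings)

    # Step 4: order by total occurrences of the query words (an assumed score).
    def score(doc_id):
        return sum(term_counts[doc_id][word] for word in words)

    return sorted(matches, key=lambda doc_id: (-score(doc_id), doc_id))
```

For example, `search("web")` ranks document 1 ahead of document 2 because "web" occurs twice in document 1, and `search("pages web")` returns an empty list because no single document contains both words.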

A Diagram of Search Engine Parts

The fetcher, indexer, and web components of a search engine

Downloading the Web

Maintaining what to crawl next

How to decide what to crawl next

Making sure we don't crawl what we're not supposed to

How to keep track of what we've already seen

Preprocessing and Indexing

Example Index Structure Components

Serving Results

Conclusion