Chris Pollett > Students >
Amith

Deliverable 3

Deliverable 3 documents the implementation details of the Nutch search engine. Nutch is an open-source search engine. For this deliverable, Nutch was set up on a Windows Vista 32-bit machine. Working through the setup clarified the modules that make up a search engine, and this understanding will be useful when implementing the final solution in CS 298.

The Nutch search engine consists of three components:
  1. The Crawler, which discovers and retrieves web pages.
  2. The 'WebDB', a custom database that stores known URLs and fetched page contents.
  3. The 'Indexer', which dissects pages and builds keyword-based indexes from them.
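
To make the division of labor among the three components concrete, here is a minimal sketch in Python. It is not Nutch's code or API: the names `TOY_WEB`, `WebDB`, `crawl`, and `build_index` are hypothetical, the "web" is a small in-memory dictionary rather than real HTTP fetches, and the index is a plain keyword-to-URL map. It only illustrates how a crawler feeds a URL/content store, which in turn feeds an indexer.

```python
from collections import defaultdict, deque
import re

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP instead.
TOY_WEB = {
    "http://a.example/": ("nutch is an open source search engine",
                          ["http://b.example/"]),
    "http://b.example/": ("a crawler retrieves web pages",
                          ["http://a.example/", "http://c.example/"]),
    "http://c.example/": ("an indexer builds keyword indexes", []),
}


class WebDB:
    """Toy stand-in for a store of known URLs and fetched page contents."""

    def __init__(self):
        self.known_urls = set()
        self.contents = {}

    def add_url(self, url):
        """Record a URL; return True only if it was not known before."""
        if url in self.known_urls:
            return False
        self.known_urls.add(url)
        return True

    def store(self, url, text):
        """Save the fetched text of a page."""
        self.contents[url] = text


def crawl(seed, web, db):
    """Breadth-first crawl from the seed, recording URLs and contents in db."""
    queue = deque([seed])
    db.add_url(seed)
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # page could not be "fetched"
        text, links = web[url]
        db.store(url, text)
        for link in links:
            if db.add_url(link):  # enqueue only newly discovered URLs
                queue.append(link)


def build_index(db):
    """Split each stored page into keywords and map keyword -> set of URLs."""
    index = defaultdict(set)
    for url, text in db.contents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


db = WebDB()
crawl("http://a.example/", TOY_WEB, db)
index = build_index(db)
print(sorted(index["crawler"]))  # ['http://b.example/']
```

Keeping the store separate from the crawler and the indexer mirrors the three-part split described above: the crawler only discovers and fetches, the store only remembers, and the indexer only reads what was stored.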

Please refer to the slides below for further details on the Nutch implementation:

    [Implementation details of NUTCH search engine - PDF]