Deliverable 3
Deliverable 3 provides implementation details for the Nutch search engine. Nutch is an open-source search engine. In this deliverable, Nutch was implemented on a 32-bit Windows Vista machine. This implementation helped in understanding the various modules that make up a search engine, which will be useful when implementing the final solution in CS 298.
The Nutch search engine consists of three components:
- The Crawler, which discovers and retrieves web pages.
- The 'WebDB', a custom database that stores known URLs and fetched page contents.
- The 'Indexer', which dissects pages and builds keyword-based indexes from them.
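To make the division of labor among the three components concrete, here is a minimal Python sketch. It is not Nutch code and does not use any Nutch API: the `TOY_WEB` dictionary, the `WebDB` class, and the `crawl`, `build_index`, and `search` functions are all hypothetical names invented for illustration, and the "web" is an in-memory dictionary rather than real HTTP fetching.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch these pages over HTTP instead.
TOY_WEB = {
    "http://a.example/": ("nutch is an open source search engine",
                          ["http://b.example/"]),
    "http://b.example/": ("a crawler discovers and retrieves web pages",
                          ["http://a.example/", "http://c.example/"]),
    "http://c.example/": ("an indexer builds keyword indexes from pages",
                          []),
}


class WebDB:
    """Toy stand-in for the WebDB: known URLs plus fetched page contents."""

    def __init__(self):
        self.known_urls = set()
        self.pages = {}

    def add_url(self, url):
        """Record a URL; return True only if it was not known before."""
        if url in self.known_urls:
            return False
        self.known_urls.add(url)
        return True

    def store_page(self, url, text):
        self.pages[url] = text


def crawl(seed, web, db):
    """Crawler: breadth-first discovery and retrieval starting from seed."""
    queue = deque([seed])
    db.add_url(seed)
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # unreachable page; nothing to fetch
        text, links = web[url]
        db.store_page(url, text)
        for link in links:
            if db.add_url(link):  # enqueue only newly discovered URLs
                queue.append(link)


def build_index(db):
    """Indexer: split each stored page into words and map word -> URLs."""
    index = defaultdict(set)
    for url, text in db.pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, keyword):
    """Look up one keyword and return the matching URLs in sorted order."""
    return sorted(index.get(keyword.lower(), set()))


db = WebDB()
crawl("http://a.example/", TOY_WEB, db)
index = build_index(db)
print(search(index, "crawler"))
```

The sketch keeps the three roles separate on purpose: the crawler only writes to the database, and the indexer only reads from it, which mirrors the component split described above.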
Please refer to the slides below for further information on the Nutch implementation:
[Implementation details of NUTCH search engine - PDF]