CS297 Proposal
Yioop! Full Historical Indexing in Cache Navigation
Akshat Kukreti (akshat.kukreti@students.sjsu.edu)
Advisor: Dr. Chris Pollett
Description: Yioop! displays a link to cached versions of web pages when showing the results of a query. The links within these cached pages currently take the user to the live versions of the linked pages. The first goal of my project is to add a feature to Yioop! that lets links in a cached page lead to cached versions of the linked pages instead of live ones, chosen according to the time when the parent web page was crawled. This feature will be similar to the Internet Archive's Wayback Machine. The user will also be able to do a text search within the cached version. Yioop! is currently optimized to search a single index at a time. The second goal of the project is to allow fast searches across multiple indexes. The third goal of the project is to modify the fetchers to handle ETags.
Schedule:
Deliverables: The full project will be done when CS298 is completed. The following will be done by the end of CS297:
1. Code a script similar to the JavaScript used in the Internet Archive and test it.
2. Understand the indexing in Yioop! and make a non-trivial modification to the index dictionary in Yioop! to enable searching across multiple indexes.
3. Alter links so that they go to cached results within a single index.
4. Modify the Yioop! fetcher to handle ETags and test it.
5. CS297 final report containing a summary of what was done during the semester, and future work.
References:
[Buttcher 10] Information Retrieval: Implementing and Evaluating Search Engines. Stefan Buttcher, Charles L.A. Clarke, Gordon V. Cormack. The MIT Press. 2010.
[Kahle 96] Archiving the Internet. Brewster Kahle. Internet Archive. 1996.
[Rackley 09] Internet Archive. Marilyn Rackley. Library, Harvard University, Cambridge, Massachusetts, U.S.A. 2009.
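Illustration for the first goal and deliverable 3: the idea of following a link to the cached copy nearest in time to the parent page's crawl can be sketched as below. This is only a minimal sketch under stated assumptions, not Yioop!'s implementation: the in-memory SNAPSHOTS table, the function names closest_snapshot and rewrite_link, and the "cache?url=...&time=..." link format are all hypothetical stand-ins for whatever the real index and cache URLs look like.

```python
from bisect import bisect_left
from urllib.parse import quote

# Hypothetical stand-in for a crawl index (an assumption, not Yioop!'s format):
# URL -> sorted list of Unix timestamps at which the page was cached.
SNAPSHOTS = {
    "http://example.com/a": [100, 200, 300],
    "http://example.com/b": [150],
}


def closest_snapshot(url, parent_time, snapshots=SNAPSHOTS):
    """Return the cached timestamp of url nearest to parent_time, or None."""
    times = snapshots.get(url)
    if not times:
        return None
    # Binary search for the insertion point, then compare the neighbours
    # on either side of it. On a tie the earlier snapshot is chosen.
    i = bisect_left(times, parent_time)
    candidates = times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: abs(t - parent_time))


def rewrite_link(url, parent_time, snapshots=SNAPSHOTS):
    """Point a link at a cached copy when one exists, else leave it live."""
    t = closest_snapshot(url, parent_time, snapshots)
    if t is None:
        return url
    # Hypothetical cache URL scheme, used here only for illustration.
    return "cache?url=" + quote(url, safe="") + "&time=" + str(t)
```

For example, with a parent page crawled at time 240, rewrite_link("http://example.com/a", 240) would select the snapshot taken at time 200, while a link to a page that was never crawled is returned unchanged so that it still leads to the live page.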