CS297 Proposal

Yioop! Full Historical Indexing in Cache Navigation

Akshat Kukreti (akshat.kukreti@students.sjsu.edu)

Advisor: Dr. Chris Pollett

Description:

Yioop! displays a link to the cached version of each web page in its query results. Links within these cached pages currently point to the live pages, in whatever state they are in when the user follows them. The first goal of my project is to add a feature to Yioop! that follows links to cached versions of web pages instead of live ones, chosen according to when the parent web page was crawled. This feature will be similar to the Internet Archive's Wayback Machine. The user will also be able to do a text search within the cached version. Yioop! is currently optimized to search a single index at a time. The second goal of the project is to allow fast searches across multiple indexes. The third goal of the project is to modify the fetchers to handle ETags.
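As a rough illustration of the first goal, the sketch below takes a link found in a cached page and picks the cached copy whose crawl time is closest to that of the parent page, falling back to the live URL when no cached copy exists. The `snapshots` mapping, the function names, and the `cache?url=...&time=...` URL shape are all hypothetical; they are not Yioop!'s actual data structures or routes.

```python
from bisect import bisect_left
from urllib.parse import urlencode


def nearest_crawl_time(crawl_times, parent_time):
    """Return the entry of the sorted list crawl_times closest to parent_time."""
    i = bisect_left(crawl_times, parent_time)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = crawl_times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: abs(t - parent_time))


def resolve_link(url, parent_time, snapshots):
    """Rewrite url to a cached copy if one exists, else keep the live URL.

    snapshots is a hypothetical mapping: url -> sorted list of crawl timestamps.
    """
    times = snapshots.get(url)
    if not times:
        return url  # no cached copy: fall back to the live page
    chosen = nearest_crawl_time(times, parent_time)
    # Hypothetical cache URL format, for illustration only.
    return "cache?" + urlencode({"url": url, "time": chosen})
```

Choosing the nearest crawl time rather than the most recent one is one possible policy; it keeps a browsing session close to a single point in time, in the spirit of the Wayback Machine.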

Schedule:

Week 1 (Sep. 4-11): Read [Kahle 96], [Rackley 09]
Week 2 (Sep. 12-18): Understand the JavaScript used in the Internet Archive
Week 3 (Sep. 19-25): Deliverable 1: Code a script similar to the one read in the previous week and test it.
Week 4 (Sep. 26-Oct. 2): Read chapter 15 [Buttcher 10] and understand how Yioop! searches across an index
Week 5 (Oct. 3-9): Understand the Yioop! index dictionary
Week 6 (Oct. 10-16): Deliverable 2: Modify the Yioop! index dictionary and test search across multiple indexes
Week 7 (Oct. 17-23): Read chapter 13 [Buttcher 10]
Week 8 (Oct. 24-30): Understand Yioop!'s caching mechanism
Week 9 (Oct. 31-Nov. 6): Deliverable 3: Alter links and test modified links for redirection to cached results
Week 10 (Nov. 7-13): Read about ETags
Week 11 (Nov. 14-20): Read and understand Yioop! fetcher code
Week 12 (Nov. 21-27): Deliverable 4: Test modified Yioop! fetcher for handling of ETags
Week 13-14 (Nov. 28-Dec. 11): Work on final report
Week 15 (Dec. 12-18): Deliverable 5: CS297 report

Deliverables:

The full project will be done when CS298 is completed. The following will be done by the end of CS297:

1. Code a script similar to the JavaScript used in the Internet Archive and test it.

2. Understand the indexing in Yioop! and make a non-trivial modification to the index dictionary in Yioop! to enable searching across multiple indexes.

3. Alter links so that they go to cached results within a single index.

4. Modify the Yioop! fetcher to handle ETags and test it.

5. CS297 final report containing a summary of what was done during the semester, and future work.
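The ETag handling in deliverable 4 could work roughly as follows. This is a minimal sketch of standard HTTP conditional requests, not Yioop!'s fetcher code: the fetcher remembers the ETag a server returned for a URL, sends it back in an `If-None-Match` header on the next crawl, and skips re-downloading when the server answers 304 Not Modified. The `etag_store` dictionary and both function names are assumptions made for illustration.

```python
def conditional_headers(url, etag_store):
    """Build request headers for url, adding If-None-Match if an ETag is known.

    etag_store is a hypothetical dict mapping url -> last ETag seen.
    """
    etag = etag_store.get(url)
    return {"If-None-Match": etag} if etag else {}


def needs_reindex(url, status, response_headers, etag_store):
    """Record the response's ETag and report whether the page changed.

    Returns False on 304 Not Modified (the cached copy is still valid),
    True otherwise.
    """
    if status == 304:
        return False
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in response_headers.items()}
    etag = lowered.get("etag")
    if etag:
        etag_store[url] = etag
    return True
```

A fetcher would call `conditional_headers` before each request and `needs_reindex` after it, downloading and re-indexing the page body only when the second call returns True.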

References:

[Buttcher 10] Information Retrieval: Implementing and Evaluating Search Engines. Stefan Buttcher, Charles L.A. Clarke, Gordon V. Cormack. The MIT Press. 2010.

[Kahle 96] Archiving the Internet. Brewster Kahle. Internet Archive. 1996.

[Rackley 09] Internet Archive. Marilyn Rackley. Library, Harvard University, Cambridge, Massachusetts, U.S.A. 2009.