


    [CS297/298 Blog]

    [CS297 Proposal]

    [Reading-The Internet Archive-PDF.]

    [Reading-Chapter 15 and 16 [Buttcher 10]-PDF.]

    [Reading-Chapter 13 [Buttcher 10]-PDF.]

    [Reading-Entity Tags-PDF.]

    [Deliverable 1]

    [Deliverable 2]

    [Deliverable 3]

    [Deliverable 4]

    [CS297 Report-PDF.]

    [CS298 Proposal]

    [CS298 Report-PDF]

    [CS298 Presentation-PDF]

    [CS298 Code]

    [Graduation Pic]


Akshat Kukreti CS297-298 Blog

04/23/2013 Eleventh project meeting

  • Modify the fetcher and queue_server so that they also handle the "Expires:" header along with the ETag header.
  • Continue working on experiments to measure the difference in crawl speed and the savings in bandwidth.
  • First draft of CS298 report due on 04/30/2013.
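The Expires/ETag check from this meeting can be sketched as follows. This is a minimal Python sketch, not Yioop!'s PHP code. It assumes the crawler keeps the last ETag and Expires value it saw for each URL, and the function names (`parse_expires`, `plan_request`, `page_changed`) are hypothetical. A stored copy whose Expires date is still in the future is not re-fetched. Otherwise the saved ETag is sent in an `If-None-Match` header, and a `304 Not Modified` reply means the page has not changed.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_expires(value: Optional[str]) -> Optional[datetime]:
    """Parse an HTTP Expires header; invalid dates (e.g. "0") yield None."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable Expires counts as already expired
    if parsed.tzinfo is None:
        # A "-0000" offset produces a naive datetime; treat it as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def plan_request(etag: Optional[str], expires: Optional[str],
                 now: datetime) -> Optional[dict]:
    """Return None to skip the download, else extra headers for the GET."""
    expiry = parse_expires(expires)
    if expiry is not None and now < expiry:
        return None  # stored copy is still fresh: no request at all
    if etag:
        return {"If-None-Match": etag}  # conditional GET
    return {}  # no validator stored: plain GET


def page_changed(status: int) -> bool:
    """A 304 Not Modified reply means the stored copy can be reused."""
    return status != 304
```

Following the HTTP caching rules, an unparseable Expires value such as `0` is treated as already expired, so the fetcher falls back to the ETag (or to a plain GET).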

04/16/2013 Tenth project meeting

  • Continue working on the experiment to measure scheduling delays caused by B-Tree insert and look-up operations on a 100,000-page crawl.
  • Run an experiment to measure bandwidth savings on a set of sites that send ETags.

04/09/2013 Ninth project meeting

  • Run an experiment to measure scheduling delays caused by B-Tree insert and look-up operations.
  • Conduct the experiment on a 100,000-page crawl.

04/02/2013 Eighth project meeting

  • Fix bugs and integrate the B-Tree with Yioop!.

03/19/2013 Seventh project meeting

  • Implement a B-Tree for storing and looking up ETags.
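For reference, the textbook B-tree operations look roughly like this: search, and insert with pre-emptive node splitting. This is an in-memory Python sketch keyed by URL with the ETag as the value. The class names are hypothetical, and this is not the Yioop! implementation. A crawler-scale version would keep nodes on disk so that each look-up reads only a few blocks.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


class Node:
    """One B-tree node: sorted keys, their ETags, and child pointers."""

    def __init__(self, leaf: bool) -> None:
        self.leaf = leaf
        self.keys: List[str] = []
        self.values: List[str] = []
        self.children: List["Node"] = []


class BTree:
    """In-memory B-tree of minimum degree t (at most 2t - 1 keys per node)."""

    def __init__(self, t: int = 2) -> None:
        self.t = t
        self.root = Node(leaf=True)

    def _locate(self, key: str) -> Optional[Tuple[Node, int]]:
        node = self.root
        while True:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return node, i
            if node.leaf:
                return None
            node = node.children[i]

    def search(self, key: str) -> Optional[str]:
        """Return the stored ETag for key, or None if the key is absent."""
        found = self._locate(key)
        return None if found is None else found[0].values[found[1]]

    def insert(self, key: str, value: str) -> None:
        """Insert key -> value, or overwrite the value if key already exists."""
        found = self._locate(key)
        if found is not None:
            node, i = found
            node.values[i] = value
            return
        if len(self.root.keys) == 2 * self.t - 1:
            # Full root: split it first so the tree grows by one level.
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_non_full(self.root, key, value)

    def _split_child(self, parent: Node, i: int) -> None:
        """Split the full child parent.children[i]; its median key moves up."""
        t = self.t
        full = parent.children[i]
        right = Node(leaf=full.leaf)
        parent.keys.insert(i, full.keys[t - 1])
        parent.values.insert(i, full.values[t - 1])
        parent.children.insert(i + 1, right)
        right.keys, right.values = full.keys[t:], full.values[t:]
        full.keys, full.values = full.keys[:t - 1], full.values[:t - 1]
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]

    def _insert_non_full(self, node: Node, key: str, value: str) -> None:
        """Descend to a leaf, splitting full children on the way down."""
        while not node.leaf:
            i = bisect_left(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        i = bisect_left(node.keys, key)
        node.keys.insert(i, key)
        node.values.insert(i, value)
```

Splitting full nodes on the way down means an insert never has to walk back up the tree. This keeps the number of nodes touched per operation proportional to the tree height.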

03/12/2013 Seventh project meeting

  • Test the ETag code in a multiple-queue-server setting.
  • Understand how the queue server handles robot data, then code and test a similar script for ETags.

03/05/2013 Sixth project meeting

  • Understand the fetch controller and how it processes data received from fetchers.

02/26/2013 Fifth project meeting

  • Fix bugs in the code for sending ETags to the responsible fetcher.

02/19/2013 Fourth project meeting

  • Move the code for extracting ETags to fetch_url.
  • Modify the fetcher so that it sends the extracted ETags to the responsible queue server.
  • Run an experiment to approximate a lower bound on the time spent on multiple look-ups in a large file.

02/12/2013 Third project meeting

  • Modify the fetcher so that it sends the extracted ETags to the responsible queue server.
  • Explore and experiment with data structures for storing ETags, focusing on the speed of inserting, updating, and looking up ETags.

02/05/2013 Second project meeting

  • Upload CS298 proposal.
  • Modify the code in fetch_url for extracting ETags from the HTTP response headers of downloaded pages.
  • Modify the fetcher so that it detects ETags in the schedule.

01/29/2013 First project meeting

  • Discussed the project in detail with the advisor.


12/04/2012 Twelfth project meeting

  • Fix the bug in the cache request code so that it works across multiple machines.
  • Localize the user interface for cached pages.
  • Complete CS297 report.

11/27/2012 Eleventh project meeting

  • Add all deliverables to the web page.
  • Draft the CS297 report (Due 12/04/2012).

11/20/2012 Tenth project meeting

  • Found usability issues with the appearance of the links to cached pages.
  • Discussed how index look-up can be used to prevent re-downloading content that already exists.
  • Improve the appearance of links to cached versions (grouping them by year/month, depending on the index).
  • Download a new copy of Yioop! and make a patch with the changes.
  • Continue reading the URL scheduling code.
  • Implement ETag header logic using PHP and cURL.
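Grouping the links to cached versions by year and month could be done along these lines. This sketch assumes each cached copy is identified by a Unix timestamp (UTC). That is an assumption, not necessarily how Yioop! names its indexes, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def group_by_year_month(timestamps: Iterable[int]) -> Dict[int, Dict[int, List[int]]]:
    """Group Unix timestamps of cached copies as {year: {month: [timestamps]}}."""
    groups: Dict[int, Dict[int, List[int]]] = {}
    # Sorting first means years, months and timestamps all come out in
    # ascending order, since dicts keep insertion order.
    for ts in sorted(timestamps):
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
        groups.setdefault(when.year, {}).setdefault(when.month, []).append(ts)
    return groups
```

The nested result maps directly onto a display with one heading per year, one sub-list per month, and one link per cached copy.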

11/13/2012 Ninth project meeting

  • Make a clean patch for Deliverables 1, 2, and 3 (Due 11/20/2012).
  • Make more slides on entity tags (Due 11/20/2012).
  • Understand Yioop!'s crawl scheduling code and make slides on it (Due 11/20/2012).
  • Turn in a proposal on how ETags can be used in Yioop! (Due 11/20/2012).

11/06/2012 Eighth project meeting

  • On cached results, include links to cached copies of the page from past timestamps.
  • Improve the appearance of links to other cached pages.
  • Read about entity tags.
  • Figure out how Google optimizes caching.

10/30/2012 Seventh project meeting

  • Modify the code that displays links to other cached pages so that the links are displayed properly.
  • Improve the code for canonicalization of links.
  • Improve the code for summaries and offsets.

10/23/2012 Sixth project meeting

  • Display links to all other cached pages when displaying the cached version of a page (Due 10/30/2012).
  • Speed-optimize the parallel model.

10/16/2012 Fifth project meeting

  • Demoed the changed cache request code.
  • Modify the code that converts relative links into absolute links so that the links redirect to cached versions of web pages, falling back to the live pages when no cached version is available (Due 10/23/2012).
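The relative-to-absolute rewrite with a live-page fallback might look like the sketch below. `urljoin` from Python's standard library does the resolution. `cached_timestamp` is a hypothetical stand-in for whatever index look-up reports a cached copy of a URL. The `/cache?url=<url>&ts=<timestamp>` format is a made-up placeholder, not Yioop!'s real cache URL.

```python
from typing import Callable, Optional
from urllib.parse import urlencode, urljoin


def rewrite_link(base_url: str, href: str,
                 cached_timestamp: Callable[[str], Optional[int]]) -> str:
    """Resolve href against base_url, then point it at a cached copy if one exists."""
    absolute = urljoin(base_url, href)  # relative -> absolute
    ts = cached_timestamp(absolute)
    if ts is None:
        return absolute  # no cached copy: fall back to the live page
    # Hypothetical cache URL format, not Yioop!'s real query parameters.
    return "/cache?" + urlencode({"url": absolute, "ts": ts})
```

For example, with a dictionary of known cached URLs, passing its `.get` method as `cached_timestamp` sends `b.html` on `http://example.com/a/index.html` to the cache, while an uncached `../c.html` resolves to the live `http://example.com/c.html`.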

10/09/2012 Fourth project meeting

  • Discussed the changed cache request and output code. Found an issue with the cached pages returned.
  • Try to fix the issue in the cache request and output code.
  • Read and understand Yioop!'s index dictionary, index shard, and file cache code (Due 10/16/2012).

10/02/2012 Third project meeting

  • Read chapter 13 of [Buttcher 10].
  • Finish making changes to the cache request and output code (Due 10/09/2012).
  • Discuss how to speed-optimize the changed code (10/09/2012).

09/18/2012 Second project meeting

  • Discussed Yioop!'s cache request and output code.
  • Read chapter 15 of [Buttcher 10].
  • Read chapter 16 of [Buttcher 10].
  • Change the cache request code so that, if a page was not cached at the given timestamp, it searches for the first timestamp greater than the given one and displays the page cached at that time. A "nocache" result should occur only if there is no timestamp greater than the given timestamp (Due 10/02/2012).
  • Discuss how to speed-optimize the changed code (10/02/2012).
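If the cache timestamps for a page are kept sorted, the lookup rule from this meeting becomes a single binary search. The rule is: use the given timestamp if the page is cached there, else the first later one, else "nocache". Below is a minimal Python sketch using the standard `bisect` module. The function name is hypothetical, and `None` plays the role of "nocache".

```python
from bisect import bisect_left
from typing import List, Optional


def find_cache_timestamp(cached: List[int], requested: int) -> Optional[int]:
    """Return requested if cached there, else the first later timestamp, else None.

    `cached` must be sorted in ascending order; None means "nocache".
    """
    i = bisect_left(cached, requested)  # first index with cached[i] >= requested
    return cached[i] if i < len(cached) else None
```

Because `bisect_left` returns the first position whose timestamp is greater than or equal to the requested one, a single O(log n) call covers both the exact-match case and the next-later case.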

09/11/2012 First project meeting

  • Discussed the JavaScript used in the Wayback Machine.
  • Summarize the readings on the Internet Archive (Due 09/18/2012).
  • Experiment with the Wayback Machine (for example, availability and appearance of cached pages) and report results (Due 09/18/2012).