CS298 Proposal

Improved User News Feed Customization for an Open Source Search Engine

Timothy Chow (timothy.chow@sjsu.edu)

Advisor: Dr. Chris Pollett

Committee Members: Thomas Austin, Robert Chun

Abstract:

Yioop is an open source search engine project hosted on the site of the same name. Beyond search, it offers several other features, one of which is a newsfeed. The current newsfeed system aggregates articles from a curated list of news sites chosen by the site owner. In its current state, however, the feed is limited in size and can draw on only around 50 sources. One goal of my project is to increase this number. I will also implement the ability for users to personalize the newsfeed for their own use.

CS297 Results

  • Wrote additional test cases for the IndexShard class, which is responsible for storing documents or links to documents.
  • Implemented a word tracker in Yioop to keep track of and display trending words in recent news results.
  • Created a prototype user interface where users can suggest news sources for Yioop to aggregate from.
  • Outlined a framework for a NewsfeedBundle class, which would handle storing news results on disk as opposed to in memory.
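
The idea behind the word tracker can be illustrated with a minimal Python sketch. This is not the code written for Yioop; the function name trending_words, the stop-word list, and the sample headlines are all assumptions made for illustration. The sketch counts how often each word appears across recent headlines and reports the most frequent ones.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real tracker would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "on", "is"}


def trending_words(headlines, top_n=3):
    """Return the top_n most frequent non-stop-words across the headlines.

    The result is a list of (word, count) pairs, most frequent first.
    """
    counts = Counter()
    for headline in headlines:
        # Lowercase the headline and split it into alphabetic tokens.
        words = re.findall(r"[a-z']+", headline.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = [
        "Election results in",
        "Election day turnout",
        "Storm hits coast on election day",
    ]
    print(trending_words(sample, top_n=2))
```

A tracker for live news would presumably also restrict the count to a recent time window, which this sketch leaves out.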

Proposed Schedule

Week 1 (Aug 28 - Sep 3): First meeting.
Week 2 - 6 (Sep 4 - Oct 1): Implement an iterator that traverses bundles in reverse order, which is useful for news results.
Week 7 - 10 (Oct 2 - Oct 22): Integrate bundles for storing news, rather than storing it in a SQL database.
Week 11 - 13 (Oct 23 - Nov 5): Write test cases for the reverse iterator and the bundle-based news storage.
Week 14 - 17 (Nov 6 - Nov 26): Complete project report and slides for review.

Key Deliverables:

  • Software
    • Deliverable 1: A method for iterating over ArchiveBundles from the most recent entry to the least recent.
    • Deliverable 2: Modifying the existing newsfeed functionality of Yioop so that it reads from and writes to ArchiveBundles.
    • Deliverable 3: Testing the implemented ReverseIterator and making sure that it integrates correctly with the rest of Yioop's systems, in particular the newsfeed.
  • Report
    • CS 298 Report
    • CS 298 Presentation

Innovations and Challenges

  • Improving on the pre-existing method of news result storage in a way that removes the previous space constraint. This is accomplished by combining the current bundle system with a ReverseIterator that traverses bundles backwards.
  • Modifying a large portion of the Yioop backend so that this solution fits in.
  • Ensuring that the implemented ReverseIterator, and the way it is hooked into the current system, is fault-free and reliable.
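
To make the reverse traversal idea concrete, here is a minimal Python sketch of one way to read an append-only record store from newest to oldest. It is an illustration under stated assumptions, not the ArchiveBundle format or the planned ReverseIterator: the on-disk layout (each record followed by a 4-byte length trailer) and the names append_record and iter_reverse are hypothetical.

```python
import io
import struct

# Assumed layout: every record is followed by a 4-byte little-endian length.
LEN = struct.Struct("<I")


def append_record(f, payload: bytes) -> None:
    """Append payload plus a trailing length so it can be located from the end."""
    f.seek(0, io.SEEK_END)
    f.write(payload)
    f.write(LEN.pack(len(payload)))


def iter_reverse(f):
    """Yield records from the most recently appended to the oldest."""
    f.seek(0, io.SEEK_END)
    pos = f.tell()  # pos always marks the end of the next record to read
    while pos > 0:
        # Read the length trailer that sits just before pos.
        f.seek(pos - LEN.size)
        (length,) = LEN.unpack(f.read(LEN.size))
        # Step back over the trailer and the payload, then read the payload.
        start = pos - LEN.size - length
        f.seek(start)
        yield f.read(length)
        pos = start


if __name__ == "__main__":
    store = io.BytesIO()  # stands in for a file opened in binary mode
    for item in (b"oldest story", b"middle story", b"newest story"):
        append_record(store, item)
    for record in iter_reverse(store):
        print(record.decode())
```

Storing the length after each record is what lets the reader walk backwards without an index. Because records are only ever appended, the newest news items are always reached first and older ones never need to be moved or evicted to make room.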

References:

[1] Jongdeog Lee, Daniel Xu, Md Tanvir Al Amin, Tarek Abdelzaher; iApollo: A Newsfeed Summary Service on NDN; IEEE, 2017.

[2] Nicola Ferro, Yubin Kim, Mark Sanderson; Using Collection Shards to Study Retrieval Performance Effect Sizes; ACM, 2019.

[3] Bo Long, Yi Chang; Relevance Ranking for Vertical Search Engines; 2014.

[4] Chris Pollett; Open Source Search Engine Software!; https://www.seekquarry.com/; 2019.