CS298 Proposal
Scalable Search Engine Aggregator
Pooja Mishra (pooja192009@gmail.com)
Advisor: Dr. Chris Pollett
Committee Members: Dr. Sami Khuri, Dr. Robert Chun.
Abstract:
Yioop is a PHP search engine developed by Dr. Pollett. It is designed to allow users to produce indexes of a website or a collection of websites. The number of pages a Yioop index can handle ranges from small sites to those containing tens or hundreds of millions of pages. Like other search engines, Yioop includes media updaters such as a news updater, and features such as uploading video sources. It also supports many common features of a search portal, such as user discussion groups, blogs, wikis, and a news aggregator. Certain search engine tasks, such as updating news feeds and RSS feeds and sending out notifications, are done periodically in bulk. These functions together make up the media updater that a search engine typically runs. Yioop has a news_updater process that can be used to re-index RSS and Atom feeds on an hourly basis, so that this more timely information can be incorporated into Yioop search results. The list of video and news sites can be configured through the GUI.
CS297 Results
- Studied the news updater feature of Yioop in detail
- Manually tested how many news sources the single-machine news updater can handle
- Coded a distributed version of the news updater (so that it can now run on multiple machines)
- Learned how to use the FFmpeg tool
- Discussed and proposed an architecture for implementing the video uploader as part of the media updater feature of Yioop
Proposed Schedule
Week 1: Jan. 27-Feb. 02 | Prepare the CS298 proposal and upload it
Week 2: Feb. 03-09 | Experiment with different user interface ideas for the news updater
Week 3, 4: Feb. 03-16 | Deliverable #1: Implement the user interface for the news updater and optimize it
Week 5: Feb. 17-23 | Discuss the already proposed video uploader architecture
Week 6, 7: Feb. 24-Mar. 09 | Deliverable #2: Code for the video uploader
Week 8: Mar. 10-16 | Read about the group feed features and walk through the code for sending notification emails
Week 9: Mar. 17-23 | Deliverable #3: Code for aggregating the group feed features; raise a code review request for it
Week 10: Mar. 24-30 | Discuss the automated test framework for news updater testing
Week 11: Mar. 31-Apr. 06 | Build the framework and get it reviewed
Week 12: Apr. 07-13 | Understand the archival process for news articles
Week 13: Apr. 14-20 | Implement the automated archival process and raise a code review request for it
Week 14: Apr. 21-27 | Create a first draft of the CS298 report
Week 15: Apr. 28-May 04 | Create the final CS298 report and submit it to the advisor and committee members
Week 16: May 04-12 | Defense
Key Deliverables:
- Software
  - User interface for the news updater
  - Video uploader implementation
  - Code for mail distribution
- Report
  - CS298 report
  - Project code and test result documentation
Innovations and Challenges
- Experimenting with different user interface samples and checking whether an existing framework could be used
- Optimizing the code for efficiency and keeping track of the requesting machine in a distributed environment
References:
PHP for the Web. Larry Ullman. 2011.
The Algorithm Design Manual, 2nd Edition. Steven Skiena.