CS 297 PROPOSAL

A SCALABLE SEARCH ENGINE AGGREGATOR

Pooja B. Mishra (pooja192009@gmail.com)

Advisor: Dr. Chris Pollett

Description:

Yioop is a PHP search engine developed by Dr. Pollett. In a search engine, certain tasks, such as updating news feeds and RSS feeds and sending out notifications, are usually done periodically in bulk. Currently, Yioop does not perform these tasks in this periodic, bulk fashion. I will work on building an aggregator that periodically updates group notifications and news feeds. This aggregator will be distributed over all machines in a Yioop instance. My tasks will also include modifying or enhancing various Yioop features, after first experimenting with Yioop's current architecture and load capacity.
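One possible way to spread the aggregator's periodic work over all machines is to give each machine a fixed share of the feeds to refresh in every period. The Python sketch below illustrates that idea under stated assumptions and is not Yioop's actual design (Yioop itself is written in PHP). It assumes each machine knows its own index and the total number of machines. Hash-based partitioning is a technique chosen here for simplicity, and the function names and feed URLs are hypothetical.

```python
import hashlib


def machine_for_feed(feed_url: str, num_machines: int) -> int:
    """Return the index (0..num_machines-1) of the machine that refreshes feed_url.

    A SHA-256 digest is used instead of Python's built-in hash(), because
    hash() on strings is randomized per process, so different machines
    would disagree about the assignment.
    """
    digest = hashlib.sha256(feed_url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


def feeds_for_machine(feed_urls: list[str], machine_index: int,
                      num_machines: int) -> list[str]:
    """Return the feeds this machine should update during the current period."""
    return [url for url in feed_urls
            if machine_for_feed(url, num_machines) == machine_index]


if __name__ == "__main__":
    # Hypothetical feed URLs and a six-machine instance
    # (the proposal mentions six dedicated server machines).
    feeds = [
        "https://example.com/news/rss",
        "https://example.org/feed.xml",
        "https://example.net/atom.xml",
        "https://example.edu/updates.rss",
    ]
    for machine in range(6):
        print(machine, feeds_for_machine(feeds, machine, 6))
```

Because the assignment depends only on the feed URL and the machine count, the machines need no coordination in each period, and every feed is handled by exactly one machine. The trade-off is that changing the number of machines reassigns most feeds. Consistent hashing would reduce that churn.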

Deliverables:

As part of CS 297, I will produce the following deliverables.
1. Become familiar with the current news updater and experiment with the existing news feed features of the search engine. This will include:

  • Downloading the search engine (using git clone) and installing it on the local machine.

  • Doing a test crawl and getting the news updater working.

2. Currently, the news feed feature runs on only one of the six dedicated server machines, and it needs to be improved. As part of this deliverable, I will:

  • Experiment with the capacity of the news feed feature to see how many news feeds it updates in an hour.

  • Test the news feature with increasing numbers of search sources until it fails. Apache Bench will be used to fire queries and observe how the system performs as the number of records grows.

3. Currently, the MP4 format does not work in the Mozilla Firefox browser. To overcome this, I will add an FFmpeg recoding feature to the news updater.

4. A CS 297 report of approximately 10 pages presenting the results of the above deliverables and any other useful findings. Writing it will take a couple of weeks.

The technologies we will be working with will mainly include PHP, JavaScript, MySQL, DB2, and PostgreSQL.
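Deliverable 3 could take roughly the following shape. This Python sketch is only an illustration (Yioop itself is written in PHP) and rests on assumptions not stated in the proposal: the recoding target is WebM with VP8 video and Vorbis audio, a format Firefox plays natively, and ffmpeg is on the PATH and built with the libvpx and libvorbis encoders. The function names and the 1M bitrate are hypothetical choices.

```python
import shutil
import subprocess
from pathlib import Path


def webm_recode_command(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg command that recodes a video to WebM (VP8 video, Vorbis audio)."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", str(src),       # input file, e.g. an .mp4 fetched by the news updater
        "-c:v", "libvpx",     # VP8 video encoder
        "-b:v", "1M",         # target video bitrate (arbitrary value for this sketch)
        "-c:a", "libvorbis",  # Vorbis audio encoder
        str(dst),
    ]


def recode_to_webm(src: Path) -> Path:
    """Write a .webm copy next to src; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg is not installed or not on PATH")
    dst = src.with_suffix(".webm")
    subprocess.run(webm_recode_command(src, dst), check=True)
    return dst
```

A call such as `recode_to_webm(Path("clip.mp4"))` would write `clip.webm` beside the input. Building the argument list separately from running it keeps the command easy to inspect and test without ffmpeg installed. Passing a list to `subprocess.run` also avoids shell quoting problems with unusual file names.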

Timetable:

The timeline for the above deliverables is given below:

Due date                      Deliverable
Tuesday, 9 September 2014     Read and learn about the Yioop search engine's news updater
Tuesday, 16 September 2014    Deliverable #1: Install Yioop, do a test crawl, and get the news updater working
Tuesday, 23 September 2014    Learn about the search engine's news feed features and how to distribute an application across multiple machines
Tuesday, 30 September 2014    Deliverable #2: Experiment with the news feed feature and its capacity
Tuesday, 14 October 2014      Learn about the FFmpeg recoding feature
Tuesday, 21 October 2014      Deliverable #3: Code incorporating the FFmpeg recoding feature
Tuesday, 25 November 2014     Deliverable #4: CS 297 final report
