Chris Pollett > Students >
Tim

Word Tracker to Display Trending Words on Yioop

Description: To further my understanding of how Yioop works and to get some hands-on experience with the system, I was tasked with developing a feature to add to Yioop. For this deliverable, I added a new tool, tentatively called the word tracker. Developing the word tracker required understanding a few things: how Yioop updates its index, how data is temporarily stored, and how to subsequently read that data and display it on a web page. By the end of this deliverable, I should have a good knowledge of how index construction happens, how to add data to the existing database, and how to add a new page to Yioop.

The process begins in MediaJob, which in turn calls FeedUpdateJob. FeedUpdateJob runs every hour to update the index shard. New documents are obtained from a predetermined source list, and their contents are added to the database. After all the documents have been parsed, the job rebuilds the feed shard using only fresh documents, which in our case are those less than a week old. After rebuilding the shard, old items are pruned. To add word tracking, I save the word occurrences into a new table in the database. Stop words are filtered out and the remaining words are stemmed so that variants of the same word are not counted separately. The table keeps track of the top 25 hourly, daily, and weekly words. The top hourly words are simply the top words for the current run of FeedUpdateJob, while the top daily and weekly words are calculated from the past 24 hours' and the past 7 days' worth of data, respectively. Once these are calculated and saved, old data is deleted from the table.
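The counting step can be illustrated with a small sketch. It is written in Python for brevity, whereas Yioop itself is PHP, and it is not the actual FeedUpdateJob code: the stop word list, the `toy_stem` function, and the `top_words` helper are all hypothetical stand-ins for whatever stop word filter and stemmer the real job uses.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (assumption, not Yioop's).
STOP_WORDS = frozenset(
    {"the", "a", "an", "and", "of", "to", "in", "is", "are", "on"}
)


def toy_stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def top_words(documents, limit=25):
    """Count stemmed, non-stop-word tokens and return the `limit` most common."""
    counts = Counter()
    for text in documents:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in STOP_WORDS:
                continue
            counts[toy_stem(token)] += 1
    return counts.most_common(limit)


docs = ["The markets are rising", "Market rises and the market dips"]
print(top_words(docs, limit=2))  # [('market', 3), ('ris', 2)]
```

Stemming before counting is what lets "markets" and "market" contribute to a single entry instead of competing for separate slots in the top 25.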

Now that we have actual data to work with, we need some place to view it. Yioop renders its web pages using a basic view page as the framework and then renders individual elements on top of that view. Each element should have the necessary data passed into it by a corresponding model. In my case, I wrote a TrendingModel, which pulls the necessary data from the database and passes it to a TrackerElement, which renders on top of the SearchView. I also reworked the SearchController and the main index file so that the word tracker tool can be reached at "yioop.com/trending". Alternatively, one can reach it from the tools page linked in the footer.
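The division of labor between model and element can be sketched roughly as follows. This is a Python illustration of the pattern only, not Yioop's PHP classes: the method names `get_top_words`, `render`, and `handle_request`, the in-memory `rows` dictionary standing in for the database, and the plain-text output are all assumptions made for the example.

```python
PERIODS = ("hourly", "daily", "weekly")


class TrendingModel:
    """Fetches trending words; here backed by a dict instead of a database."""

    def __init__(self, rows):
        # rows maps a period name to a list of (word, occurrences) pairs.
        self.rows = rows

    def get_top_words(self, period):
        return self.rows.get(period, [])


class TrackerElement:
    """Renders the data handed to it; it never queries storage itself."""

    def render(self, data):
        lines = []
        for period in PERIODS:
            lines.append(period.capitalize())
            for word, count in data.get(period, []):
                lines.append(f"  {word}: {count}")
        return "\n".join(lines)


def handle_request(path, model):
    """Hypothetical controller: map the /trending route to model + element."""
    if path.rstrip("/") == "/trending":
        data = {period: model.get_top_words(period) for period in PERIODS}
        return TrackerElement().render(data)
    return None  # any other route is outside this sketch


model = TrendingModel({"hourly": [("market", 3)], "daily": [("vote", 7)]})
print(handle_request("/trending", model))
```

Keeping the database access in the model means the element can be tested with hand-built data, as in the last two lines.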

Figure: New trending tool option added to the tools page.

When viewing each of the tables, you might notice that the words are actually links. Clicking on any of the words brings you to a search page where the query is that term. The idea is that it is useful for people not only to know what the top trending words are, but also to quickly reach the results that contain them.
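A minimal sketch of how a trending word might be turned into such a link is shown here in Python. The `word_link` helper, the `base_url` default, and the `q` query parameter name are assumptions for illustration and are not taken from the Yioop source.

```python
from html import escape
from urllib.parse import quote_plus


def word_link(word, base_url="/"):
    """Return an HTML anchor that searches for `word`.

    The word is URL-encoded for the query string and HTML-escaped for
    display, so terms containing spaces or markup characters stay safe.
    The `q` parameter name is hypothetical.
    """
    href = f"{base_url}?q={quote_plus(word)}"
    return f'<a href="{escape(href, quote=True)}">{escape(word)}</a>'


print(word_link("new york"))  # <a href="/?q=new+york">new york</a>
```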

Figure: A snippet of the trending page.

List of files that were modified or added for this to work:

  • index.php - Added /trending route which redirects to the correct page.
  • SearchView.php - Modified so that we check if an additional 'TRENDING' flag is marked when loading up a SearchView page. If it is and the 'MORE' and 'PAGES' flags aren't marked, then the trending page is rendered.
  • MoreOptionsElement.php - Added new link 'Trending' which directs user to the trending tool page.
  • TrackerElement.php - The main page for the new Trending tool.
  • ProfileModel.php - Modified to have an additional 'TOP_WORDS' table added to the database to store the trending words, occurrences, and timestamps.
  • TrendingModel.php - Necessary to get items from the database.
  • SearchController.php - Added a new 'trending' activity; when it is the current activity, TrendingModel is run to get the relevant items from the database.
  • FeedUpdateJob.php - Modified to count hourly word occurrences and store them in the database when the feed shard is rebuilt. Also recalculates the top daily and weekly occurrences at the same time if the conditions are met.
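
The storage side of the changes listed above (the 'TOP_WORDS' table and the hourly, daily, and weekly bookkeeping) could look roughly like the following sketch. It uses Python with SQLite purely for illustration. Only the table name comes from the list above; the column names, the `PERIOD` column, and the functions `store_hourly`, `rollup`, and `prune` are assumptions, and the real schema may differ.

```python
import sqlite3

HOUR = 3600
DAY = 24 * HOUR
WEEK = 7 * DAY


def create_table(db):
    # Column layout is an assumption; only the table name is from the source.
    db.execute(
        "CREATE TABLE IF NOT EXISTS TOP_WORDS "
        "(WORD TEXT, OCCURRENCES INTEGER, PERIOD TEXT, TIMESTAMP INTEGER)"
    )


def store_hourly(db, word_counts, now):
    """Save the top words of one feed update run."""
    db.executemany(
        "INSERT INTO TOP_WORDS VALUES (?, ?, 'hourly', ?)",
        [(word, count, now) for word, count in word_counts],
    )


def rollup(db, period, window, now, limit=25):
    """Recompute the top words for `period` from hourly rows in the window."""
    db.execute("DELETE FROM TOP_WORDS WHERE PERIOD = ?", (period,))
    rows = db.execute(
        "SELECT WORD, SUM(OCCURRENCES) AS TOTAL FROM TOP_WORDS "
        "WHERE PERIOD = 'hourly' AND TIMESTAMP > ? "
        "GROUP BY WORD ORDER BY TOTAL DESC, WORD ASC LIMIT ?",
        (now - window, limit),
    ).fetchall()
    db.executemany(
        "INSERT INTO TOP_WORDS VALUES (?, ?, ?, ?)",
        [(word, total, period, now) for word, total in rows],
    )
    return rows


def prune(db, now):
    """Drop hourly rows that can no longer affect the weekly totals."""
    db.execute(
        "DELETE FROM TOP_WORDS WHERE PERIOD = 'hourly' AND TIMESTAMP <= ?",
        (now - WEEK,),
    )


db = sqlite3.connect(":memory:")
create_table(db)
now = 10 * DAY
store_hourly(db, [("market", 5), ("vote", 2)], now - 2 * HOUR)
store_hourly(db, [("market", 1), ("rain", 4)], now - 3 * DAY)
store_hourly(db, [("old", 9)], now - 8 * DAY)
daily = rollup(db, "daily", DAY, now)
weekly = rollup(db, "weekly", WEEK, now)
prune(db, now)
print(daily)   # [('market', 5), ('vote', 2)]
print(weekly)  # [('market', 6), ('rain', 4), ('vote', 2)]
```

Deriving the daily and weekly rows from the stored hourly rows means only one kind of raw data has to be kept, and pruning anything older than a week is safe because no rollup looks further back than that.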