CS298 Proposal

Yioop! Introducing autosuggest and spell check

Sandhya Vissapragada (sandyvissa@gmail.com)

Advisor: Dr. Chris Pollett

Committee Members: Dr. Sami Khuri and Dr. Robert Chun

Abstract

This project aims to add autosuggest, autocomplete, and spell-check suggestions to queries in Yioop, a PHP-based search engine. These features help users type less, catch spelling mistakes, and repeat earlier searches. Popular commercial search engines run on many machines and draw on popular-query lists from their entire user base, so users see autosuggest results appear as they type; efficient storage of data across multiple servers keeps response times low. Yioop typically runs on fewer machines than a commercial search engine. This project aims to implement these computationally intensive features in such a constrained environment without adding load to the Yioop server. It does so by performing the needed processing on the client machine rather than sending queries to the Yioop server. The basic autosuggest functionality is already implemented. New features will support foreign languages, correct spelling errors in the query, and use the user's past queries to improve the suggestions.

CS297 Results

  • Created a trie of English dictionary words and ran experiments to find the best way to store the trie and send it over the network
  • Performed experiments on Google autosuggest
  • Developed the autosuggest functionality for Yioop using a trie of English dictionary words
  • Enhanced the autosuggest feature by adding multi-word suggestions and arrow-key scrolling
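
As a rough illustration of the trie-based approach listed above, the following is a minimal sketch of prefix lookup over a small word list. It is not the Yioop implementation: the class names `TrieNode` and `Trie`, the `suggest` method, and the `limit` parameter are assumptions made for this example, and the real feature runs in the browser rather than in Python.

```python
class TrieNode:
    """One node per character; is_word marks the end of a dictionary word."""

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    """Hypothetical minimal trie used only to illustrate prefix suggestions."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def suggest(self, prefix, limit=5):
        # Walk down to the node that represents the typed prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Depth-first walk below that node; children are pushed in reverse
        # order so that completions come out in alphabetical order.
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


if __name__ == "__main__":
    trie = Trie(["search", "sea", "seat", "see", "engine"])
    print(trie.suggest("se"))
```

Because Python strings are sequences of Unicode code points, the same structure also accepts non-ASCII words, although a real implementation would likely need to normalize the input first.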

Proposed Schedule:

Week 1: Sep.4-11 Discuss the project in detail with the advisor
Week 2-3: Sep.11-25 Deliverable 1: Code a new feature to the existing functionality to accept foreign inputs as queries (non-ASCII)
Week 3-4: Sep.25-Oct.9 Deliverable 2: Develop a feature to suggest meaningful phrases by storing past queries in the local storage without relying on past queries of other Yioop users
Week 5-6: Oct.9-23 Deliverable 3: Develop a new feature to support cross-character set input. This involves listing non-Roman suggestions for romanized queries
Week 7-8: Oct.23-Nov.6 Deliverable 4: Develop a new feature to correct the spelling errors in the query and suggest the appropriate query to the user
Week 9-10: Nov.6-20 Work on CS298 Report
Week 11-12: Nov.27-Dec.4 CS298 Report first draft: submit to Advisor and Committee
Week 13-14: Dec.4-11 CS298 Report final document: submit to Advisor and Committee
Week 15: Dec.11-18 Defense

Deliverables:

  • Software
    • Feature added to the existing functionality to accept foreign-language (non-ASCII) input as queries
    • Feature to suggest meaningful phrases by storing the user's past queries in local storage, without relying on the past queries of other Yioop users
    • Feature to support cross-character-set input, listing non-Roman suggestions for romanized queries
    • Feature to correct spelling errors in the query and suggest the corrected query to the user
  • Report
    • CS298 Report
    • Project code and test results documentation
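
To make the spelling-correction deliverable more concrete, here is a minimal sketch in the spirit of the edit-distance approach described in reference [4]. It is a simplified illustration under stated assumptions, not the project's design: the function names `edits1` and `correct` and the `frequencies` word-count table are hypothetical, only candidates one edit away are considered, and the alphabet is restricted to lowercase ASCII.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + ch + right[1:]
                for left, right in splits if right for ch in alphabet]
    inserts = [left + ch + right for left, right in splits for ch in alphabet]
    return set(deletes + transposes + replaces + inserts)


def correct(word, frequencies):
    """Pick the most frequent known word within one edit of `word`.

    `frequencies` is an assumed mapping from known words to counts.
    Unknown words with no close match are returned unchanged.
    """
    if word in frequencies:
        return word
    candidates = [w for w in edits1(word) if w in frequencies]
    if not candidates:
        return word
    # Break frequency ties alphabetically so the result is deterministic.
    return max(candidates, key=lambda w: (frequencies[w], w))


if __name__ == "__main__":
    # Hypothetical counts, invented for this example only.
    freq = {"search": 50, "engine": 30, "spell": 10, "speak": 5}
    print(correct("serach", freq))
```

Supporting non-ASCII queries would require passing a language-specific `alphabet`, which is one reason the candidate generator takes it as a parameter.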

Innovations and Challenges

  • Working with unfamiliar foreign languages is a challenge: it is time-consuming and requires rigorous testing
  • Suggesting meaningful phrases is a challenge because there is no past query information to rely on
  • The innovation in this project is that it avoids relying on Internet-tracked search queries of other Yioop users; all processing is done on the client machine, which improves response time
  • This autosuggestion technique works well when there is no source of past queries to rely on
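
One way to picture suggestions drawn only from a user's own history is sketched below. This is an assumption-laden illustration rather than the proposed design: the `QueryHistory` class and its methods are hypothetical, and a JSON string stands in for whatever the browser's local storage would hold.

```python
import json
from collections import Counter


class QueryHistory:
    """Hypothetical per-user store of past queries, ranked by how often used."""

    def __init__(self, serialized=None):
        # `serialized` is a JSON object of query -> count, as produced by dump().
        self.counts = Counter(json.loads(serialized)) if serialized else Counter()

    @staticmethod
    def _normalize(text):
        # Lowercase and collapse whitespace so near-identical queries merge.
        return " ".join(text.lower().split())

    def record(self, query):
        normalized = self._normalize(query)
        if normalized:
            self.counts[normalized] += 1

    def suggest(self, prefix, limit=5):
        prefix = self._normalize(prefix)
        matches = [q for q in self.counts if q.startswith(prefix)]
        # Most frequent first; alphabetical order breaks ties.
        matches.sort(key=lambda q: (-self.counts[q], q))
        return matches[:limit]

    def dump(self):
        # The returned string is what a client would write to local storage.
        return json.dumps(self.counts)


if __name__ == "__main__":
    history = QueryHistory()
    for q in ["open source search", "Open Source  Search", "open data"]:
        history.record(q)
    print(history.suggest("open"))
```

Nothing in this sketch leaves the client: the history is built, ranked, and serialized locally, which matches the goal of avoiding extra load on the server.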

References:

[1] Information Retrieval: Implementing and Evaluating Search Engines. Stefan Büttcher, Charles L. A. Clarke and Gordon V. Cormack. The MIT Press. 2010.

[2] Information Retrieval: Searching in the 21st Century. Ayse Goker, John Davies. John Wiley and Sons. 2009.

[3] Methods and systems for implementing auto-complete in a web page. United States Patent No. 7185271 B2. 2007.

[4] How to Write a Spelling Corrector. Peter Norvig. http://www.norvig.com/spell-correct.html. Retrieved August 28, 2012.