CS298 Proposal

Incorporating WordNet in an Information Retrieval System

Shailesh Padave (shaileshpadave49@gmail.com)

Advisor: Dr. Chris Pollett

Committee Members: Dr. Sami Khuri, Prof. Ronald Mak

Abstract:

WordNet is a large lexical database of the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the semantic relations between these synsets. Yioop is an open-source, distributed crawler and search engine written in PHP. Search engine query rewriting algorithms automatically generate rewrites of a user's query. In this project, we will use WordNet to implement different query rewriting techniques for the Yioop search engine, and we will systematically compare the performance of these techniques against Yioop's existing search results. In CS297, we learned the internal workings of WordNet and implemented the cosine ranking algorithm, the intersection method, and part-of-speech tagging. In CS298, we will implement a WordNet-based word selection algorithm and use it in a revamped query rewriter for Yioop.

CS297 Results:

  • Studied the internal structure of WordNet with the help of examples.
  • Studied the WordNet commands that produce different output formats.
  • Studied how part-of-speech tagging can be used in our project.
  • Implemented the cosine similarity ranking algorithm and the intersection method, compared them, and selected the best fit for our project.
  • Identified a suitable method for integrating WordNet output into Yioop.
  • CS297 Report
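
To make the comparison between the two ranking methods concrete, here is a minimal Python sketch. It assumes that the cosine method scores two bags of words by the cosine of their term-frequency vectors and that the intersection method counts the distinct terms they share; both function names are illustrative and are not taken from the CS297 code or from Yioop.

```python
import math
from collections import Counter


def cosine_similarity(a_tokens, b_tokens):
    """Cosine of the angle between two term-frequency vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty token list is treated as similar to nothing.
    return dot / (norm_a * norm_b)


def intersection_score(a_tokens, b_tokens):
    """Number of distinct terms that the two token lists share (assumed definition)."""
    return len(set(a_tokens) & set(b_tokens))
```

Under these assumptions, the cosine score is normalized by vector length, so a long definition is not favored just for containing more words, whereas the intersection count is cheaper to compute but grows with text length.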

Proposed Schedule:

Week 1: (Jan 28 - Feb 4) Deliverable #1: Get the CS298 proposal approved by the professor and upload it.
Week 2,3: (Feb 5 - Feb 18) Deliverable #2: Find a technique for integrating part-of-speech tagging into Yioop.
Week 4,5: (Feb 19 - Mar 4) Deliverable #3: Find a way to obtain WordNet output for the stemmed word that Yioop provides from the summary.
Week 6,7: (Mar 5 - Mar 18) Deliverable #4: Find the synonyms in the WordNet output for a given word that match the word's part-of-speech tag in the sentence.
Week 8,9,10: (Mar 19 - Apr 8) Deliverable #5: Perform query expansion on the user's input query.
Week 11,12: (Apr 9 - Apr 22) Deliverable #6: Measure the effectiveness of query expansion by comparing the relevance of the results.
Week 13,14: (Apr 23 - May 6) Deliverable #7: Work on the CS298 report.
Week 15: (May 6 - May 10) Submit the report to the advisor and committee.
Week 16: (May 10 - May 16) Defense

Key Deliverables:

The following are expected by the end of CS298:

  • Software
    • Find a technique for integrating part-of-speech tagging into Yioop.
    • Find a way to obtain WordNet output for the stemmed word that Yioop provides from the summary.
    • Find the synonyms in the WordNet output for a given word that match the word's part-of-speech tag in the sentence.
    • Perform query expansion on the user's input query.
    • Measure the effectiveness of query expansion by comparing the relevance of the results.
    • CS298 final report
  • Reports
    • CS298 Project Report
    • Project Code Documentation
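
The query expansion and evaluation deliverables can be sketched roughly as follows. This is only an illustration under stated assumptions: `TOY_SYNONYMS` and `TOY_TAGS` are hypothetical stand-ins for WordNet output and a part-of-speech tagger, and precision at k is one possible relevance measure, not necessarily the one the project will use.

```python
# Hypothetical stand-in for WordNet output: (word, part of speech) -> synonyms.
TOY_SYNONYMS = {
    ("car", "noun"): ["automobile", "auto"],
    ("fast", "adjective"): ["quick", "rapid"],
    ("fast", "verb"): ["abstain"],
}

# Hypothetical stand-in for a part-of-speech tagger.
TOY_TAGS = {"car": "noun", "fast": "adjective", "buy": "verb"}


def expand_query(query, max_synonyms=2):
    """Return the query terms followed by synonyms that match each term's tag."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        pos = TOY_TAGS.get(term)  # None when the toy tagger does not know the term.
        for synonym in TOY_SYNONYMS.get((term, pos), [])[:max_synonyms]:
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def precision_at_k(results, relevant, k):
    """Fraction of the top k results that appear in the relevant set."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in results[:k] if doc in relevant) / k


print(expand_query("buy fast car"))
# ['buy', 'fast', 'car', 'quick', 'rapid', 'automobile', 'auto']
```

Because "fast" is tagged as an adjective here, the verb sense's synonym "abstain" is left out of the expansion. Comparing `precision_at_k` for the original and the expanded query over the same judged documents gives one simple way to check whether expansion helped.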

Innovations and Challenges:

  • Reading the output from WordNet and finding a suitable technique for using that output in Yioop.
  • Finding the correct sense of each word during query expansion.
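
To illustrate the sense selection challenge, the sketch below picks the sense whose definition shares the most content words with the rest of the query. This is a simplified Lesk-style overlap chosen for illustration, not necessarily the word selection algorithm the project will adopt; the sense labels, definitions, and stopword list in it are made up for the example.

```python
# Hypothetical senses and definitions; real ones would come from WordNet.
TOY_SENSES = {
    "bank": {
        "bank#river": "sloping land beside a body of water such as a river",
        "bank#finance": "a financial institution that accepts deposits and lends money",
    }
}

STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as", "to", "in"}


def content_words(text):
    """Lowercased words of the text with stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def pick_sense(word, query):
    """Return the sense whose definition overlaps most with the other query words."""
    context = content_words(query) - {word}
    best_id, best_overlap = None, -1
    for sense_id, definition in TOY_SENSES.get(word, {}).items():
        overlap = len(context & content_words(definition))
        if overlap > best_overlap:  # Ties keep the sense listed first.
            best_id, best_overlap = sense_id, overlap
    return best_id
```

With these toy definitions, `pick_sense("bank", "river bank water")` selects the river sense and `pick_sense("bank", "bank deposits money")` selects the financial one. Short queries often provide little context, which is part of what makes this step challenging.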

References:

[1] WordNet: A lexical database for English. Retrieved on Dec 1, 2013, from http://wordnet.princeton.edu/

[2] Wikipedia article on WordNet. Retrieved on Dec 1, 2013, from http://en.wikipedia.org/wiki/WordNet

[3] Yioop Documentation from SeekQuarry. Retrieved on Dec 1, 2013, from https://seekquarry.com/?c=main&p=documentation#overview

[4] Improving Query Expansion Using WordNet. Retrieved on Dec 10, 2013, from http://arxiv.org/abs/1309.4938