CS297 Proposal

Incorporating WordNet in an Information Retrieval System

Shailesh Padave (shaileshpadave49@gmail.com)

Advisor: Dr. Chris Pollett

Description:

WordNet is a large lexical database of the English language. George A. Miller began the project at Princeton University. WordNet groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. Yioop! is an open-source search engine written in PHP. It can be configured either as a general-purpose search engine for the whole Web or to provide search results for a set of URLs or domains. Query rewriting algorithms can be used as a form of query expansion, combining the user's original query with automatically generated rewrites. In this project, we will use WordNet to implement different query rewriting techniques for the Yioop! search engine. We will systematically analyse the performance of these techniques against the existing search results in Yioop!.
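
As a rough illustration of query expansion through rewriting, the sketch below replaces one query term at a time with a synonym and returns the original query together with the rewrites. It is written in Python for brevity rather than in Yioop!'s PHP, and the synonym table `TOY_SYNSETS` and the function names are hypothetical stand-ins: a real system would look synsets up in WordNet instead.

```python
# Toy synonym sets standing in for WordNet synsets (hypothetical data,
# not read from WordNet).
TOY_SYNSETS = [
    {"car", "auto", "automobile"},
    {"buy", "purchase"},
]


def synonyms(word):
    """Return the synonyms of `word` found in the toy synsets, sorted."""
    found = set()
    for synset in TOY_SYNSETS:
        if word in synset:
            found |= synset - {word}
    return sorted(found)


def rewrite_query(query):
    """Generate rewrites by replacing one term at a time with a synonym."""
    terms = query.lower().split()
    rewrites = []
    for i, term in enumerate(terms):
        for syn in synonyms(term):
            rewrites.append(" ".join(terms[:i] + [syn] + terms[i + 1:]))
    return rewrites


def expand_query(query):
    """Combine the original query with its automatically generated rewrites."""
    return [query.lower()] + rewrite_query(query)


if __name__ == "__main__":
    for q in expand_query("buy car"):
        print(q)
```

Replacing a single term per rewrite keeps each rewrite close to the user's intent; substituting several terms at once would produce more candidates but also more drift from the original query.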

Schedule:

Week 1: (Aug. 26-Sep. 1) Deliverable #1: Project Proposal
Week 2,3: (Sep. 2-15) Deliverable #2: Download WordNet and understand its internal structure
Week 4,5: (Sep. 16-29) Explain the working of WordNet with a few examples
Week 6,7: (Sep. 30-Oct. 13) Code for separating sentences and words from WordNet output
Week 8,9: (Oct. 14-27) Deliverable #3: Code for Part of Speech Tagging
Week 10,11: (Oct. 28-Nov. 3) Deliverable #4: Code for Cosine Similarity Ranking
Week 12: (Nov. 4-10) Code for the intersection method
Week 13: (Nov. 11-24) Integrate the similarity ranking algorithms with the WordNet code and find synonyms from senses
Week 13: (Nov. 25-Dec. 1) First draft of CS297 Master's Project Report
Week 14,15: (Dec. 2-8) Deliverable #5: CS297 final report
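
To make the cosine similarity ranking step in the schedule concrete, here is a minimal Python sketch. It assumes plain term-frequency vectors and whitespace tokenization, and it assumes the ranking is used to score candidate sense descriptions against the query; the proposal does not specify these details, and the two glosses in the example are illustrative paraphrases, not text taken from WordNet.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector from whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank(query, candidates):
    """Return (score, candidate) pairs, highest cosine score first."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    glosses = [
        "a financial institution that accepts deposits",
        "sloping land beside a river",
    ]
    for score, gloss in rank("river bank", glosses):
        print(f"{score:.3f}  {gloss}")
```

Raw term counts are the simplest weighting; a tf-idf weighting could be swapped into `term_vector` without changing the rest of the sketch.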

Deliverables:

The full project will be finished when CS298 is complete. The following are expected to be done by the end of CS297:

1. WordNet will have been downloaded from its website and installed on the local system.

2. Code for three different query rewriting techniques, together with measurements of their performance.

3. Worked examples showing how WordNet handles different kinds of queries.

4. The WordNet file format will be read and added to Yioop!, or a method will be found to read it from within Yioop!.

5. A PHP script that reads the WordNet format so that Yioop! can access the WordNet files.

6. Because the data is small, determine which is more efficient from a computational point of view: an in-memory cache or a file cache.

7. CS297 final report.
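
As a sketch of what reading the WordNet format involves, the Python function below parses one synset line in the layout used by WordNet's data files: `synset_offset lex_filenum ss_type w_cnt word lex_id ... p_cnt [pointers] | gloss`, where `w_cnt` is hexadecimal and `p_cnt` is decimal. The deliverable itself calls for a PHP script; Python is used here only for illustration. The function name, the returned dictionary keys, and the sample line are assumptions made for this example, and the sample line is invented rather than copied from WordNet. Verb frame fields, which follow the pointers on verb lines, are ignored.

```python
def parse_data_line(line):
    """Parse one synset line in the WordNet data-file layout."""
    fields_part, _, gloss = line.partition("|")
    fields = fields_part.split()

    offset, lex_filenum, ss_type = fields[0], fields[1], fields[2]
    w_cnt = int(fields[3], 16)  # number of words, two-digit hexadecimal

    pos = 4
    words = []
    for _ in range(w_cnt):
        # Multi-word entries use underscores in place of spaces.
        words.append(fields[pos].replace("_", " "))
        pos += 2  # skip the lex_id that follows each word

    p_cnt = int(fields[pos])  # number of pointers, decimal
    pos += 1
    pointers = []
    for _ in range(p_cnt):
        symbol, target_offset, target_pos, source_target = fields[pos:pos + 4]
        pointers.append({
            "symbol": symbol,  # e.g. "@" marks a hypernym pointer
            "offset": target_offset,
            "pos": target_pos,
            "source_target": source_target,
        })
        pos += 4

    return {
        "offset": offset,
        "lex_filenum": lex_filenum,
        "ss_type": ss_type,
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }


if __name__ == "__main__":
    # Invented sample line in the data-file layout (not real WordNet data).
    sample = ("00001234 05 n 02 dog 0 domestic_dog 0 "
              "001 @ 00005678 n 0000 | an illustrative gloss")
    print(parse_data_line(sample))
```

Because each line starts with its byte offset in the file, a reader can seek directly to a synset once the offset is known from the index file, which bears on the memory-cache versus file-cache question in item 6.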

References:

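1) George A. Miller (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11: 39-41.

2) Christiane Fellbaum (ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

3) Wikipedia: WordNet. http://en.wikipedia.org/wiki/WordNet

4) Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze (2008). Introduction to Information Retrieval, Chapter 9: Relevance Feedback and Query Expansion. Cambridge University Press. http://nlp.stanford.edu/IR-book/pdf/09expand.pdf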