CS297 Proposal
Incorporating WordNet in an Information Retrieval System
Shailesh Padave (shaileshpadave49@gmail.com)
Advisor: Dr. Chris Pollett
Description:
WordNet is a large lexical database of the English language. George A. Miller began the project at Princeton University.
WordNet groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets.
Yioop! is an open source search engine written in PHP. It can be configured either as a general-purpose search engine for the whole Web or to provide search results for a specific set of URLs or domains.
Query rewriting algorithms can be used as a form of query expansion, by combining the user's original query with automatically generated rewrites.
In this project, we are going to use WordNet to implement different query rewriting techniques for the Yioop! search engine. We will systematically analyse the performance of these techniques against the existing search results in Yioop!.
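As a rough illustration of query expansion through synonym substitution, the following minimal sketch combines the original query with one rewrite per substituted term. It is written in Python for brevity, and the `TOY_SYNSETS` table and the `synonyms` and `expand_query` helpers are hypothetical stand-ins; a real implementation would look synonyms up in WordNet and would run inside Yioop!.

```python
# Toy synonym sets standing in for WordNet synsets (hypothetical data,
# not taken from the WordNet database).
TOY_SYNSETS = [
    {"car", "automobile", "auto"},
    {"film", "movie", "picture"},
]


def synonyms(word):
    """Return the synonyms of `word` from the toy table, excluding the word itself."""
    found = set()
    for synset in TOY_SYNSETS:
        if word in synset:
            found |= synset - {word}
    return sorted(found)


def expand_query(query):
    """Return the original query followed by its single-term synonym rewrites."""
    terms = query.lower().split()
    rewrites = [" ".join(terms)]
    for i, term in enumerate(terms):
        for syn in synonyms(term):
            rewrite = " ".join(terms[:i] + [syn] + terms[i + 1:])
            if rewrite not in rewrites:
                rewrites.append(rewrite)
    return rewrites


print(expand_query("car rental"))
```

Each rewrite replaces exactly one term, so the number of generated queries grows with the number of synonyms rather than with every combination of them.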
Schedule:
Week 1:
(Aug. 26-Sep. 1) | Deliverable #1: Project proposal
Week 2,3:
(Sep. 2-15) | Deliverable #2: Download WordNet and understand its internal structure
Week 4,5:
(Sep. 16-29) | Explain the working of WordNet with a few examples
Week 6,7:
(Sep. 30-Oct. 13) | Code for separating sentences and words from WordNet output
Week 8,9:
(Oct. 14-27) | Deliverable #3: Code for part-of-speech tagging
Week 10:
(Oct. 28-Nov. 3) | Deliverable #4: Code for cosine similarity ranking
Week 11:
(Nov. 4-10) | Code for the intersection method
Week 12,13:
(Nov. 11-24) | Integrate the similarity ranking algorithms with the WordNet code and find synonyms from senses
Week 14:
(Nov. 25-Dec. 1) | First draft of the CS297 Master's project report
Week 15:
(Dec. 2-8) | Deliverable #5: CS297 final report
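To make the two similarity measures named in the schedule concrete, here is a minimal Python sketch of cosine similarity over term-frequency vectors and of a word-overlap (intersection) score. Using them to rank candidate definitions against a query is an assumption made for illustration, as are the function names and the sample definitions; the proposal does not fix these details.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def intersection_score(text_a, text_b):
    """Number of distinct words the two texts have in common."""
    return len(set(text_a.lower().split()) & set(text_b.lower().split()))


def rank(query, candidates, score):
    """Order candidates from most to least similar to the query."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)


# Hypothetical definitions for two senses of "bank".
senses = [
    "financial institution that accepts deposits",
    "sloping land beside a river",
]
print(rank("river bank", senses, cosine_similarity))
```

Cosine similarity normalises for text length, while the intersection score only counts shared words, so the two measures can order longer candidates differently.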
Deliverables:
The full project will be finished once CS298 is complete.
The following are expected to be completed by the end of CS297:
1. WordNet will have been downloaded from its website and installed on the local system.
2. Code for three different query rewriting techniques, together with a measurement of their performance.
3. Examples will be recorded that explain how WordNet answers different kinds of queries.
4. The WordNet file format will be read and added to Yioop!, or a method will be found to read it from within Yioop!.
5. A PHP script will be written to read the WordNet format so that Yioop! can access the WordNet files.
6. Because the data is small, determine which is more computationally efficient: an in-memory cache or a file cache.
7. CS297 final report
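As a rough idea of what reading the WordNet format involves, the sketch below parses one synset line, assuming the data file layout described in WordNet's wndb documentation: synset offset, lexicographer file number, part of speech, a two-digit hexadecimal word count, word and lex_id pairs, pointers, and then the gloss after a `|` separator. It is written in Python only for brevity (the deliverable calls for PHP), the `parse_data_line` name is hypothetical, and the sample line is made up for illustration rather than copied from the database.

```python
def parse_data_line(line):
    """Extract the offset, part of speech, words and gloss from one synset line."""
    fields, _, gloss = line.partition(" | ")
    parts = fields.split()
    offset, ss_type = parts[0], parts[2]   # parts[1] is the lexicographer file number
    w_cnt = int(parts[3], 16)              # word count is stored in hexadecimal
    # Words and their lex_ids alternate, starting at index 4.
    words = [parts[4 + 2 * i].replace("_", " ") for i in range(w_cnt)]
    return {"offset": offset, "pos": ss_type, "words": words, "gloss": gloss.strip()}


# Made-up line in the assumed layout (not a real WordNet entry).
sample = "00001740 03 n 02 example_word 0 sample 0 001 @ 00001930 n 0000 | an illustrative gloss"
print(parse_data_line(sample))
```

The sketch ignores pointers, verb frames, and the syntactic markers that can follow adjective words, all of which a complete reader would need to handle.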
References:
1) George A. Miller (1995). WordNet: A Lexical Database for English. Communications of the ACM Vol. 38, No. 11: 39-41
2) Christiane Fellbaum (1998, ed.) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press
3) Wikipedia: http://en.wikipedia.org/wiki/WordNet
4) http://nlp.stanford.edu/IR-book/pdf/09expand.pdf