CS297 Proposal

Incorporating WordNet in an Information Retrieval System

Shailesh Padave (shaileshpadave49@gmail.com)

Advisor: Dr. Chris Pollett

Description:

WordNet is a large lexical database of the English language. George A. Miller began the project at Princeton University. WordNet groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. Yioop! is an open source search engine written in PHP. It can be configured either as a general purpose search engine for the whole Web or to provide search results for a set of URLs or domains. Query rewriting algorithms can be used as a form of query expansion, by combining the user's original query with automatically generated rewrites. In this project, we will use WordNet to implement different query rewriting techniques for the Yioop! search engine. We will systematically analyze the performance of these techniques against the existing search results in Yioop!.

Schedule:
Deliverables:

The full project will be done after CS298 is complete. The following are expected to be completed by the end of CS297:

1. WordNet downloaded from its website and installed on the local system.
2. Code for three different query rewriting techniques, with a measurement of their performance.
3. Noted examples that explain how WordNet will handle different kinds of queries.
4. The WordNet file format read and added to Yioop!, or a method found for reading it in Yioop!.
5. A PHP script that reads the WordNet format so that Yioop! can access the WordNet files.
6. Because the data is small, a determination of whether an in-memory cache or a file cache is more efficient from a computational point of view.
7. CS297 final report.

References:

1) George A. Miller (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11: 39-41.
2) Christiane Fellbaum (1998, ed.). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
3) Wikipedia: http://en.wikipedia.org/wiki/WordNet
4) http://nlp.stanford.edu/IR-book/pdf/09expand.pdf
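Illustration:

The idea from the description, combining the user's original query with automatically generated rewrites, can be sketched as follows. This is only a minimal illustration under stated assumptions: the synsets are a small hand-written table rather than real WordNet data, the names `TOY_SYNSETS`, `synonyms`, and `expand_query` are hypothetical and not part of Yioop! or WordNet, and the sketch is written in Python for brevity although the project itself targets PHP. It shows just one possible rewriting technique, single-term synonym substitution.

```python
# Hand-written toy synsets standing in for WordNet data (an assumption,
# not taken from the real WordNet database).
TOY_SYNSETS = [
    {"car", "automobile", "auto"},
    {"film", "movie", "picture"},
    {"picture", "image", "photo"},
]


def synonyms(term, synsets=TOY_SYNSETS):
    """Return the sorted synonyms of term from every synset containing it."""
    found = set()
    for synset in synsets:
        if term in synset:
            found |= synset
    found.discard(term)  # a term is not its own synonym
    return sorted(found)


def expand_query(query, synsets=TOY_SYNSETS, max_rewrites=3):
    """Return the original query followed by up to max_rewrites rewrites.

    Each rewrite replaces exactly one query term with one of its synonyms.
    """
    terms = query.lower().split()
    rewrites = [" ".join(terms)]  # the original query always comes first
    for i, term in enumerate(terms):
        for alt in synonyms(term, synsets):
            candidate = " ".join(terms[:i] + [alt] + terms[i + 1:])
            if candidate not in rewrites:
                rewrites.append(candidate)
            if len(rewrites) > max_rewrites:
                return rewrites
    return rewrites


if __name__ == "__main__":
    for rewritten in expand_query("cheap car rental"):
        print(rewritten)
```

A word such as "picture" belongs to more than one toy synset, so its rewrites mix senses (film and image). Choosing the right sense is one of the things a comparison of rewriting techniques would need to measure.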