CS297 Proposal

Question Answering System

Niravkumar Patel (niravkumar.patel1989@gmail.com)

Advisor: Dr. Chris Pollett

Description:

Yioop is an open source search engine developed and managed by Dr. Christopher Pollett. Currently, when a user submits a query, Yioop returns documents relevant to that query. A summarizer is a process that creates a short summary from a potentially long text document. When processing pages, the Yioop crawler runs a summarizer and then indexes only the contents of the summary it produces. Sometimes a user's query seeks a specific piece of information rather than a list of documents, and the information in the summary can be used to answer such queries. However, the summary itself might not have sentences arranged as question-answer pairs.

I will work on adding a new module, called the Question-Answering System, which will extract the information stored in the summary and modify Yioop so that the summary data can be used to answer questions. By applying natural language processing techniques, the Question-Answering System should be able to identify the user's question and answer it from the available data.

Schedule:

Week 1 (04 FEB 2015 - 10 FEB 2015): Understand how a Question-Answering System works in real-world scenarios, and which issues and approaches are common when developing such a system.
Week 2 (11 FEB 2015 - 17 FEB 2015): Deliverable 1: Create a set of question-answer pairs, based on the summaries, that a user might ask. This test set will be used to evaluate the Question-Answering System.
Week 3 (18 FEB 2015 - 24 FEB 2015): Research existing stemmer implementations as well as the stemmers used in Yioop.
Week 4,5 (25 FEB 2015 - 10 MAR 2015): Deliverable 2: Implement a stemmer for the Portuguese language. The implementation should be consistent with the existing Yioop code base so that the stemmer module can be merged and used by the system.
Week 6 (11 MAR 2015 - 16 MAR 2015): Detailed analysis of the Question-Answering approaches explained in "Integrating Web-based and Corpus-based Techniques for Question Answering" and "Gathering Knowledge for a Question Answering System from Heterogeneous Information Sources".
Week 7 (17 MAR 2015 - 23 MAR 2015): Research and design algorithms for a stand-alone Question-Answering System. Study how a recursive descent parser helps build a tree from a sentence.
Week 8,9 (24 MAR 2015 - 07 APR 2015): Deliverable 3: Generate a parse tree from a statement annotated by a part-of-speech tagger, for use in triplet extraction.
Week 10 (08 APR 2015 - 14 APR 2015): Learn the triplet extraction algorithm explained in "Triplet Extraction From Sentences".
Week 11,12 (15 APR 2015 - 28 APR 2015): Deliverable 4: Extract a triplet from the generated parse tree.
Week 13 (29 APR 2015 - 12 MAY 2015): Deliverable 5: CS 297 Report.
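
To illustrate what Deliverables 3 and 4 aim at, the following is a minimal sketch in Python, not the planned Yioop implementation. It assumes the input is already tagged with Penn Treebank style tags (DT, JJ, NN, VB and their variants) and uses a toy grammar (S -> NP VP, NP -> DT? JJ* NN+, VP -> VB NP?). The class and function names (Parser, first_leaf, extract_triplet) are hypothetical, and the extraction step is a simplified heuristic rather than the full algorithm from the triplet extraction paper.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


class Parser:
    """Toy recursive descent parser over tagged tokens (assumed grammar)."""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        # Tag of the next unread token, or None at end of input.
        return self.tokens[self.pos][1] if self.pos < len(self.tokens) else None

    def take(self, prefix: str) -> Optional[tuple]:
        # Consume the next token if its tag starts with prefix; leaf is (tag, word).
        tag = self.peek()
        if tag is not None and tag.startswith(prefix):
            word = self.tokens[self.pos][0]
            self.pos += 1
            return (tag, word)
        return None

    def parse_np(self) -> tuple:
        # NP -> DT? JJ* NN+
        children = []
        det = self.take("DT")
        if det is not None:
            children.append(det)
        adj = self.take("JJ")
        while adj is not None:
            children.append(adj)
            adj = self.take("JJ")
        noun = self.take("NN")
        if noun is None:
            raise ValueError("expected a noun at token %d" % self.pos)
        while noun is not None:
            children.append(noun)
            noun = self.take("NN")
        return ("NP", *children)

    def parse_vp(self) -> tuple:
        # VP -> VB NP?
        verb = self.take("VB")
        if verb is None:
            raise ValueError("expected a verb at token %d" % self.pos)
        children = [verb]
        if self.peek() is not None:
            children.append(self.parse_np())
        return ("VP", *children)

    def parse_sentence(self) -> tuple:
        # S -> NP VP, and all tokens must be consumed.
        tree = ("S", self.parse_np(), self.parse_vp())
        if self.pos != len(self.tokens):
            raise ValueError("unexpected token at position %d" % self.pos)
        return tree


def first_leaf(tree: tuple, prefix: str) -> Optional[str]:
    # Depth-first, left-to-right search for the first word whose tag matches.
    if len(tree) == 2 and isinstance(tree[1], str):
        return tree[1] if tree[0].startswith(prefix) else None
    for child in tree[1:]:
        found = first_leaf(child, prefix)
        if found is not None:
            return found
    return None


def extract_triplet(tree: tuple) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    # Simplified heuristic: subject = first noun of the subject NP,
    # predicate = first verb of the VP, object = first noun of an NP inside the VP.
    _, np, vp = tree
    obj = None
    for child in vp[1:]:
        if child[0] == "NP":
            obj = first_leaf(child, "NN")
            break
    return (first_leaf(np, "NN"), first_leaf(vp, "VB"), obj)


tagged = [("The", "DT"), ("crawler", "NN"), ("indexes", "VBZ"),
          ("the", "DT"), ("short", "JJ"), ("summary", "NN")]
tree = Parser(tagged).parse_sentence()
triplet = extract_triplet(tree)  # ("crawler", "indexes", "summary")
```

Each grammar rule maps to one method, which is what makes the parser "recursive descent"; a realistic grammar would need more rules (prepositional phrases, clauses) and a real tagger in front of it.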

Deliverables:

The project will be considered complete when CS 298 is completed. The following will be completed by the end of CS 297:

1. A test set for the Question-Answering System

2. A stemmer for the Portuguese locale

3. A basic implementation of parse tree generation

4. A basic implementation of triplet extraction

5. CS 297 Report

References:

Yioop Documentation.

[2015] Portuguese stemming algorithm. Portuguese Stemmer. 2015.

[2015] Information Extraction From Text by Steven Bird, Ewan Klein, and Edward Loper. 2015.

[2007] Triplet Extraction From Sentences by Delia Rusu, Lorand Dali, Blaž Fortuna, Marko Grobelnik, Dunja Mladenić. 2007.

[2003] Integrating Web-based and Corpus-based Techniques for Question Answering by Boris Katz, Jimmy Lin, Daniel Loreto, Wesley Hildebrandt, Matthew Bilotti, Sue Felshin, Aaron Fernandes, Gregory Marton, Federico Mora. TREC 2003.

[2001] Gathering Knowledge for a Question Answering System from Heterogeneous Information Sources by Boris Katz, Jimmy Lin, Sue Felshin. ACL Workshop 2001.