CS297 Proposal

Improving an Open Source Question Answering System

Salil Shenoy (salil.shenoy@sjsu.edu)

Advisor: Dr. Chris Pollett

Description:

Yioop is an open source search engine developed and managed by Dr. Chris Pollett. Currently, when a user submits a query, it returns documents relevant to that query.

A patch that adds a Question Answering system to Yioop was developed by Niravkumar Patel; the existing module supports queries in English. I will study the existing patch and improve the Question Answering system it provides. I intend to leverage natural language processing and enhance the module so that it supports internationalization. The resulting Question Answering system should be able to identify the user's query and retrieve information efficiently.

Schedule:

Week 1: 06 Sep 2016 - 12 Sep 2016: Understand what a Question Answering System is.
Week 2: 13 Sep 2016 - 19 Sep 2016: Read about the ranking mechanism in Yioop. Use the "Patch for Yioop" documentation to create and apply a patch.
Week 3, 4: 20 Sep 2016 - 03 Oct 2016: Work out how to integrate the existing patch into the latest version of Yioop; understand the code from the existing patch.
Week 5: 04 Oct 2016 - 09 Oct 2016: Read "Gathering Knowledge for a Question Answering System from Heterogeneous Information Sources". Deliverable 1: Get the existing patch to work with the current version of Yioop.
Week 6: 10 Oct 2016 - 16 Oct 2016: Understand how internationalization works in real-world scenarios and explore options for developing it.
Week 7: 17 Oct 2016 - 24 Oct 2016: Deliverable 2: Literature review for Question Answering System.
Week 8: 25 Oct 2016 - 31 Oct 2016: Research and design algorithms for a standalone Question Answering System. Read how a recursive descent parser helps build the parse tree from a sentence.
Week 9, 10: 01 Nov 2016 - 14 Nov 2016: Deliverable 3: Create a part of speech tagger for a selected language (examples: Hindi, Marathi).
Week 11: 15 Nov 2016 - 21 Nov 2016: Learn the triplet extraction algorithm explained in "Triplet Extraction From Sentences".
Week 12, 13: 22 Nov 2016 - 05 Dec 2016: Deliverable 4: Refactor the tokenization code for the English QA system.
Week 14: 06 Dec 2016 - 12 Dec 2016: Deliverable 5: CS 297 Report.
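
To make the Week 8 and Week 11 topics concrete, the following is a minimal sketch of how a recursive descent parser can walk a part-of-speech-tagged sentence and pull out a (subject, predicate, object) triplet. It is not the code from the existing Yioop patch, and it is much simpler than the treebank-based method in "Triplet Extraction From Sentences". The toy lexicon, the three-rule grammar, and all names (LEXICON, Parser, extract_triplet) are assumptions made up for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicon, for illustration only (not Yioop's lexicon).
LEXICON = {
    "the": "DT", "a": "DT",
    "quick": "JJ", "open": "JJ",
    "engine": "NN", "documents": "NN", "search": "NN", "yioop": "NN",
    "ranks": "VB", "indexes": "VB",
}

Tagged = List[Tuple[str, str]]
Triplet = Tuple[str, str, str]


def tag(sentence: str) -> Tagged:
    """Tag each lowercased, whitespace-separated token; unknown words default to NN."""
    return [(word, LEXICON.get(word, "NN")) for word in sentence.lower().split()]


class Parser:
    """Recursive descent parser for the toy grammar:
         S  -> NP VB NP
         NP -> DT? JJ* NN+
    """

    def __init__(self, tokens: Tagged) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        """Return the tag of the next token, or None at end of input."""
        return self.tokens[self.pos][1] if self.pos < len(self.tokens) else None

    def noun_phrase(self) -> Optional[str]:
        """Parse an NP and return its head noun (the last NN), or None."""
        start = self.pos
        if self.peek() == "DT":
            self.pos += 1
        while self.peek() == "JJ":
            self.pos += 1
        head = None
        while self.peek() == "NN":
            head = self.tokens[self.pos][0]
            self.pos += 1
        if head is None:
            self.pos = start  # backtrack: no noun found
        return head

    def verb(self) -> Optional[str]:
        """Parse a single verb and return it, or None."""
        if self.peek() == "VB":
            word = self.tokens[self.pos][0]
            self.pos += 1
            return word
        return None

    def sentence(self) -> Optional[Triplet]:
        """Parse S and return (subject, predicate, object), or None on failure."""
        subject = self.noun_phrase()
        if subject is None:
            return None
        predicate = self.verb()
        if predicate is None:
            return None
        obj = self.noun_phrase()
        if obj is None or self.pos != len(self.tokens):
            return None
        return (subject, predicate, obj)


def extract_triplet(sentence: str) -> Optional[Triplet]:
    return Parser(tag(sentence)).sentence()
```

For example, extract_triplet("the search engine ranks the documents") returns ('engine', 'ranks', 'documents'), while a sentence that does not match the toy grammar, such as "ranks the documents", returns None. Supporting another language such as Hindi would, under this sketch, mean swapping in a different lexicon and grammar rules (for instance, a verb-final sentence rule).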

Deliverables:

The full project will be done when CS298 is completed. The following will be done by the end of CS297:

1. Get the existing patch to work with the current version of Yioop.

2. Literature review for Question Answering System.

3. Create a part of speech tagger for a particular language (examples: Hindi, Marathi).

4. Refactor the tokenization code for the English QA system.

5. CS 297 Report.

References:

Yioop Documentation

Question Answer System in Information Retrieval: Question Answer System Wiki

[2015] Existing work done on Question Answer System: Question-Answer System patch for Yioop by Niravkumar Patel. 2015.

[2015] Information Extraction From Text by Steven Bird, Ewan Klein, and Edward Loper. 2015.

[2007] Triplet Extraction From Sentences by Delia Rusu, Lorand Dali, Blaž Fortuna, Marko Grobelnik, Dunja Mladenić. 2007.

[2003] Integrating Web-based and Corpus-based Techniques for Question Answering by Boris Katz, Jimmy Lin, Daniel Loreto, Wesley Hildebrandt, Matthew Bilotti, Sue Felshin, Aaron Fernandes, Gregory Marton, Federico Mora. TREC 2003.

[2001] Gathering Knowledge for a Question Answering System from Heterogeneous Information Sources by Boris Katz, Jimmy Lin, Sue Felshin. ACL Workshop 2001.

[2005] A Hindi Question Answering system for E-learning documents by Praveen Kumar et al.

[2003] A light weight Hindi stemmer by A. Ramanathan, D. D. Rao

S. Sahu, N. Vasnik, and D. Roy, "PRASHNOTTAR: A HINDI QUESTION ANSWERING SYSTEM," International Journal of Computer Science and Information Technology (IJCSIT), vol. 4, no. 2, pp. 149-158, Apr. 2012.

Hindi Sentence Structure [Online] Available: http://hindilanguage.info/hindi-grammar/syntax/