CS298 Proposal

Improve an Open Source Question Answering System

Salil Shenoy (salil.shenoy@sjsu.edu)

Advisor: Dr. Chris Pollett

Committee Members: Prof. Jenny Lam, Prof. Robert Chun

Abstract:

I am working on improving the question answering system in Yioop. In the earlier phase of my project, I integrated an existing patch of the Question Answering System into Yioop. I then refactored the code so that it supports internationalization and is locale specific, which may improve the efficiency of the question answering module in Yioop. I also implemented a rule-based Hindi part-of-speech tagger. In CS298, I will work on creating a parse tree and triplet extractor module for Hindi. I also plan to develop a similar module for another language.

CS297 Results

  • Integrated an existing patch of the Question Answering System into Yioop, with changes to optimize the patch
  • Implemented a rule-based part-of-speech tagger for Hindi
  • Refactored the Question Answering System code so that it supports internationalization and is locale specific

Proposed Schedule

Week 1 (08 Feb 2017 - 13 Feb 2017): Read and present on how to build a parse tree for the Question Answering System and Indian languages.
Week 2,3 (14 Feb 2017 - 21 Feb 2017): Deliverable 1: Generate a parse tree for the Hindi locale and implement a triplet extractor for Hindi
Week 4,5 (01 Mar 2017 - 14 Mar 2017): Deliverable 2: Store the lexicon in a database
Week 6,7 (15 Mar 2017 - 28 Mar 2017): Deliverable 3: Improve the parsing mechanism to identify entities
Week 8,9 (29 Mar 2017 - 12 Apr 2017): Deliverable 4: Display the best answer found at the top in a separate area
Week 10,11 (13 Apr 2017 - 26 Apr 2017): Deliverable 5: Add a way for users to add entities

Key Deliverables:

  • Software
    • Patch with parse tree generation and triplet extraction for Hindi
    • Move the lexicon to a database and improve the parsing mechanism for a locale
    • Display the best answer found at the top in a separate area
    • Add a way for users to add entities
  • Report
    • CS 298 Report
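
To make the first software deliverable concrete, the following is a minimal sketch of how a shallow parse tree and a subject-predicate-object triplet might be derived from a part-of-speech tagged Hindi clause. It is not Yioop's implementation. The tag names (NN, NNP, JJ, PSP, VB, AUX), the function names parse_clause and extract_triplet, and the chunking heuristic are all assumptions made for illustration. The sketch relies on Hindi's usual subject-object-verb order: the first noun phrase is treated as the subject and the last noun phrase as the object, so it will mislabel clauses that contain locative or other oblique noun phrases.

```python
# Hypothetical tag sets, chosen for this sketch only.
NOUN_TAGS = {"NN", "NNP"}
VERB_TAGS = {"VB", "AUX"}


def parse_clause(tagged):
    """Chunk (word, tag) pairs into a shallow tree of NP and VG nodes."""
    chunks = []   # finished chunks, in sentence order
    pending = []  # adjectives waiting for their head noun
    for word, tag in tagged:
        if tag == "JJ":
            pending.append(word)
        elif tag in NOUN_TAGS:
            # A noun closes the noun phrase: adjectives* + noun.
            chunks.append({"label": "NP", "words": pending + [word], "psp": None})
            pending = []
        elif (tag == "PSP" and chunks and chunks[-1]["label"] == "NP"
              and chunks[-1]["psp"] is None):
            # A postposition marks the noun phrase it directly follows.
            chunks[-1]["psp"] = word
        elif tag in VERB_TAGS:
            # Consecutive verbs and auxiliaries form one verb group.
            if chunks and chunks[-1]["label"] == "VG":
                chunks[-1]["words"].append(word)
            else:
                chunks.append({"label": "VG", "words": [word], "psp": None})
        # Tokens with any other tag are skipped in this sketch.
    return {"label": "S", "children": chunks}


def extract_triplet(tree):
    """Return (subject, predicate, object), or None if the clause is too short."""
    nps = [c for c in tree["children"] if c["label"] == "NP"]
    vgs = [c for c in tree["children"] if c["label"] == "VG"]
    if len(nps) < 2 or not vgs:
        return None
    subject, obj = nps[0], nps[-1]  # SOV assumption: first NP, last NP
    return (" ".join(subject["words"]),
            " ".join(vgs[-1]["words"]),
            " ".join(obj["words"]))


# "Ram has eaten a sweet mango."
example = [("राम", "NNP"), ("ने", "PSP"), ("मीठा", "JJ"),
           ("आम", "NN"), ("खाया", "VB"), ("है", "AUX")]
print(extract_triplet(parse_clause(example)))
```

For the example clause, the sketch yields the subject "राम", the predicate "खाया है", and the object "मीठा आम". A real extractor would also need to use the postposition recorded on each noun phrase to separate subjects and objects from oblique arguments.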

Innovations and Challenges

  • Creating a robust Hindi lexicon to support the Hindi Question Answering Module in Yioop. The first part of the challenge was creating a lexicon in a format that can be used by the Question Answering System in Yioop. The next part is to make sure that each word is assigned its most probable part of speech.
  • Creating more rules to assign proper parts of speech to tokens in Hindi sentences. I have used a rule-based approach to create the part-of-speech tagger. The more rules there are, the higher the accuracy of the PoS tagger.
  • Understanding the semantics of a question posed in a language other than English.
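
The first two challenges can be illustrated with a small sketch: a lexicon kept in a database table, a lookup that returns the most frequent tag for a word, and a few fallback rules for words the lexicon does not contain. This is an illustration under stated assumptions, not the tagger built for this project. The table layout (word, tag, count), the tag names, the sample counts, and the three fallback rules are all hypothetical, and SQLite is used only because it needs no setup.

```python
import sqlite3

# Hypothetical sample data: (word, tag, how often the word was seen with that tag).
SAMPLE_ROWS = [
    ("राम", "NNP", 10),
    ("ने", "PSP", 50),
    ("में", "PSP", 40),
    ("खाया", "VB", 8),
    ("सोना", "NN", 7),   # "gold"
    ("सोना", "VB", 3),   # "to sleep"
]


def build_lexicon(rows):
    """Create an in-memory lexicon table and load (word, tag, count) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lexicon (word TEXT, tag TEXT, count INTEGER, "
        "PRIMARY KEY (word, tag))"
    )
    conn.executemany("INSERT INTO lexicon VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def most_probable_tag(conn, word):
    """Return the tag with the highest count for the word, or None if unknown."""
    row = conn.execute(
        "SELECT tag FROM lexicon WHERE word = ? "
        "ORDER BY count DESC, tag ASC LIMIT 1",
        (word,),
    ).fetchone()
    return row[0] if row else None


def tag_sentence(conn, tokens):
    """Tag tokens by lexicon lookup, then apply fallback rules to unknown words."""
    tags = [most_probable_tag(conn, token) for token in tokens]
    for i, token in enumerate(tokens):
        if tags[i] is not None:
            continue
        if token.isdigit():
            tags[i] = "NUM"   # rule 1: digits are numbers
        elif i + 1 < len(tokens) and tags[i + 1] == "PSP":
            tags[i] = "NN"    # rule 2: a word before a postposition is a noun
        elif token.endswith("ना"):
            tags[i] = "VB"    # rule 3: the infinitive ending suggests a verb
        else:
            tags[i] = "NN"    # default: treat unknown words as nouns
    return list(zip(tokens, tags))


conn = build_lexicon(SAMPLE_ROWS)
print(tag_sentence(conn, ["राम", "ने", "आम", "खाया"]))
```

In this sketch the ambiguous word "सोना" resolves to NN because that tag has the higher count, and the unknown word "आम" falls through to the default noun rule. Each additional rule narrows the set of words that reach the default, which is the sense in which more rules can raise accuracy.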

References:

Amin, Dhiraj, and Sharvari Govilkar. "ARQAS: Augmented Reality based Question Answering System using Ontology in HINDI and MARATHI Language." International Journal of Computer Applications 126.13 (2015).

Judge, J., Cahill, A., & Van Genabith, J. (2006, July). Questionbank: Creating a corpus of parse-annotated questions. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics (pp. 497-504). Association for Computational Linguistics.

Jing, H. (2000, April). Sentence reduction for automatic text summarization. In Proceedings of the sixth conference on Applied natural language processing (pp. 310-315). Association for Computational Linguistics.