Question Answering System

QA System - zip

This deliverable is a Python implementation of the Question Answering (QA) system designed in CS297 deliverable 4.

Dependencies

  • Python 2.7
  • TensorFlow 1.2
  • nltk
  • tqdm
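
Before running anything, it can help to confirm that the listed packages are importable. The following is only a minimal sketch: the helper name missing_modules and the list REQUIRED_MODULES are hypothetical and not part of the deliverable's code, and the check does not verify version numbers.

```python
import importlib

# Hypothetical list of the import names for the packages listed above.
REQUIRED_MODULES = ["tensorflow", "nltk", "tqdm"]


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed (or not on the import path).
            missing.append(name)
    return missing
```

Calling missing_modules(REQUIRED_MODULES) returns an empty list when all three packages can be imported.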

Description

  • preprocess.py: Downloads the raw data from the website, tokenizes it, builds the vocabulary, converts tokens to token ids, and builds a small embedding matrix.
  • util_data.py: Pads passage token ids and question token ids, and adjusts the answer span given max_pass_length and max_question_length; prepares feed-ready data for training.
  • model.py: Builds, trains, validates, and tests the TensorFlow graph.
  • train.py: Calls functions in util_data.py to prepare feed-ready data, then uses model.py to train, validate, and test the TensorFlow graph.
  • evaluate_v_1_1.py: Provided by the official SQuAD website to calculate the F1 score and the exact match score.
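
The preprocessing and padding steps described above can be illustrated with a small sketch. This is not the code in preprocess.py or util_data.py: the function names, the special tokens "<pad>" and "<unk>", the right-padding scheme, and the rule for clipping or dropping an answer span are all assumptions made for illustration. Only max_pass_length is a name taken from the description.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def build_vocab(token_lists):
    """Map each token to an integer id; ids 0 and 1 are reserved for PAD and UNK."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    # Most frequent tokens first, ties broken alphabetically, so ids are deterministic.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    vocab = {PAD: 0, UNK: 1}
    for tok in ordered:
        vocab[tok] = len(vocab)
    return vocab


def tokens_to_ids(tokens, vocab):
    """Convert tokens to ids, mapping out-of-vocabulary tokens to the UNK id."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]


def pad_ids(ids, max_length, pad_id=0):
    """Truncate or right-pad ids to max_length; also return a 0/1 mask of real tokens."""
    clipped = ids[:max_length]
    n_pad = max_length - len(clipped)
    return clipped + [pad_id] * n_pad, [1] * len(clipped) + [0] * n_pad


def adjust_span(start, end, max_pass_length):
    """Clip an inclusive (start, end) answer span to a truncated passage.

    Returns None when the answer starts beyond the truncated passage
    (an assumed policy: such examples would be skipped).
    """
    if start >= max_pass_length:
        return None
    return start, min(end, max_pass_length - 1)
```

For example, with the passage "the cat sat on the mat" split on spaces, pad_ids(tokens_to_ids(passage, vocab), 8) yields eight ids with two trailing pad ids, and adjust_span(2, 5, 4) clips the span to (2, 3).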

How to Run the Code

If you are using a local machine, running the code is straightforward. Enter the code folder, then run:

  1. python preprocess.py
  2. python train.py local

Result

The result after 1 epoch looks promising. The validation F1 score is around 20%, and the validation exact match score is also around 20%. Both validation scores were still increasing at the end of the first epoch, which suggests that the model has not overfit during epoch 1. In deliverables 2 and 3, we will run various experiments, including training for more epochs, to pursue better results.
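
To make the two reported metrics concrete, here is a simplified sketch of how a token-level F1 score and an exact match score can be computed for a single predicted answer. It follows the general approach of SQuAD-style evaluation (lowercasing, stripping punctuation and English articles, then comparing tokens), but it is not the evaluate_v_1_1.py script itself, and the function names are chosen here for illustration.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, ground_truth):
    """True when the normalized prediction equals the normalized ground truth."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def f1_score(prediction, ground_truth):
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = float(num_same) / len(pred_tokens)
    recall = float(num_same) / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score would then be an average of these per-question values; how multiple reference answers are combined is left out of this sketch.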