Question Answering System

QA System - zip

This deliverable is a Python implementation of the Question Answering (QA) system designed in CS297 deliverable 4.

Dependencies

Python 2.7
TensorFlow 1.2
nltk
tqdm

Description

preprocess.py: Downloads the raw data from the website, tokenizes it, builds the vocabulary, converts tokens to token ids, and builds a small embedding matrix.
util_data.py: Pads the passage token ids and question token ids, adjusts the answer span given max_pass_length and max_question_length, and prepares feed-ready data for training.
model.py: Builds the TensorFlow graph, and trains, validates, and tests it.
train.py: Calls functions in util_data.py to prepare feed-ready data, then uses model.py to train, validate, and test the TensorFlow graph.
evaluate_v_1_1.py: Provided by the official SQuAD website; calculates the F1 score and the exact match score.
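
The padding and answer-span adjustment described for util_data.py can be illustrated with a minimal sketch. The helper names pad_ids and adjust_span, the padding id 0, and the policy of dropping an example whose answer starts past the truncation point are all assumptions made for illustration; they are not taken from util_data.py.

```python
# Hypothetical helpers: names, PAD_ID, and the truncation policy are
# assumptions for illustration, not the contents of util_data.py.
PAD_ID = 0  # assumed id reserved for padding


def pad_ids(token_ids, max_length, pad_id=PAD_ID):
    """Truncate token_ids to max_length, then right-pad with pad_id."""
    clipped = list(token_ids[:max_length])
    return clipped + [pad_id] * (max_length - len(clipped))


def adjust_span(start, end, max_pass_length):
    """Fit an answer span (inclusive token indices) into a truncated passage.

    Returns None when the answer starts beyond the kept tokens, so the
    caller can skip that example; otherwise clips the end index.
    """
    if start >= max_pass_length:
        return None
    return start, min(end, max_pass_length - 1)


passage = [12, 7, 93, 4, 55, 8]   # made-up token ids
question = [3, 41]

print(pad_ids(passage, 4))     # [12, 7, 93, 4]
print(pad_ids(question, 4))    # [3, 41, 0, 0]
print(adjust_span(2, 5, 4))    # (2, 3)
print(adjust_span(4, 5, 4))    # None
```

Fixed-length inputs of this kind let every batch share one tensor shape, which is what a static TensorFlow 1.x graph with fixed-size placeholders expects.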

How to Run the Code

On a local machine, running the code is straightforward. Enter the code folder, then run:

python preprocess.py
python train.py local

Result

The result after 1 epoch looks promising. The validation F1 score is around 20%, and the validation exact match score is also around 20%. Both validation scores were still increasing at the end of the first epoch, which suggests that no overfitting occurred in epoch 1. In deliverables 2 and 3, we will run various experiments, including training for more epochs, to pursue better results.
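
For readers unfamiliar with the two metrics, the following is a simplified sketch of how exact match and token-level F1 are commonly computed for SQuAD-style answers. It is modeled on the general approach of the official evaluation script but is not a copy of evaluate_v_1_1.py; the function names here are illustrative, and the sketch scores a single prediction against a single reference answer.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """True when the normalized strings are identical."""
    return normalize(prediction) == normalize(truth)


def f1(prediction, truth):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = float(overlap) / len(pred_tokens)
    recall = float(overlap) / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("the Eiffel Tower", "Eiffel Tower"))  # True
print(f1("in Paris, France", "Paris"))                  # 0.5
```

A full evaluation would typically take the best score over all reference answers for each question and average over the dataset; that aggregation is omitted here.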