Question Answering System
This deliverable is a Python implementation of the Question Answering (QA) system designed in CS297 deliverable 4.
Dependencies
- Python 2.7
- TensorFlow 1.2
- nltk
- tqdm
Description
preprocess.py : Downloading the raw data from the website, tokenizing it, building the vocabulary, converting tokens to token ids, and building a small embedding matrix.
util_data.py : Padding passage token ids and question token ids, and adjusting the answer span, given max_pass_length and max_question_length; preparing feed-ready data for training.
model.py : Building, training, validating, and testing the TensorFlow graph.
train.py : Calling functions in util_data.py to prepare feed-ready data, then using model.py to train, validate, and test the TensorFlow graph.
evaluate_v_1_1.py : Provided by the official SQuAD website to calculate the F1 score and the exact match score.
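The padding and span adjustment described for util_data.py can be illustrated with a minimal sketch. The function names, the pad id of 0, and the rule of discarding examples whose answer starts beyond the truncated passage are all assumptions made for illustration; the actual code in util_data.py may handle these cases differently.

```python
PAD_ID = 0  # assumed id reserved for padding; not taken from the project


def pad_or_truncate(token_ids, max_length, pad_id=PAD_ID):
    """Return (ids, mask), both of length max_length.

    mask holds 1 for real tokens and 0 for padding positions.
    """
    clipped = list(token_ids[:max_length])
    num_pad = max_length - len(clipped)
    ids = clipped + [pad_id] * num_pad
    mask = [1] * len(clipped) + [0] * num_pad
    return ids, mask


def adjust_answer_span(start, end, max_pass_length):
    """Clip an inclusive (start, end) token span to the truncated passage.

    Returns None when the answer starts beyond max_pass_length, so the
    caller can drop that example (a hypothetical policy).
    """
    if start >= max_pass_length:
        return None
    return start, min(end, max_pass_length - 1)


ids, mask = pad_or_truncate([5, 8, 13], max_length=5)
span = adjust_answer_span(2, 6, max_pass_length=5)
```

Here a three-token sequence is padded to length 5, and an answer span that runs past the truncated passage is clipped to its last valid position. The same pad_or_truncate call would be applied to question token ids with max_question_length.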
How to Run the Code
On a local machine, running the code is straightforward. Enter the code folder, then run:
- python preprocess.py
- python train.py local
Result
The result after 1 epoch looks promising. The validation F1 score is around 20%, and the validation exact match score is also around 20%.
The validation scores are still increasing at the end of the first epoch, which suggests that the model has not started to overfit in epoch 1.
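For readers unfamiliar with the two metrics, the following is a simplified sketch of token-level F1 and exact match for a single prediction. It is not the official evaluate_v_1_1.py script: it only lowercases and splits on whitespace, whereas the official script applies further answer normalization and scores each prediction against all available reference answers. The function names are chosen here for illustration.

```python
from collections import Counter


def exact_match(prediction, ground_truth):
    """True when both answers contain the same tokens in the same order."""
    return prediction.lower().split() == ground_truth.lower().split()


def f1(prediction, ground_truth):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = prediction.lower().split()
    true_tokens = ground_truth.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    common = Counter(pred_tokens) & Counter(true_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = float(num_same) / len(pred_tokens)
    recall = float(num_same) / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


score = f1("the cat sat", "cat sat down")
match = exact_match("Denver Broncos", "denver broncos")
```

In this example two of three tokens overlap in each direction, so precision and recall are both 2/3 and so is F1. Exact match gives no partial credit, while F1 rewards predictions that overlap the reference without matching it exactly.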
In deliverables 2 and 3, we will run various experiments, including training for more epochs, to pursue better results.