Project Blog

Dec 5, 2017

Discuss the training details and model details of Match-LSTM.

TODO next week: Finish the CS 297 report draft before Dec 12

Nov 28, 2017

Discuss how to improve the writing of the proposal and the deliverables. Each deliverable document should introduce the goal and the whole picture at the beginning. The proposal should list concrete deliverables to achieve.

Discuss the format of the report. Page 1 covers the goal and the report organization. Pages 2-9 cover the deliverables. Page 10 covers the summary and future work.

Discuss how to improve an existing model. One option is to improve the loss function. Another option is to improve the neural net architecture.

Nov 14, 2017

Talked about the cosine similarity evaluation of word embeddings. Talked about a new topic: a question answering system. Decided to switch to the new topic.
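
As an illustration of the cosine similarity evaluation mentioned above, here is a minimal sketch in plain Python. The function names (`cosine_similarity`, `nearest_words`) and the three-dimensional toy vectors are assumptions made up for this example; they are not the embeddings or code from deliverable 2.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def nearest_words(word, embeddings, top_k=3):
    """Rank the other words by cosine similarity to `word`, highest first."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings, for illustration only.
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
}

for neighbor, score in nearest_words("king", toy_embeddings, top_k=2):
    print(neighbor, round(score, 3))
```

An intrinsic check of this kind only asks whether words expected to be related end up close together in the embedding space. It does not measure how the embeddings perform in a downstream task.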

TODO next week: Preprocess the data and set up the environment

Nov 7, 2017

Talk about the obstacles to converting modern language to classical language. Talk about narrowing down the project to a classical Chinese poem generator. However, Dr. Pollett thinks a Chinese poem generator is a little trivial. We talk about other possible ideas, but no agreement is reached yet.

TODO next week: Finish the cosine similarity evaluation of the word feature vectors; think about project ideas

Oct 24, 2017

About deliverable 2:
  1. The deliverable 2 document should include a model description, a code example, descriptions of the input and output data, and instructions for running the code.
  2. Should add regularization and extrinsic evaluation to the model. Also, should tune the parameters to get better results.
About general machine learning techniques:
  1. Use cross-validation when the dataset is small. Train and validate on (k-1)/k of the dataset, and test on the remaining 1/k.
  2. Dropout regularization helps to avoid large parameter values when the dataset is limited but the number of parameters is large.
  3. When tuning parameters, start from a small but reasonable value. Then use galloping search to find the optimal value.
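
The cross-validation split and the galloping search described in the list above can be sketched as follows. This is a minimal illustration, not the code used in the project: the names `k_fold_indices`, `galloping_search`, `loss`, and `limit` are hypothetical, and the search assumes an integer hyperparameter whose validation loss is unimodal (it decreases to a single minimum and then increases). Real validation losses are noisy, so that assumption may hold only approximately.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def galloping_search(loss, start=1, limit=1 << 20):
    """Return the integer in [start, limit] that minimizes a unimodal loss."""
    # Phase 1: gallop. Keep doubling the candidate while the loss improves.
    prev, cur, nxt = start, start, start * 2
    while nxt <= limit and loss(nxt) < loss(cur):
        prev, cur, nxt = cur, nxt, nxt * 2
    # The minimum now lies between the last two improving points and the
    # first point that failed to improve (or the limit).
    lo, hi = prev, min(nxt, limit)
    # Phase 2: binary search on the sign of the slope inside the bracket.
    while lo < hi:
        mid = (lo + hi) // 2
        if loss(mid) < loss(mid + 1):
            hi = mid       # loss is already rising, minimum is at or left of mid
        else:
            lo = mid + 1   # loss is still falling, minimum is right of mid
    return lo


def toy_loss(hidden_units):
    """Hypothetical validation loss with its minimum at 37 hidden units."""
    return (hidden_units - 37) ** 2


folds = list(k_fold_indices(10, 5))
best = galloping_search(toy_loss)
print(len(folds), best)
```

Galloping needs only a logarithmic number of loss evaluations to bracket the optimum, which matters when each evaluation means retraining a model. In practice `loss` would train on the training indices of each fold and average the held-out error.
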
About two ideas for the text-to-poetry converter:
  1. Idea 1: extract keywords from the text, use them to make the first line, then translate the previous lines into the current line. This idea could be the baseline of the master's project.
  2. Idea 2: summarize the text into several sentences, then translate each sentence into a poetry line.
    • Hard part 1: How to find the sentences in the text that capture its core ideas.
    • Hard part 2: How to shorten the sentences to a given length.
    • Hard part 3: How to make different poetry lines relate to each other when decoding the hidden state into poetry lines.
    • Hard part 4: How to satisfy the rules of the poetry, such as grammar and tone, when decoding the hidden state into poetry lines.

TODO next week: Study RNN and CNN models, and some standard code.

TODO the week after next: Try to get a baseline for deliverable 3.

Oct 17, 2017

About ChinesePoetryGeneration:
  1. Discussed in detail how to generate the first line. Build trigrams from the corpus. Choose trigrams that both contain the input keywords and have high frequency. Join two trigrams to get the 5-character first line, where the last character of the first trigram is the same as the first character of the second trigram.
  2. Discussed in detail how to generate the first character of each non-first line of a Chinese 5-character quatrain. Generate 6 characters for each line. Make the first character of each non-first line the same as the last character of the previous line.
  3. Discussed the initialization and training of the word embedding matrix in the poetry generator. Initialize the L matrix with the embedding matrix pre-trained on the corpus. Treat the L matrix as trainable parameters in ChinesePoetryGeneration.
  4. Discussed the relationship among the corpus used to obtain dense word vectors, the training set of the poem generator, and the test set of the poem generator. The corpus is a superset of the training set and the test set. The training set and the test set should be disjoint.
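
The trigram-based first-line generation from item 1 above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation or the paper's method: the function names, the four-line toy corpus, and the scoring rule (sum of the two trigram frequencies) are all hypothetical choices made for this example.

```python
from collections import Counter


def char_trigrams(lines):
    """Count every run of three consecutive characters in the corpus lines."""
    counts = Counter()
    for line in lines:
        for i in range(len(line) - 2):
            counts[line[i:i + 3]] += 1
    return counts


def first_line(keywords, counts):
    """Join two trigrams that share a character into a 5-character line.

    The last character of the first trigram must equal the first character
    of the second, and the joined line must contain at least one keyword.
    Among valid candidates, the pair with the highest combined frequency
    wins (an assumed scoring rule). Returns None if no candidate exists.
    """
    best_line, best_score = None, -1
    for left, left_freq in counts.items():
        for right, right_freq in counts.items():
            if left[-1] != right[0]:
                continue
            line = left + right[1:]  # 3 + 2 characters, overlap counted once
            if not any(keyword in line for keyword in keywords):
                continue
            score = left_freq + right_freq
            if score > best_score:
                best_line, best_score = line, score
    return best_line


# Toy corpus for illustration only; a real corpus would be far larger.
toy_corpus = ["床前明月光", "疑是地上霜", "举头望明月", "低头思故乡"]
trigram_counts = char_trigrams(toy_corpus)
print(first_line(["月"], trigram_counts))
```

The double loop is quadratic in the number of distinct trigrams. For a full corpus, indexing trigrams by their first character would avoid comparing every pair.
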
About schedule adjustment:
  1. Postpone implementing ChinesePoetryGeneration to deliverable 3. Implement "A Neural Probabilistic Language Model" (abbreviated as NPLM) as deliverable 2 instead, since it is a preparation for implementing ChinesePoetryGeneration.
TODO next week:
  1. Implement NPLM.
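
For reference, the model to be implemented can be summarized with the equations below. The symbols follow the notation of the paper "A Neural Probabilistic Language Model" rather than anything defined in these notes: C is the word embedding matrix, n is the context size, H and d are the hidden-layer weights and bias, U maps the hidden layer to the output, and W and b are the optional direct connections and the output bias. This is a summary sketch, not a description of the deliverable's code.

```latex
% Input: concatenation of the embeddings of the previous n-1 words
x = \bigl( C(w_{t-1}),\, C(w_{t-2}),\, \dots,\, C(w_{t-n+1}) \bigr)

% Unnormalized score for every word in the vocabulary
y = b + W x + U \tanh(d + H x)

% Softmax over the vocabulary gives the next-word probability
P(w_t \mid w_{t-1}, \dots, w_{t-n+1}) = \frac{e^{y_{w_t}}}{\sum_{i} e^{y_i}}
```

Training maximizes the average log-likelihood of the corpus under this distribution, optionally with a regularization term on the weights.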

Oct 10, 2017

  • Went over the current progress in ChinesePoetryGeneration.
  • Discussed that an LSTM might also work for line-by-line generation / translation.
  • Discussed how to evaluate the first line of a poem.
  • Discussed the next paper to be implemented. It could relate to text summarization.

TODO next week: Continue implementing ChinesePoetryGeneration

Oct 3, 2017

Present a review of the paper "Chinese Poetry Generation with Recurrent Neural Networks" (abbreviated as ChinesePoetryGeneration). Present a demo of a neural network model coded in TensorFlow.

Sep 26, 2017

Go through the differentiation of the loss function of a particular neural network.

Sep 19, 2017

Discuss back propagation in neural network models. Discuss the word2vec model.

Sep 12, 2017

Work out the proposal. Discuss the paper "A Neural Probabilistic Language Model".

Sep 5, 2017

Discuss the main idea of the project. Find several papers to begin with.