Chris Pollett > Students > Nicole

    [Bio]

    [Blog]

    [CS 297 Proposal]

    [Del 1-Example Program]

    [Del 2-Introduction to Word Embedding]

    [Del 3-Data Preprocessing Program]

    [CS 297 Report_PDF]

    [CS 298 Proposal]

Project Blog

Summary of the Meeting on Sep 26, 2017

Discussed the implementation plan: start with a small set of Wikipedia data to preprocess and build the model over the following 2-3 weeks.

Plans for the following week.

  • Build a Skip-Gram model.
  • Preprocess the data.
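A Skip-Gram model is trained on (target, context) pairs drawn from a sliding window over the corpus. As a minimal sketch of the pair-generation step (the toy corpus and window size are illustrative, not part of the actual dataset):

```python
# Minimal sketch of Skip-Gram training-pair generation.
# Toy corpus and window size are illustrative assumptions.
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
word2id = {w: i for i, w in enumerate(vocab)}

def skipgram_pairs(tokens, window=2):
    """Emit (target_id, context_id) pairs within a symmetric window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((word2id[target], word2id[tokens[j]]))
    return pairs

pairs = skipgram_pairs(corpus)
```

Each pair then serves as one positive training example for the model.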


Summary of the Meeting on Sep 19, 2017

I proposed an approach of "Related Entities Retrieval" for review. Dr. Pollett and I also brainstormed on the approach of "Word Sense Determination". We decided to work on the topic of "Word Sense Determination".

  • Approach of "Related Entities Retrieval"

    • Step 1: learn the embedding of each entity.

      This uses supervised learning. A composition of a title T and the set of other titles that appear on T's page is a positive sample, labeled 1. A negative sample is a randomly created composition of the same form, labeled 0.

      The hypothesis is defined as

      f(Vt * W * Vc)

      • Vt is the transpose of title T's vector.

      • Vc is the vector for the set of other titles that appear on T's page; it is the sum of those titles' vectors. A weighted sum, or TF weighting, might be used in this calculation.

      • W is the weight vector.

      • * means dot product.

      • f is a sigmoid function.

      We need to learn Vt, Vc, and W by training the model.
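The hypothesis above can be sketched in a few lines. Treating W as a d x d matrix, so that the product Vt * W * Vc is well-defined, is an assumption for illustration (the notes describe W as a weight vector); the dimension and the random values are toy placeholders, not learned parameters:

```python
import numpy as np

# Sketch of the hypothesis f(Vt^T * W * Vc) with sigmoid f.
# d, the random values, and W as a d x d matrix are illustrative
# assumptions; in training these would be learned parameters.
rng = np.random.default_rng(0)
d = 8
Vt = rng.normal(size=d)        # embedding of title T
Vc = rng.normal(size=d)        # sum of embeddings of titles on T's page
W = rng.normal(size=(d, d))    # learnable weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Probability-like score that this (T, page-titles) composition is real.
score = sigmoid(Vt @ W @ Vc)
```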

    • Step 2: on top of the embedding, find related entities.

      Score each entity R in the vocabulary, excluding the query Q, by computing the similarity (a simple dot product or, more commonly, cosine similarity) between the vectors Vq and Vr, where Vq is the vector of the query Q and Vr is the vector of the related entity R. Based on the scores, the system returns the top K entities.
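The scoring in Step 2 can be sketched as follows; the toy embedding matrix, the function name, and K = 2 are illustrative assumptions:

```python
import numpy as np

# Sketch of top-K retrieval by cosine similarity over a toy
# embedding matrix E (one row per entity; values are illustrative).
E = np.array([[1.0, 0.0],   # entity 0 (the query Q)
              [0.9, 0.1],
              [0.0, 1.0],
              [0.7, 0.7]])

def top_k_related(E, q, k=2):
    """Return indices of the k entities most similar to query q."""
    norms = np.linalg.norm(E, axis=1)
    sims = (E @ E[q]) / (norms * norms[q])  # cosine similarity
    sims[q] = -np.inf                       # exclude the query itself
    return np.argsort(sims)[::-1][:k].tolist()

related = top_k_related(E, q=0)
```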

  • Approach of "Word Sense Determination"

    • Step 1: learn the embedding.

    • Step 2: cluster the embeddings of the context words of ambiguous words.
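Step 2 of the "Word Sense Determination" approach can be sketched with a tiny k-means over toy 2-D context embeddings. The data, the deterministic initialization, and k = 2 are assumptions for illustration; real context vectors would come from the learned embedding:

```python
import numpy as np

# Sketch of clustering context-word embeddings of one ambiguous word.
# Toy 2-D vectors stand in for learned embeddings; each tight group
# is meant to correspond to one sense.
ctx = np.array([[0.1, 0.0], [0.2, 0.1], [0.0, 0.2],    # sense A contexts
                [5.0, 5.1], [4.9, 5.0], [5.2, 4.8]])   # sense B contexts

def two_means(X, iters=10):
    """Plain k-means with k=2 and deterministic init (for this sketch)."""
    centers = X[[0, len(X) - 1]]  # first and last point as initial centers
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Recompute centers as cluster means.
        centers = np.array([X[labels == j].mean(0) for j in range(2)])
    return labels

labels = two_means(ctx)
```

Each resulting cluster groups the contexts belonging to one sense of the ambiguous word.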

Plans for the following week.

  • Start implementation of "Word Sense Determination".
  • Keep refining the "Word Sense Determination" approach.


Summary of the Meeting on Sep 12, 2017

Discussed the papers "A Neural Probabilistic Language Model" and "Efficient Estimation of Word Representations in Vector Space", including the Skip-Gram and CBOW models.

Plans for the following week.

  • Work out the approaches for "Related Entities Retrieval" and "Word Sense Determination".


Summary of the Meeting on Sep 5, 2017

Reviewed CS298 Proposal

Plans for the following week.

  • Read the paper "A Neural Probabilistic Language Model".


Summary of the Meeting on May 16, 2017

Reviewed CS297 Report

Plans for the following week.

  • Finalize CS297 Report


Summary of the Meeting on May 9, 2017

Reviewed CS297 Report

Plans for the following week.

  • Update CS297 Report
  • Validate all web pages


Summary of the Meeting on May 2, 2017

Pages of ambiguous words have been extracted and output.

Plans for the following week.

  • Start CS297 Report


Summary of the Meeting on April 25, 2017

Preprocessed Wikipedia data: extracted all disambiguation pages.
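As a sketch of this preprocessing step, disambiguation pages can be pulled out of dump-style XML by their title convention. The inline sample and the "(disambiguation)" suffix test are illustrative assumptions; real Wikipedia dumps are far larger and use XML namespaces:

```python
import xml.etree.ElementTree as ET

# Sketch: find disambiguation pages in a Wikipedia-dump-style XML
# snippet by title suffix. The sample data is illustrative only.
sample = """<mediawiki>
  <page><title>Mercury (disambiguation)</title></page>
  <page><title>Mercury (planet)</title></page>
  <page><title>Jaguar (disambiguation)</title></page>
</mediawiki>"""

root = ET.fromstring(sample)
disambig = [p.findtext("title") for p in root.iter("page")
            if p.findtext("title", "").endswith("(disambiguation)")]
```

A real pipeline would stream the dump with `ET.iterparse` rather than load it whole.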

Plans for the following week.

  • Continue preprocessing Wikipedia data.


Summary of the Meeting on April 18, 2017

Preprocessed Wikipedia data: extracted all disambiguation pages.

Plans for the following week.

  • Continue preprocessing Wikipedia data.


Summary of the Meeting on April 11, 2017

Gave a presentation introducing word embedding.

Plans for the following week.

  • Continue the literature review of word embedding
  • Start preprocessing Wikipedia data


Summary of the Meeting on April 4, 2017

Did a demo of a TensorBoard practice exercise.

Plans for the following week.

  • Literature review of word embedding
  • A presentation of word embedding


Summary of the Meeting on March 28, 2017

Spring break week; no meeting.



Summary of the Meeting on March 21, 2017

Machine Learning course weeks 6 and 10 finished. Did a demo of a TensorFlow machine learning program with different parameters.

Plans for the following week.

  • Add TensorBoard visualization to the program.
  • Literature review on natural language models


Summary of the Meeting on March 14, 2017

Machine Learning course week 5 finished; the assigned Python program is finished as well.

Altered the schedule for the following weeks to focus on algorithms and technologies related to the project.

Plans for the following week.

  • Machine Learning course week 6 and week 10
  • A programming practice on TensorFlow: experiment with handwritten digit recognition using different parameters


Summary of the Meeting on March 7, 2017

TensorFlow Getting Started tutorial finished. Machine Learning course week 3 and week 4 finished.

Plans for the following week.

  • Machine Learning course week 5 and week 6
  • A Python program
  • A programming practice on TensorFlow: recognize digits using TensorFlow


Summary of the Meeting on Feb 28, 2017

TensorFlow installed. Machine Learning course week 1 and week 2 finished.

Plans for the following week.

  • Machine Learning course week 3 and week 4
  • Start programming in TensorFlow


Summary of the Meeting on Feb 21, 2017

Proposal approved.

Plans for the following week.

  • Machine Learning course week 1 and week 2
  • Install TensorFlow on Mac


Summary of the Meeting on Feb 14, 2017

Dr. Pollett reviewed my proposal, and we discussed tasks for the following week.

  • Fill out the bio, blog and proposal pages
  • Download the Wikipedia disambiguation dataset
  • Understand the data format.