CS297 Proposal

Word Sense Determination From Wikipedia Data Using A Neural Net

Qiao Liu (nicole.liuqiao@gmail.com)

Advisor: Dr. Chris Pollett

Description:

Many words have multiple meanings. For example, "plant" can mean a type of living organism or a factory. Being able to determine the sense of such words is useful in natural language processing tasks such as speech synthesis, question answering, and machine translation. In this project, we will use a modular model to classify the sense of ambiguous words. The model consists of two parts. The first part is a neural-network-based language model that computes continuous vector representations of words from data sets created from Wikipedia pages. The second part classifies the meaning of a given word without explicit knowledge of what that meaning is.
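The two-part design can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the proposal: the hand-written 2-D vectors in WORD_VECTORS stand in for the output of a trained neural language model, a context is represented by the average of its word vectors, and plain k-means is used as one possible way to group senses without labels. The names WORD_VECTORS, CONTEXTS, context_vector, init_centroids, and kmeans are all hypothetical.

```python
import math

# Assumption: toy 2-D vectors standing in for embeddings that a trained
# neural language model (part one of the model) would produce.
WORD_VECTORS = {
    "leaf": (0.9, 0.1), "water": (0.8, 0.2), "soil": (0.95, 0.05),
    "steel": (0.15, 0.85), "workers": (0.2, 0.8), "production": (0.05, 0.95),
}

def context_vector(context_words):
    """Average the vectors of the context words found in the vocabulary."""
    known = [WORD_VECTORS[w] for w in context_words if w in WORD_VECTORS]
    if not known:
        raise ValueError("no context word has a vector")
    return tuple(sum(dim) / len(known) for dim in zip(*known))

def init_centroids(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iterations=10):
    """Plain k-means; returns one cluster label per input point."""
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

# Four occurrences of the ambiguous word "plant"; words without a vector
# (including "plant" itself) are simply skipped.
CONTEXTS = [
    ["the", "plant", "needs", "water", "and", "soil"],
    ["steel", "plant", "workers", "went", "on", "strike"],
    ["a", "leaf", "fell", "from", "the", "plant"],
    ["the", "plant", "increased", "production"],
]

vectors = [context_vector(c) for c in CONTEXTS]
labels = kmeans(vectors, k=2)  # part two: group occurrences into senses
```

In this toy setup, occurrences that share a cluster label are treated as sharing a sense, even though no sense names are ever supplied. The actual classifier used in the project may differ.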

Schedule:

Week 1 (Feb. 14 - Feb. 21): Discuss topics. Download the Wikipedia dataset and understand the format of disambiguation pages.
Week 2 (Feb. 21 - Feb. 28): Take the Coursera Machine Learning course (Weeks 1 & 2). Install TensorFlow.
Week 3 (Feb. 28 - Mar. 7): Take the Coursera Machine Learning course (Weeks 3 & 4). Understand the basic logic of TensorFlow. Study Python.
Week 4 (Mar. 7 - Mar. 14): Deliverable #1: An example program to run in TensorFlow. Study Python. Coursera Machine Learning course (Week 5).
Week 5 (Mar. 14 - Mar. 21): Coursera Machine Learning course (Weeks 6 & 10). Literature review on neural network language models.
Week 6 (Mar. 21 - Apr. 4): Study TensorFlow. Literature review on word embedding.
Week 7 (Apr. 4 - Apr. 11): Deliverable #2: Presentation on word embedding. Literature review on neural network language models and word embedding.
Week 8 (Apr. 11 - Apr. 18): Extract data from Wikipedia and preprocess it on a small scale.
Week 9 (Apr. 18 - Apr. 25): Extract data from Wikipedia.
Week 10 (Apr. 25 - May 2): Deliverable #3: Data preprocessing program.
Week 11 (May 2 - May 9): Start working on the CS297 Final Report.
Week 12 (May 9 - May 16): Deliverable #4: Complete the CS297 Final Report.

Deliverables:

The full project will be done when CS298 is completed. The following will be done by the end of CS297:

1. An example program to run in TensorFlow

2. Presentation on word embedding

3. Data preprocessing program

4. CS297 Final Report: This is the culminating document for this semester's activities. It will include:

4.1 An overview of the project problem

4.2 A summary of approaches to the problem

4.3 A discussion of the platform I plan to use

4.4 A discussion of the technology I plan to experiment with

References:

Christopher Olah, "Deep Learning, NLP, and Representations", http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/

Silviu Cucerzan, "Large-Scale Named Entity Disambiguation Based on Wikipedia Data", 2007

Rada Mihalcea, "Using Wikipedia for Automatic Word Sense Disambiguation", 2007