CS297 Proposal

Word Sense Determination From Wikipedia Data Using A Neural Net

Qiao Liu (nicole.liuqiao@gmail.com)

Advisor: Dr. Chris Pollett

Description:

Many words carry different meanings depending on their context. For instance, "apple" could refer to a fruit, a company, or a film. In this project, we will use the English Wikipedia dataset as a source of sense annotations, together with word embeddings, to train a neural network that determines the sense of a word within a given context.
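To make the task concrete, here is a minimal sketch of sense determination for the "apple" example. It is not the neural network proposed above: it swaps in a much simpler nearest-sense rule (average the context word vectors, then pick the sense whose cue words are closest by cosine similarity). The 3-dimensional vectors, the `SENSES` inventory, and the function names are all hypothetical and invented for illustration; real embeddings would be learned from data.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# Real embeddings would be learned and have far more dimensions.
EMBEDDINGS = {
    "fruit": [0.9, 0.1, 0.0],
    "eat": [0.8, 0.2, 0.1],
    "pie": [0.85, 0.05, 0.1],
    "company": [0.1, 0.9, 0.1],
    "iphone": [0.0, 0.95, 0.2],
    "stock": [0.1, 0.8, 0.0],
    "film": [0.1, 0.1, 0.9],
    "director": [0.0, 0.2, 0.85],
}

# Hypothetical sense inventory: each sense is described by a few cue words.
SENSES = {
    "apple (fruit)": ["fruit", "eat", "pie"],
    "apple (company)": ["company", "iphone", "stock"],
    "apple (film)": ["film", "director"],
}


def mean_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def guess_sense(context_words):
    """Pick the sense whose cue words are closest to the averaged context."""
    context = mean_vector(context_words)
    if context is None:
        return None
    return max(SENSES, key=lambda s: cosine(context, mean_vector(SENSES[s])))
```

For example, `guess_sense(["eat", "pie"])` selects the fruit sense, while `guess_sense(["iphone", "stock"])` selects the company sense. A trained network would replace both the hand-written vectors and the fixed similarity rule.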

Schedule:

Week 1 (Feb. 14 - Feb. 21): Discuss the topics. Download the Wikipedia dataset and understand the format of disambiguation pages.
Week 2 (Feb. 21 - Feb. 28): Take the Coursera Machine Learning course (Weeks 1 & 2). Install TensorFlow.
Week 3 (Feb. 28 - Mar. 7): Take the Coursera Machine Learning course (Weeks 3 & 4). Understand the basic logic of TensorFlow. Study Python.
Week 4 (Mar. 7 - Mar. 14): Deliverable #1: An example program to run in TensorFlow. Study Python. Coursera Machine Learning course (Week 5).
Week 5 (Mar. 14 - Mar. 21): Coursera Machine Learning course (Weeks 6 & 10). Literature review on neural network language models.
Week 6 (Mar. 21 - Apr. 4): Study TensorFlow. Literature review on word embedding.
Week 7 (Apr. 4 - Apr. 11): Deliverable #2: Presentation on word embedding. Literature review on neural network language models and word embedding.
Week 8 (Apr. 11 - Apr. 18): Extract data from Wikipedia and preprocess it on a small scale.
Week 9 (Apr. 18 - Apr. 25): Extract data from Wikipedia.
Week 10 (Apr. 25 - May 2): Deliverable #3: Data preprocessing program.
Week 11 (May 2 - May 9): Start working on the CS297 Final Report.
Week 12 (May 9 - May 16): Deliverable #4: Complete the CS297 Final Report.
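
As a rough illustration of the extraction work planned for Weeks 8 to 10, the sketch below pulls candidate sense annotations out of raw wikitext. It assumes, in the spirit of the Mihalcea reference, that an internal link such as `[[Apple Inc.|Apple]]` labels the surface word "Apple" with the article title "Apple Inc." as its sense. The function names and the sample sentence are hypothetical, and a real preprocessing program would also need to handle the dump format, templates, and redirects.

```python
import re
from collections import defaultdict

# Matches [[Target]] and [[Target|surface text]].
# Links with more than one pipe (e.g. image links) do not match.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]\]")


def extract_annotations(wikitext):
    """Return (surface, target) pairs for each internal link found."""
    pairs = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        # Without a pipe, the visible text is the target title itself.
        surface = (match.group(2) or target).strip()
        pairs.append((surface, target))
    return pairs


def group_by_surface(pairs):
    """Map each lowercased surface form to the set of targets it links to."""
    senses = defaultdict(set)
    for surface, target in pairs:
        senses[surface.lower()].add(target)
    return dict(senses)


# Hypothetical sample sentence, written for illustration only.
sample = "She ate an [[apple]] while the [[Apple Inc.|Apple]] keynote played."
annotations = extract_annotations(sample)
inventory = group_by_surface(annotations)
```

Here the surface form "apple" ends up associated with two different targets, which is the kind of ambiguity the project aims to resolve from context.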

Deliverables:

The full project will be done when CS298 is completed. The following will be done by the end of CS297:

1. An example program to run in TensorFlow

2. Presentation on word embedding

3. Data preprocessing program

4. CS297 Final Report: This is the culminating document for this semester's activities. It will include:

4.1 An overview of the project problem

4.2 A summary of approaches to the problem

4.3 A discussion of the platform I plan to use

4.4 A discussion of the technology I plan to experiment with

References:

Christopher Olah, "Deep Learning, NLP, and Representations", http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/

Silviu Cucerzan, "Large-Scale Named Entity Disambiguation Based on Wikipedia Data", 2007

Rada Mihalcea, "Using Wikipedia for Automatic Word Sense Disambiguation", 2007