CS297 Proposal
Word Sense Determination From Wikipedia Data Using A Neural Net
Qiao Liu (nicole.liuqiao@gmail.com)
Advisor: Dr. Chris Pollett

Description:
Many words have multiple meanings. For example, "plant" can mean a type of living organism or a factory. Being able to determine the sense of such words is very useful in natural language processing tasks such as speech synthesis, question answering, and machine translation. In this project, we will use a modular model to classify the sense of words to be disambiguated. The model will consist of two parts. The first part is a neural-network-based language model that computes continuous vector representations of words from data sets created from Wikipedia pages. The second part classifies the meaning of a given word without explicitly knowing what that meaning is.

Schedule:
Deliverables:
The full project will be done when CS298 is completed. The following will be done by the end of CS297:
1. An example program to run in TensorFlow
2. Presentation on word embedding
3. Data preprocessing program
4. CS297 Final Report: This is the culminating document for this semester's activities. It will include:
4.1 An overview of the project problem
4.2 A summary of approaches to the problem
4.3 A discussion of the platform I plan to use
4.4 A discussion of the technology I plan to experiment with

References:
Christopher Olah, "Deep Learning, NLP, and Representations", http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
Silviu Cucerzan, "Large-Scale Named Entity Disambiguation Based on Wikipedia Data", 2007
Rada Mihalcea, "Using Wikipedia for Automatic Word Sense Disambiguation", 2007
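
Illustrative sketch:
The two-part model in the description can be illustrated with a very small example. The sketch below is not the proposed system: it swaps the neural-network language model for plain bag-of-words context counts, and it assumes that "classifying without explicitly knowing the meaning" can be approximated by clustering the contexts of the target word into two groups. The toy sentences, the stopword list, and all function names are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: "plant" is used as an organism in the first
# three sentences and as a factory in the last three.
SENTENCES = [
    "the plant needs water and sunlight to grow leaves",
    "water the plant so its leaves grow",
    "the plant has green leaves and needs sunlight",
    "the plant hired workers for the assembly line",
    "workers at the plant build cars on the assembly line",
    "the car plant closed its assembly line",
]
TARGET = "plant"
STOPWORDS = {"the", "and", "to", "so", "has", "for", "at", "on", "its"}


def content_words(sentence):
    """Words of the sentence, minus the target word and stopwords."""
    return [w for w in sentence.split() if w != TARGET and w not in STOPWORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def induce_senses(sentences, iterations=10):
    """Part 1: count-based context vectors (stand-in for neural embeddings).
    Part 2: 2-means clustering with cosine similarity; returns one label
    (0 or 1) per sentence."""
    vocab = sorted({w for s in sentences for w in content_words(s)})
    vectors = []
    for s in sentences:
        counts = Counter(content_words(s))
        vectors.append([counts[w] for w in vocab])

    # Deterministic start: the first context and the context least similar to it.
    first = vectors[0]
    farthest = min(vectors, key=lambda v: cosine(first, v))
    centroids = [first, farthest]

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each context to the more similar centroid.
        labels = [
            max((0, 1), key=lambda k: cosine(v, centroids[k])) for v in vectors
        ]
        # Move each centroid to the mean of its assigned contexts.
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    for label, sentence in zip(induce_senses(SENTENCES), SENTENCES):
        print(label, sentence)
```

On this toy corpus the first three sentences end up in one cluster and the last three in the other. The cluster labels carry no names, which is the sense in which the meaning is never given explicitly. A real system would replace the count vectors with embeddings trained on Wikipedia text.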