CS297 Proposal

Compare word2vec with hash2vec for Word Sense Disambiguation on Wikipedia corpus

Neha Gaikwad (neha.gaikwad@sjsu.edu)

Advisor: Dr. Chris Pollett

Description: When a single word has multiple meanings, it is difficult for a machine to interpret a query that contains it. For example, "depression" can refer to an illness, a weather system, or an economic downturn. Resolving this kind of ambiguity matters for many natural language processing applications, such as semantic analysis, machine translation, speech synthesis, and information retrieval. This project focuses on word sense disambiguation using modular models. The project is divided into two parts:

1) Hash the Wikipedia dataset and vectorise it using the hashing trick.
2) Classify using the modular model, then compare the results with existing work on the same problem in which the vectors were formed using word2vec.

Schedule:
Deliverables: The full project will be done when CS298 is completed. The following will be done by the end of CS297:

1. Word2vec implementation using the TensorFlow and gensim libraries
2. Calculating similar words using word2vec vectors and cosine similarity
3. Hash2vec implementation using the MurmurHash3 (mmh3) hash function
4. Calculating nearby words using Euclidean distance on hash2vec vectors
5. Experimentation with hash2vec: implementation of the hashing trick
   i) An overview
   ii) Summary of approaches
   iii) Details of the platform and technologies I plan to use
   iv) Planning of my experimentation

References:

[2016] "Hash2Vec: Feature Hashing for Word Embeddings". Luis Argerich, Matias J. Cano, and Joaquin Torre Zaffaroni. 2016.

[2017] "Learning to understand phrases by embedding the dictionary". Hill, F., Cho, K., Korhonen, A., Bengio, Y. 2017.

[2013] "Distributed representations of words and phrases and their compositionality". Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., Dean, J. In Advances in Neural Information Processing Systems. 2013.
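Illustrative sketch for deliverables 3 and 4: the code below shows one way the hashing trick could turn context words into fixed-size word vectors, and how nearby words could then be ranked by Euclidean distance. It is not the project's implementation. Several things are assumptions made only for illustration: `hashlib.md5` stands in for the MurmurHash3 function so that the example needs only the standard library, and the vector size `DIM`, the context size `WINDOW`, the signed hash, and the inverse-distance weighting are all hypothetical choices.

```python
import hashlib
import math
from collections import defaultdict

DIM = 64     # hypothetical vector size
WINDOW = 2   # hypothetical number of context words on each side


def _hash(token, salt):
    """Deterministic 64-bit hash. md5 is a stand-in for MurmurHash3 here."""
    digest = hashlib.md5((salt + token).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hash2vec(tokens, dim=DIM, window=WINDOW):
    """Build one dim-sized vector per word by hashing its context words."""
    vectors = defaultdict(lambda: [0.0] * dim)
    for i, word in enumerate(tokens):
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j == i:
                continue
            context = tokens[j]
            index = _hash(context, "index") % dim           # bucket for this context word
            sign = 1.0 if _hash(context, "sign") % 2 == 0 else -1.0
            weight = 1.0 / abs(i - j)                       # assumption: closer words count more
            vectors[word][index] += sign * weight
    return dict(vectors)


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word, vectors, k=3):
    """Return the k words closest to `word` by Euclidean distance."""
    ranked = sorted(
        (euclidean(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    )
    return [other for _, other in ranked[:k]]


if __name__ == "__main__":
    tokens = "the depression moved east while the economic depression deepened".split()
    vectors = hash2vec(tokens)
    print(nearest("depression", vectors))
```

Because the vector size is fixed in advance, no vocabulary has to be built before vectorising, which is the main appeal of the hashing trick. Distinct context words can collide in the same bucket; the signed hash is one common way to keep such collisions from always adding up.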