Chris Pollett > Students >
Gaikwad


CS298 Proposal

Comparing English-to-Hindi text translation using word2vec and hash2vec.

Advisor: Dr. Chris Pollett

Committee Members: Dr. Thomas Austin, Dr. Robert Chun

Abstract:

There is an increasing need to make web data available in all languages so that people all over the world can understand it, yet most web data is still available only in English. The aim of this research project is to study machine translation from one language to another. Much work has been done on translating web data to make it globally available, but it has mostly relied on the word2vec approach. In this research, we will implement English-to-Hindi machine translation using the hash2vec approach and compare its accuracy with that of the word2vec approach. In the future, this work could be extended to language pairs other than English and Hindi.
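
To make the contrast between the two approaches concrete: word2vec learns embeddings by training a shallow network, whereas hash2vec, as it is commonly described, builds word vectors directly with the feature-hashing trick. Each context word around a target word is hashed to one of `dim` buckets (with a hashed +1/-1 sign to reduce collision bias), and no training is involved. The pure-Python sketch below is only an illustration under stated assumptions: the function names, the window size, the 1/distance weighting, and the MD5-based hashing are hypothetical choices, not this project's implementation.

```python
import hashlib
import math
from collections import defaultdict


def bucket_and_sign(token: str, dim: int) -> tuple[int, int]:
    """Hash a token to a (bucket index, +1/-1 sign) pair.

    MD5 is used here instead of the built-in hash(), which is salted per
    process and would make the vectors differ between runs.
    """
    digest = hashlib.md5(token.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "little") % dim
    sign = 1 if digest[4] % 2 == 0 else -1
    return bucket, sign


def hash2vec(sentences: list[list[str]], dim: int = 64, window: int = 2) -> dict[str, list[float]]:
    """Build one hashed context vector per word; there is no training step."""
    vectors: defaultdict[str, list[float]] = defaultdict(lambda: [0.0] * dim)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j == i:
                    continue
                bucket, sign = bucket_and_sign(tokens[j], dim)
                # Closer context words contribute more (1/distance weighting, an assumption).
                vectors[word][bucket] += sign / abs(i - j)
    return dict(vectors)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


corpus = [["the", "cat", "drinks", "milk"], ["the", "dog", "drinks", "water"]]
vecs = hash2vec(corpus, dim=32)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))
```

Because the vectors come straight from hashed co-occurrence counts, words that share contexts ("cat" and "dog" above) tend to end up with similar vectors, and the vector size stays fixed at `dim` no matter how large the vocabulary grows. The trade-off is that unrelated words can collide in the same bucket.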

Deliverables:

  • Design
    • Design a neural network that takes English text as input and translates it into Hindi. The model will be a Recurrent Neural Network (RNN) with layers that help avoid over-fitting and extract relevant features using various filters.

  • Software
    • A Python-based machine learning model that takes English text and translates it into Hindi.
    • An implementation of an algorithm that compares the performance of the word2vec and hash2vec models.

  • Report
    • CS 298 report.
    • CS 298 presentation.
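
The design deliverable can be pictured as an RNN encoder-decoder: the encoder folds the English token vectors into one hidden state, and the decoder unrolls that state into Hindi token ids one step at a time. The pure-Python sketch below shows only the forward pass with greedy decoding. The weights are random and untrained, and the names (`init_params`, `rnn_step`, `translate_greedy`), the sizes, and the start/end token ids are hypothetical. A real model would be built and trained in TensorFlow, with regularization such as dropout against over-fitting.

```python
import math
import random

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(emb_dim: int, hidden: int, vocab: int, seed: int = 0) -> tuple[Matrix, ...]:
    """Random (untrained) weights: encoder W_x, W_h; decoder W_x, W_h; output projection."""
    rng = random.Random(seed)
    return (rand_matrix(hidden, emb_dim, rng), rand_matrix(hidden, hidden, rng),
            rand_matrix(hidden, emb_dim, rng), rand_matrix(hidden, hidden, rng),
            rand_matrix(vocab, hidden, rng))


def rnn_step(w_x: Matrix, w_h: Matrix, x: Vector, h: Vector) -> Vector:
    """One simple (Elman) RNN step: h' = tanh(W_x x + W_h h); bias omitted for brevity."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_x, x), matvec(w_h, h))]


def translate_greedy(src: list[Vector], params, tgt_embed: Matrix,
                     bos_id: int, eos_id: int, max_len: int = 10) -> list[int]:
    enc_x, enc_h, dec_x, dec_h, w_out = params
    # Encoder: fold the whole English sentence into one hidden state.
    h = [0.0] * len(enc_h)
    for x in src:
        h = rnn_step(enc_x, enc_h, x, h)
    # Decoder: start from the encoder state and a start-of-sentence embedding.
    out: list[int] = []
    prev = tgt_embed[bos_id]
    for _ in range(max_len):
        h = rnn_step(dec_x, dec_h, prev, h)
        logits = matvec(w_out, h)
        token = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        if token == eos_id:
            break
        out.append(token)
        prev = tgt_embed[token]  # feed the prediction back in
    return out


rng = random.Random(1)
params = init_params(emb_dim=4, hidden=8, vocab=6)
tgt_embed = rand_matrix(6, 4, rng)   # hypothetical 6-word Hindi vocabulary
src = rand_matrix(3, 4, rng)         # three stand-in English token vectors
print(translate_greedy(src, params, tgt_embed, bos_id=0, eos_id=1))
```

Greedy argmax decoding is the simplest choice. Beam search is a common alternative when translation quality matters more than decoding speed.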

Innovations and Challenges:

  • Very little information about the hash2vec approach is available on the internet, so the work will involve considerable experimentation.
  • Developing an architecture whose accuracy at least matches that of current translation software.

Schedule:

Feb 4-Feb 10      Collect a dataset containing a parallel English-Hindi corpus.
Feb 11-Feb 17     Experiment with hash2vec using linear regression and a k-d tree.
Feb 18-Mar 2      Preprocess the dataset to prepare it for both the word2vec and hash2vec approaches.
Mar 3-Mar 16      Implement an encoder-decoder approach for text translation using TensorFlow.
Mar 17-Mar 30     Implement the hash2vec approach for text translation.
Apr 7-Apr 20      Implement an algorithm for comparing the performance and accuracy of word2vec and hash2vec.
Apr 21-May 4      Write the CS298 report and prepare slides.
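
The Feb 11-Feb 17 experiment pairs hash2vec with linear regression and a k-d tree. One plausible reading, which is an assumption here in the spirit of translation-matrix methods, is: fit a linear map W that sends English word vectors to Hindi word vectors using a small seed dictionary, then translate a new English word by projecting its vector and looking up the nearest Hindi vector in a k-d tree. Top-1 accuracy on held-out word pairs could then serve as one simple measure when comparing word2vec and hash2vec vectors. All names, the small ridge term, Euclidean distance, and the toy data below are hypothetical. The code is pure Python for self-containment; in practice `numpy.linalg.lstsq` and `scipy.spatial.KDTree` would do the same jobs.

```python
Vector = list[float]


def fit_linear_map(xs: list[Vector], zs: list[Vector], ridge: float = 1e-6) -> list[Vector]:
    """Least-squares W (d_in x d_out) minimizing ||XW - Z||^2 + ridge*||W||^2."""
    d_in, d_out = len(xs[0]), len(zs[0])
    # Normal equations (X^T X + ridge*I) W = X^T Z, stored as an augmented matrix [A | B].
    aug = [[sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0) for j in range(d_in)]
           + [sum(x[i] * z[k] for x, z in zip(xs, zs)) for k in range(d_out)]
           for i in range(d_in)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d_in):
        pivot = max(range(col, d_in), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d_in):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[d_in:] for row in aug]


def apply_map(w: list[Vector], x: Vector) -> Vector:
    """Project x (length d_in) with W (d_in x d_out)."""
    return [sum(x[i] * w[i][k] for i in range(len(x))) for k in range(len(w[0]))]


def build_kdtree(points, depth: int = 0):
    """points: list of (vector, label). Node = ((vector, label), axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, target: Vector, best=None):
    """Return (squared Euclidean distance, label) of the closest stored point."""
    if node is None:
        return best
    (vec, label), axis, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(vec, target))
    if best is None or d2 < best[0]:
        best = (d2, label)
    diff = target[axis] - vec[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Only search the other side if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


def top1_accuracy(w, tree, test_pairs) -> float:
    """Fraction of (English vector, gold Hindi word) pairs translated correctly."""
    hits = sum(1 for x, gold in test_pairs if nearest(tree, apply_map(w, x))[1] == gold)
    return hits / len(test_pairs)


# Toy data: the "Hindi" vectors are the English ones rotated by 90 degrees.
english = {"water": [1.0, 0.0], "fire": [0.0, 1.0], "earth": [1.0, 1.0], "air": [2.0, -1.0]}
hindi = {"पानी": [0.0, 1.0], "आग": [-1.0, 0.0], "धरती": [-1.0, 1.0], "हवा": [1.0, 2.0]}
seed = [("water", "पानी"), ("fire", "आग"), ("earth", "धरती")]
w = fit_linear_map([english[e] for e, _ in seed], [hindi[h] for _, h in seed])
tree = build_kdtree([(vec, word) for word, vec in hindi.items()])
print(top1_accuracy(w, tree, [(english["air"], "हवा")]))
```

A k-d tree keeps nearest-neighbour lookups fast for low-dimensional vectors; for the few hundred dimensions typical of word embeddings its pruning becomes much less effective, so approximate nearest-neighbour methods are often used instead.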
