CS297 Proposal

Document-Level Machine Translation with Hierarchical Attention

Yu-Tang Shen (yutang.shen@sjsu.edu)

Advisor: Dr. Chris Pollett

Description:

This research aims to improve the coherence of document-level machine translation. Although sequence-to-sequence models have greatly improved the quality and coherence of machine translation, the readability of document-level translations still needs improvement. To measure any improvement, earlier machine translation techniques will be implemented and evaluated to establish baseline performance. These include rule-based machine translation, statistical machine translation, and neural machine translation, covering both RNN-based and attention-based models.

Schedule:

Week 1: Aug 22 - Aug 28 Complete proposal
Week 2: Aug 29 - Sep 4 Research prior work on rule-based and statistical machine translation (references 1, 8-12)
Week 3: Sep 5 - Sep 11 Research prior work on rule-based and statistical machine translation (references 1, 8-12)
Week 4: Sep 12 - Sep 18 Develop a rule-based machine translator (reference 2)
Week 5: Sep 19 - Sep 25 Develop a statistical machine translator
Week 6: Sep 26 - Oct 2 Develop a statistical machine translator
Week 7: Oct 3 - Oct 9 Research prior neural machine translators implemented with RNNs or LSTMs (references 3, 4)
Week 8: Oct 10 - Oct 16 Develop a neural machine translator with RNN or LSTM (reference 5)
Week 9: Oct 17 - Oct 23 Develop a neural machine translator with RNN or LSTM (reference 5)
Week 10: Oct 24 - Oct 30 Fine-tune the neural machine translator
Week 11: Oct 31 - Nov 6 Research prior attention-based machine translators (references 6, 7)
Week 12: Nov 7 - Nov 13 Develop a baseline attention-based machine translator
Week 13: Nov 14 - Nov 20 Develop a baseline attention-based machine translator
Week 14: Nov 21 - Nov 27 Fine-tune the baseline attention-based machine translator
Week 15: Nov 28 - Dec 4 Complete CS 297 report draft and visualize results
Week 16: Dec 5 - Dec 11 Complete CS 297 report
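The attention-based baseline in weeks 12-14 rests on the attention operation from reference 6, which computes softmax(QK^T / sqrt(d_k))V. The proposal does not say which attention variant the baseline will use. The sketch below therefore assumes the scaled dot-product form and uses plain Python lists with no dependencies. It is meant only to make the operation concrete and is not the project's implementation. The function names `softmax` and `scaled_dot_product_attention` are illustrative.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return softmax(Q K^T / sqrt(d_k)) V, with Q, K, V given as lists of row vectors.

    Illustrative sketch only (assumes the scaled dot-product form of reference 6).
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


if __name__ == "__main__":
    # Toy example: one query, two key/value pairs of dimension 2.
    print(scaled_dot_product_attention([[1.0, 0.0]],
                                       [[1.0, 0.0], [0.0, 1.0]],
                                       [[1.0, 2.0], [3.0, 4.0]]))
```

In the toy example the first key matches the query, so the first value vector receives roughly two thirds of the weight. A hierarchical model of the kind named in the title would apply this operation at more than one level, for example over words and then over sentences. How this project structures those levels is not specified here.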

Deliverables:

The full project will be completed in CS 298. The following will be completed by the end of CS 297:

1. Implement a rule-based machine translator (End of week 4)

2. Implement a statistical machine translator (End of week 6)

3. Implement a neural machine translator with RNN or LSTM (End of week 10)

4. Implement a baseline attention-based machine translator (End of week 15)

5. Deliver the CS 297 report (End of week 16)
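For deliverable 2, one common starting point is IBM Model 1 from Brown et al. (reference 10). It estimates word translation probabilities t(f | e) from sentence-aligned text using expectation-maximization. The proposal does not say which statistical model will be built, so treating Model 1 as the starting point is an assumption. The sketch below is a minimal illustration, not the planned implementation. The function name `train_ibm_model1`, the `NULL` token, and the three-sentence toy corpus are all hypothetical.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical empty word so a foreign word can align to "nothing"


def train_ibm_model1(pairs, iterations=10):
    """Estimate lexical translation probabilities t(f | e) with EM (IBM Model 1 sketch).

    pairs: list of (e_tokens, f_tokens) sentence pairs, where f is the foreign
    side and e the English side, following Brown et al.'s notation.
    Returns a mapping (f, e) -> t(f | e).
    """
    f_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(f_vocab)
    # Start from a uniform t(f | e) for every word pair.
    t = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in pairs:
            es_with_null = [NULL] + list(es)
            for f in fs:
                # E-step: split this occurrence of f across every e it could align to.
                norm = sum(t[(f, e)] for e in es_with_null)
                for e in es_with_null:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize expected counts into conditional probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


if __name__ == "__main__":
    # Hypothetical toy parallel corpus (English, German).
    pairs = [
        ("the house".split(), "das haus".split()),
        ("the book".split(), "das buch".split()),
        ("a book".split(), "ein buch".split()),
    ]
    t = train_ibm_model1(pairs, iterations=10)
    print(t[("haus", "house")], t[("das", "house")])
```

On this toy corpus, "das" co-occurs with "the" in two sentences. EM therefore gradually attributes "das" to "the" and "haus" to "house", so t(haus | house) ends up larger than t(das | house). A full statistical translator would add a language model and a decoder on top of these probabilities (compare references 9 and 12).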

References:

1. [1995] Machine translation: A brief history. W. J. Hutchins. Elsevier Ltd. 1995.

2. [2021] URBANS: Universal Rule-Based Machine Translation NLP toolkit. Truong-Phat Nguyen. GitHub. 2021.

3. [2014] Sequence to Sequence Learning with Neural Networks. Ilya Sutskever, Oriol Vinyals and Quoc V. Le. MIT Press. 2014.

4. [2020] RNN based machine translation and transliteration for Twitter data. M. Vathsala and G. Holi. Springer-Verlag. 2020.

5. [2019] Seq2seq. Yukio Fukuzawa. PyPI. 2019.

6. [2017] Attention is all you need. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin. Curran Associates Inc. 2017.

7. [2020] Big Bird: Transformers for Longer Sequences. Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed. Curran Associates Inc. 2020.

8. [2012] Parallel Data, Tools and Interfaces in OPUS. J. Tiedemann. Proceedings of the 8th International Conference on Language Resources and Evaluation. 2012.

9. [2003] Syntax-based language models for statistical machine translation. E. Charniak, K. Knight and K. Yamada. Proceedings of Machine Translation Summit IX: Papers. 2003.

10. [1993] The Mathematics of Statistical Machine Translation: Parameter Estimation. P. Brown, V. Della Pietra, S. Della Pietra and R. Mercer. Computational Linguistics, vol. 19, pp. 263-311. 1993.

11. [1997] Statistical Techniques for Natural Language Parsing. E. Charniak. AI Magazine. 1997.

12. [2002] A decoder for syntax-based statistical MT. K. Yamada and K. Knight. ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. 2002.

13. [2005] The Second International Chinese Word Segmentation Bakeoff. T. Emerson. Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing. 2005.