CS297 Proposal
Document-Level Machine Translation with Hierarchical Attention
Yu-Tang Shen (yutang.shen@sjsu.edu)
Advisor: Dr. Chris Pollett

Description: This research aims to improve the coherence of document-level translation. Although sequence-to-sequence models have greatly improved the quality and coherence of machine translation, the readability of document-level translation still needs improvement. To evaluate the improvement, earlier machine translation techniques (rule-based machine translation, statistical machine translation, and neural machine translation, including RNN and attention models) will be implemented to serve as performance baselines.

Schedule:
Deliverables: The full project will be done when CS298 is completed. The following will be done by the end of CS297:
1. Implement a rule-based machine translator (End of week 4)
2. Implement a statistical machine translator (End of week 6)
3. Implement a neural machine translator with RNN or LSTM (End of week 10)
4. Implement a baseline attention-based machine translator (End of week 15)
5. Deliver the CS 297 report (End of week 16)

References:
1. [1995] Machine translation: A brief history. W. J. Hutchins. Elsevier Ltd. 1995.
2. [2021] URBANS: Universal Rule-Based Machine Translation NLP toolkit. Truong-Phat Nguyen. GitHub. 2021.
3. [2014] Sequence to Sequence Learning with Neural Networks. Ilya Sutskever, Oriol Vinyals and Quoc V. Le. MIT Press. 2014.
4. [2020] RNN based machine translation and transliteration for Twitter data. M. Vathsala and G. Holi. Springer-Verlag. 2020.
5. [2019] Seq2seq. Yukio Fukuzawa. PyPI. 2019.
6. [2017] Attention is all you need. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin. Curran Associates Inc. 2017.
7. [2020] Big Bird: Transformers for Longer Sequences. Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang and Amr Ahmed. Curran Associates Inc. 2020.
8. [2012] Parallel Data, Tools and Interfaces in OPUS. J. Tiedemann. Proceedings of the 8th International Conference on Language Resources and Evaluation. 2012.
9. [2003] Syntax-based language models for statistical machine translation. E. Charniak, K. Knight and K. Yamada. Proceedings of Machine Translation Summit IX: Papers. 2003.
10. [1993] The Mathematics of Statistical Machine Translation: Parameter Estimation. P. Brown, V. Della Pietra, S. Della Pietra and R. Mercer. Computational Linguistics, vol. 19, pp. 263-311. 1993.
11. [1997] Statistical Techniques for Natural Language Parsing. E. Charniak. AI Magazine. 1997.
12. [2002] A decoder for syntax-based statistical MT. K. Yamada and K. Knight. ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. 2002.
13. [2005] The Second International Chinese Word Segmentation Bakeoff. T. Emerson. Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing. 2005.