CS298 Proposal

Document-Level Machine Translation with Hierarchical Attention

Yu-Tang Shen (yutang.shen@sjsu.edu)

Advisor: Dr. Chris Pollett

Committee Members: Dr. Thomas Austin, Dr. William Andreopoulos

Abstract:

Current attention-based machine translation models commonly bound the input text length at around 1,000 words. Although this is adequate for sentence-level translation, models that can handle longer inputs, such as whole documents, are needed. This project aims to implement a hierarchical attention model to meet that need. The project involves implementing and testing hierarchical attention models under two configurations: the default full attention mechanism and the Big Bird attention mechanism. Both configurations will be evaluated on bidirectional translation between English and Chinese.
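
As a concrete illustration of the two-level design described above, the following is a minimal, hypothetical PyTorch sketch, not the project's actual implementation: a word-level attention layer contextualizes the tokens within each sentence, sentences are pooled into vectors, and a sentence-level attention layer lets each sentence attend to the rest of the document. The module names, layer sizes, and mean pooling are illustrative assumptions.

    # Minimal sketch of a hierarchical (two-level) attention encoder; assumes PyTorch.
    # Layer sizes, pooling choice, and module names are illustrative only.
    import torch
    import torch.nn as nn

    class HierarchicalEncoder(nn.Module):
        def __init__(self, vocab_size, d_model=256, n_heads=4):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            # Word-level attention: contextualizes tokens within each sentence.
            self.word_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            # Sentence-level attention: lets each sentence attend to the whole document.
            self.sent_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        def forward(self, doc_tokens):
            # doc_tokens: (num_sentences, sentence_length) token ids for one document
            x = self.embed(doc_tokens)                 # (S, L, d_model)
            x, _ = self.word_attn(x, x, x)             # intra-sentence attention
            sent_vecs = x.mean(dim=1)                  # mean-pool tokens -> (S, d_model)
            sent_vecs = sent_vecs.unsqueeze(0)         # treat sentences as one sequence
            doc_ctx, _ = self.sent_attn(sent_vecs, sent_vecs, sent_vecs)
            return doc_ctx.squeeze(0)                  # document-aware sentence states

    # Toy usage: a 3-sentence document, 10 tokens per sentence, vocabulary of 1,000.
    enc = HierarchicalEncoder(vocab_size=1000)
    doc = torch.randint(0, 1000, (3, 10))
    print(enc(doc).shape)  # torch.Size([3, 256])

A full model would feed these document-aware states to a decoder; only the encoder side is sketched here.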

CS297 Results

  • Developed a rule-based machine translation application
  • Developed a statistical machine translation application
  • Developed an LSTM machine translation application
  • Developed an attention-based machine translation application

Proposed Schedule

Week 1 Jan 31 - Feb 7 Meet with advisor and finish the proposal
Week 2 Feb 8 - Feb 14 Research [1] and [5] and start implementing a hierarchical attention model (full attention)
Week 3 Feb 15 - Feb 21 Implement a hierarchical attention model (full attention)
Week 4 Feb 22 - Feb 28 Wrap up the hierarchical full attention model and research replacing full attention with Big Bird attention
Week 5 Mar 1 - Mar 7 Implement a hierarchical Big Bird attention model
Week 6 Mar 8 - Mar 14 Implement a hierarchical Big Bird attention model
Week 7 Mar 15 - Mar 21 Implement a hierarchical Big Bird attention model
Week 8 Mar 22 - Mar 28 Implement a hierarchical Big Bird attention model
Week 9 Mar 29 - Apr 4 Wrap up the hierarchical Big Bird attention model and start preparing a dataset for Chinese-to-English translation
Week 10 Apr 5 - Apr 11 Train a hierarchical Big Bird attention model for Chinese-to-English translation
Week 11 Apr 12 - Apr 18 Fine-tune all the models (buffer week)
Week 12 Apr 19 - Apr 25 Work on CS 298 Report
Week 13 Apr 26 - May 2 Work on CS 298 Report and Presentation
Week 14 May 3 - May 9 Prepare for CS 298 Presentation

Time allocation:

Task                      Time allocated
Proposal                  1 week
Deliverable 1             2.5 weeks
Deliverable 2             5 weeks
Deliverable 3             2.5 weeks
Report and presentation   3 weeks

Key Deliverables:

  • Software
    • Implement a hierarchical attention model that translates English documents into Chinese with the full attention mechanism
    • Implement a hierarchical attention model that translates English documents into Chinese with the Big Bird attention mechanism
    • Implement a hierarchical attention model that translates Chinese documents into English with the Big Bird attention mechanism
  • Report
    • CS 298 report -- detailing the bidirectional translation capability of the implemented hierarchical attention models and comparing them with the results of [5]
    • CS 298 presentation -- presenting the translation results

Innovations and Challenges

  1. There are as yet no English-to-Chinese machine translation models that use a hierarchical attention mechanism ([5] focused on ZH-EN and ES-EN translation with hierarchical attention models)
  2. The Big Bird attention mechanism has not yet been applied to hierarchical attention models ([1][5][7] did not try the Big Bird attention mechanism); a sketch of this sparse attention pattern is given after this list
  3. The Big Bird attention mechanism has not been tested on Chinese-to-English translation tasks (innovations 1 and 2 combined)
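
The following is a hypothetical sketch of the sparse attention pattern used by Big Bird [4]: a combination of sliding-window, global, and random attention, applied as a mask over ordinary scaled dot-product attention. The window size, number of global tokens, and number of random connections are illustrative assumptions, and the sketch is not the project's implementation.

    # Hypothetical sketch of a Big Bird-style sparse attention mask (window + global
    # + random), following the pattern described in [4]; parameters are illustrative.
    import torch

    def bigbird_mask(seq_len, window=3, n_global=2, n_random=2, seed=0):
        """Boolean mask: True where query position i may attend to key position j."""
        g = torch.Generator().manual_seed(seed)
        mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
        idx = torch.arange(seq_len)
        # Sliding window: each token attends to its nearby neighbors.
        mask |= (idx.unsqueeze(1) - idx.unsqueeze(0)).abs() <= window
        # Global tokens: the first n_global positions attend to, and are seen by, all.
        mask[:n_global, :] = True
        mask[:, :n_global] = True
        # Random connections: each query attends to a few randomly chosen keys.
        rows = torch.arange(seq_len).unsqueeze(1)
        rand_keys = torch.randint(0, seq_len, (seq_len, n_random), generator=g)
        mask[rows, rand_keys] = True
        return mask

    def sparse_attention(q, k, v, mask):
        """Scaled dot-product attention restricted to the allowed positions."""
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        scores = scores.masked_fill(~mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

    # Toy usage: 16 sentence vectors of dimension 8.
    x = torch.randn(16, 8)
    out = sparse_attention(x, x, x, bigbird_mask(16))
    print(out.shape)  # torch.Size([16, 8])

In the hierarchical setting, such a mask would replace full attention over the sentence vectors, keeping the cost roughly linear in document length rather than quadratic.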

References:

[1] Yang, Zichao, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. "Hierarchical attention networks for document classification." in Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies. 2016.

[2] Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. "Attention is all you need." in Advances in neural information processing systems 30. 2017.

[3] Tian, Liang, Derek F. Wong, Lidia S. Chao, Paulo Quaresma, Francisco Oliveira, and Lu Yi. "UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation," in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). 2014.

[4] Zaheer, Manzil, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed. "Big Bird: Transformers for longer sequences." in Advances in neural information processing systems 33. 2020.

[5] Miculicich, Lesly, Dhananjay Ram, Nikolaos Pappas, and James Henderson. "Document-level neural machine translation with hierarchical attention networks." arXiv preprint. 2018.

[6] Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. "Bleu: a method for automatic evaluation of machine translation." in Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 2002.

[7] Pappas, Nikolaos, and Andrei Popescu-Belis. "Multilingual hierarchical attention networks for document classification." arXiv preprint. 2017.