CS298 Proposal

Document-Level Machine Translation with Hierarchical Attention

Yu-Tang Shen (yutang.shen@sjsu.edu)

Advisor: Dr. Chris Pollett

Committee Members: Dr. Thomas Austin, Dr. William Andreopoulos

Abstract:

Current attention-based machine translation models commonly bound the input text length at around 1,000 words. Although this is adequate for sentence-level translation, models that can handle longer inputs, such as whole documents, are needed. This project aims to implement a hierarchical attention model to meet that need. The project involves implementing and testing hierarchical attention models under two configurations: the default full attention mechanism and the Big Bird attention mechanism. Both configurations will be evaluated on bidirectional translation between English and Chinese.
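
As a concrete illustration of the two-level design described above, the following is a minimal, hypothetical PyTorch sketch, not the project's actual implementation: a word-level attention layer contextualizes the tokens within each sentence, sentences are pooled into vectors, and a sentence-level attention layer lets each sentence attend to the rest of the document. The module names, layer sizes, and mean pooling are illustrative assumptions.

    # Minimal sketch of a hierarchical (two-level) attention encoder; assumes PyTorch.
    # Layer sizes, pooling choice, and module names are illustrative only.
    import torch
    import torch.nn as nn

    class HierarchicalEncoder(nn.Module):
        def __init__(self, vocab_size, d_model=256, n_heads=4):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            # Word-level attention: contextualizes tokens within each sentence.
            self.word_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            # Sentence-level attention: lets each sentence attend to the whole document.
            self.sent_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        def forward(self, doc_tokens):
            # doc_tokens: (num_sentences, sentence_length) token ids for one document
            x = self.embed(doc_tokens)                 # (S, L, d_model)
            x, _ = self.word_attn(x, x, x)             # intra-sentence attention
            sent_vecs = x.mean(dim=1)                  # mean-pool tokens -> (S, d_model)
            sent_vecs = sent_vecs.unsqueeze(0)         # treat sentences as one sequence
            doc_ctx, _ = self.sent_attn(sent_vecs, sent_vecs, sent_vecs)
            return doc_ctx.squeeze(0)                  # document-aware sentence states

    # Toy usage: a 3-sentence document, 10 tokens per sentence, vocabulary of 1,000.
    enc = HierarchicalEncoder(vocab_size=1000)
    doc = torch.randint(0, 1000, (3, 10))
    print(enc(doc).shape)  # torch.Size([3, 256])

A full model would feed these document-aware states to a decoder; only the encoder side is sketched here.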

CS297 Results

  • Developed a rule-based machine translation application
  • Developed a statistical machine translation application
  • Developed an LSTM machine translation application
  • Developed an attention-based machine translation application

Proposed Schedule

Week 1 Jan 31 - Feb 7 Meet with advisor and finish the proposal
Week 2 Feb 8 - Feb 14 Research [1] and [5] and start implementing a hierarchical attention model (full attention)
Week 3 Feb 15 - Feb 21 Implement a hierarchical attention model (full attention)
Week 4 Feb 22 - Feb 28 Wrap up the hierarchical full attention model and research replacing full attention with Big Bird attention
Week 5 Mar 1 - Mar 7 Implement a hierarchical Big Bird attention model
Week 6 Mar 8 - Mar 14 Implement a hierarchical Big Bird attention model
Week 7 Mar 15 - Mar 21 Implement a hierarchical Big Bird attention model
Week 8 Mar 22 - Mar 28 Implement a hierarchical Big Bird attention model
Week 9 Mar 29 - Apr 4 Wrap up the hierarchical Big Bird attention model and start preparing a dataset for Chinese-to-English translation
Week 10 Apr 5 - Apr 11 Train a hierarchical Big Bird attention model for Chinese-to-English translation
Week 11 Apr 12 - Apr 18 Fine-tune all the models (buffer week)
Week 12 Apr 19 - Apr 25 Work on CS 298 Report
Week 13 Apr 26 - May 2 Work on CS 298 Report and Presentation
Week 14 May 3 - May 9 Prepare for CS 298 Presentation

Time allocation:

Task                      Time allocated
Proposal                  1 week
Deliverable 1             2.5 weeks
Deliverable 2             5 weeks
Deliverable 3             2.5 weeks
Report and presentation   3 weeks

Key Deliverables:

  • Software
    • Implement a hierarchical attention model that translates English documents into Chinese with the full attention mechanism
    • Implement a hierarchical attention model that translates English documents into Chinese with the Big Bird attention mechanism
    • Implement a hierarchical attention model that translates Chinese documents into English with the Big Bird attention mechanism
  • Report
    • CS 298 report -- detailing the bidirectional translation capability of the implemented hierarchical attention models and comparing them with the results of [5]
    • CS 298 presentation -- presenting the translation results

Innovations and Challenges

  1. There are as yet no English-to-Chinese machine translation models that use a hierarchical attention mechanism ([5] focused on ZH-EN and ES-EN translation with hierarchical attention models)
  2. The Big Bird attention mechanism has not yet been applied to hierarchical attention models ([1][5][7] did not try the Big Bird attention mechanism); a sketch of this sparse attention pattern is given after this list
  3. The Big Bird attention mechanism has not been tested on Chinese-to-English translation tasks (innovations 1 and 2 combined)
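
The following is a hypothetical sketch of the sparse attention pattern used by Big Bird [4]: a combination of sliding-window, global, and random attention, applied as a mask over ordinary scaled dot-product attention. The window size, number of global tokens, and number of random connections are illustrative assumptions, and the sketch is not the project's implementation.

    # Hypothetical sketch of a Big Bird-style sparse attention mask (window + global
    # + random), following the pattern described in [4]; parameters are illustrative.
    import torch

    def bigbird_mask(seq_len, window=3, n_global=2, n_random=2, seed=0):
        """Boolean mask: True where query position i may attend to key position j."""
        g = torch.Generator().manual_seed(seed)
        mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
        idx = torch.arange(seq_len)
        # Sliding window: each token attends to its nearby neighbors.
        mask |= (idx.unsqueeze(1) - idx.unsqueeze(0)).abs() <= window
        # Global tokens: the first n_global positions attend to, and are seen by, all.
        mask[:n_global, :] = True
        mask[:, :n_global] = True
        # Random connections: each query attends to a few randomly chosen keys.
        rows = torch.arange(seq_len).unsqueeze(1)
        rand_keys = torch.randint(0, seq_len, (seq_len, n_random), generator=g)
        mask[rows, rand_keys] = True
        return mask

    def sparse_attention(q, k, v, mask):
        """Scaled dot-product attention restricted to the allowed positions."""
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        scores = scores.masked_fill(~mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

    # Toy usage: 16 sentence vectors of dimension 8.
    x = torch.randn(16, 8)
    out = sparse_attention(x, x, x, bigbird_mask(16))
    print(out.shape)  # torch.Size([16, 8])

In the hierarchical setting, such a mask would replace full attention over the sentence vectors, keeping the cost roughly linear in document length rather than quadratic.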

References:

[1] Yang, Zichao, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. "Hierarchical attention networks for document classification." in Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies. 2016.

[2] Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. "Attention is all you need." in Advances in neural information processing systems 30. 2017.

[3] Tian, Liang, Derek F. Wong, Lidia S. Chao, Paulo Quaresma, Francisco Oliveira, and Lu Yi. "UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation," in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). 2014.

[4] Zaheer, Manzil, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed. "Big Bird: Transformers for longer sequences." in Advances in neural information processing systems 33. 2020.

[5] Miculicich, Lesly, Dhananjay Ram, Nikolaos Pappas, and James Henderson. "Document-level neural machine translation with hierarchical attention networks." arXiv preprint. 2018.

[6] Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. "Bleu: a method for automatic evaluation of machine translation." in Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 2002.

[7] Pappas, Nikolaos, and Andrei Popescu-Belis. "Multilingual hierarchical attention networks for document classification." arXiv preprint. 2017.