CS298 Proposal
Image-based Localization of User-Interfaces
Riti Gupta (riti.gupta@sjsu.edu)
Advisor: Dr. Chris Pollett
Committee Members: Dr. Fabio Di Troia, Dr. Robert Chun
Abstract:
There is an increasing need to make web data available in all languages so that people all over the world can understand it, yet most web data is still available only in English. Web data comes in various formats: text, images, books, and sound. The aim of this research project is to study the translation of screenshots of web interfaces from English to Hindi. A lot of work has been done on translating web data from one language to another to make it globally available, but the context of the image is usually not taken into consideration. In this research, the context of the image will also be taken into account during text translation, to check whether it further improves translation accuracy. In the future, this approach can be extended to languages other than Hindi and English.
Deliverables:
- Design
- Design a neural network that takes screenshots of UIs with English text as input and translates them into Hindi. The model will be a Convolutional Neural Network (CNN) with layers to avoid overfitting and to extract relevant features using various filters (a sketch of such a model appears after this list).
- Software
- Python-based machine learning model to localize UIs with English text and translate the text into Hindi.
- Implement an algorithm to evaluate the accuracy of the developed model by computing the pixel mismatch between the translated UI screenshot and the corresponding ground-truth Hindi screenshot (a sketch of this check appears after this list).
- Report
- CS 298 report.
- CS 298 presentation.
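As a rough illustration of the Design deliverable, the following is a minimal sketch of how the proposed CNN could be laid out as a convolutional encoder-decoder in Keras. The input resolution, filter counts, dropout rate, and the function name build_translation_cnn are placeholder assumptions for illustration, not the final architecture.

# Minimal sketch (not the final design): a convolutional encoder-decoder that
# maps a UI screenshot with English text to a UI screenshot with Hindi text.
# Input size, filter counts, and layer choices are placeholder assumptions.
from tensorflow.keras import layers, models

def build_translation_cnn(height=256, width=256, channels=3):
    inputs = layers.Input(shape=(height, width, channels))

    # Encoder: convolutional filters extract text and layout features.
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)

    # Dropout to reduce overfitting on a small screenshot dataset.
    x = layers.Dropout(0.3)(x)

    # Decoder: upsample back to the original screenshot resolution.
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(channels, 3, padding="same", activation="sigmoid")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mae")
    return model

model = build_translation_cnn()
model.summary()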
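The pixel-mismatch accuracy check from the Software deliverable could look roughly like the sketch below, assuming the translated screenshot and the reference Hindi screenshot have the same dimensions. The function name pixel_match_accuracy, the file names, and the tolerance value are assumptions used only for illustration.

# Minimal sketch of the pixel-mismatch accuracy check described above.
import numpy as np
from PIL import Image

def pixel_match_accuracy(translated_path, reference_path, tolerance=10):
    """Fraction of pixels whose RGB values differ by at most `tolerance`."""
    translated = np.asarray(Image.open(translated_path).convert("RGB"), dtype=np.int16)
    reference = np.asarray(Image.open(reference_path).convert("RGB"), dtype=np.int16)
    if translated.shape != reference.shape:
        raise ValueError("Screenshots must have the same dimensions")
    # A pixel "matches" when every RGB channel is within the tolerance.
    matches = np.all(np.abs(translated - reference) <= tolerance, axis=-1)
    return matches.mean()

# Example usage (hypothetical file names):
# accuracy = pixel_match_accuracy("translated_ui.png", "hindi_ground_truth.png")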
Innovations and Challenges:
- Generating the dataset is a challenge because corresponding screenshots in Hindi and English are not easily available on websites, and some pages are not crawlable, which compounds the problem.
- Developing an architecture that achieves at least the accuracy of current translation software.
- Taking the context of the image into consideration while translating.
Schedule:
Aug 28-Sept 10 | Collect a dataset of UI screenshots with English text and the corresponding screenshots with Hindi text |
Sept 11-Sept 17 | Propose a CNN model for translating UI screenshots from English to Hindi |
Sept 18-Oct 1 | Implement the proposed model and validate it on the collected dataset by comparing the pixels of the translated screenshots |
Oct 2-Oct 15 | Improve the accuracy by performing the necessary transformations and tuning the model parameters |
Oct 16-Nov 12 | Write the CS298 report and prepare presentation slides |
Literature References:
[1] S. Saini and V. Sahula, "A Survey of Machine Translation Techniques and Systems for Indian Languages," in Proc. IEEE Int. Conf. on Computational Intelligence & Communication Technology, 2015.
[2] H. A. Driss, S. Elfkihi, and A. Jilbab, "Features Extraction for Text Detection and Localization," in Proc. 5th Int. Symp. on I/V Communications and Mobile Networks, 2010.
[3] C. M. Thillou and B. Gosselin, "Natural Scene Understanding," 2007. [Online]. Available: https://www.tcts.fpms.ac.be/publications/regpapers/2007/VS_cmtbg2007.pdf
[4] X. Zhou et al., "EAST: An Efficient and Accurate Scene Text Detector," arXiv:1704.03155v2 [cs.CV], Jul. 2017.
[5] E. Charniak, Introduction to Deep Learning. MIT Press, Jan. 2019, 192 pp. ISBN: 9780262039512.
[6] O. Rippel and L. Bourdev, "Real-Time Adaptive Image Compression," in Proc. 34th Int. Conf. on Machine Learning, 2017. arXiv:1705.05823v1.
[7] G. Toderici et al., "Full Resolution Image Compression with Recurrent Neural Networks," arXiv:1608.05148, 2016.
[8] T. Law, H. Itoh, and H. Seki, "A Neural-Network Assisted Japanese-English Machine Translation System," in Proc. 1993 Int. Conf. on Neural Networks, 1993.
[9] Md. M. Hossain, K. E. U. Ahmed, and A. R. Uddin, "English to Bangla Translation in Structural Way Using Neural Networks," in Proc. 2009 Int. Conf. on Information and Multimedia Technology, 2009.