

CS 298 Proposal

Image to LaTeX via Neural Networks

Avinash More (avinash.more@sjsu.edu)

Advisor: Dr. Chris Pollett

Committee Members: Dr. Robert Chun, Mr. Varun Soundararajan


Abstract:

Many research papers in mathematics, computer science, and physics are written in LaTeX. Technical papers and articles frequently contain mathematical equations, and typing an equation in LaTeX takes considerably more time than writing the same equation by hand on paper. This time-consuming step of converting an equation written on paper into LaTeX can be automated.

This project will develop a tool that takes an image of a mathematical equation as input and attempts to output the corresponding equation in LaTeX form.
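
For example, given an image of the quadratic formula, the tool would ideally produce a LaTeX string along the lines of the following (an illustrative example of the task, not actual project output):

    x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}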


CS 297 Results:

  • Analyzed the LaTeX representation of various types of mathematical equations
  • Explored the machine learning library TensorFlow
  • Explored and finalized data generation approaches (a small rendering sketch follows this list)
  • Decided on a candidate neural network architecture to train on the generated equation images
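
The rendering sketch below illustrates one possible data generation approach at a very small scale: LaTeX strings are drawn to PNG images with matplotlib's mathtext renderer, which stands in here for a full LaTeX toolchain. It is a minimal sketch under assumed file names and sizes, not the exact CS 297 pipeline.

    import matplotlib
    matplotlib.use("Agg")  # render off-screen, no display needed
    import matplotlib.pyplot as plt

    def render_equation(latex, path, dpi=100):
        # Draw a single (mathtext-compatible) LaTeX string onto a small canvas
        # and save it as a PNG; canvas and font sizes are arbitrary choices.
        fig = plt.figure(figsize=(4, 1))
        fig.text(0.5, 0.5, "$" + latex + "$", ha="center", va="center", fontsize=20)
        fig.savefig(path, dpi=dpi)
        plt.close(fig)

    # A few sample equations; a real dataset would generate many thousands,
    # keeping each LaTeX string as the label for its rendered image.
    equations = [
        r"x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}",
        r"e^{i\pi} + 1 = 0",
    ]
    for i, eq in enumerate(equations):
        render_equation(eq, "eq_%d.png" % i)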

Schedule:

Week 1, 2: Feb. 6 - Feb. 20 Deliverable 1: Predict the first character of equations (a minimal classification sketch follows the schedule)
Week 3, 4, 5: Feb. 21 - Mar. 13 Deliverable 2: Predict LaTeX for simple mathematical equations involving +, -, and exponents
Week 6, 7: Mar. 14 - Mar. 27 Deliverable 3: Predict LaTeX for complex mathematical equations
Week 8, 9, 10: Mar. 28 - Apr. 17 Deliverable 4: Improve performance of the model
Week 11, 12: Apr. 18 - May 1 Deliverable 5: CS 298 report and presentation
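
As a rough illustration of Deliverable 1 only, predicting the first character can be treated as ordinary image classification. The sketch below wires up a small CNN classifier with tf.keras; the image dimensions and the number of possible first characters are assumptions made for the example, not project-fixed values.

    import tensorflow as tf

    # Assumed sizes for this sketch: 64x256 grayscale equation images and
    # 80 possible first characters (digits, letters, \frac, \sum, ...).
    IMG_HEIGHT, IMG_WIDTH, NUM_CLASSES = 64, 256, 80

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu",
                               input_shape=(IMG_HEIGHT, IMG_WIDTH, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_images, train_first_char_ids, epochs=10) on generated data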

Key Deliverables:

  • Software:
    • A Python program that reads equation images and processes them with a convolutional neural network (CNN) to obtain a feature representation suited to our problem
    • A Python program that predicts the LaTeX representation of the mathematical equations recognized in images, adding an LSTM decoder if the CNN results alone are not good enough (a configurable encoder-decoder sketch follows this list)
    • The above Python programs will be configurable and will be implemented using the popular machine learning library TensorFlow
  • Experiments:
    • Tune hyperparameters such as the learning rate for stochastic gradient descent, the regularization method, and the number of LSTM unrollings to obtain the best results.
    • Experiment with different model configurations, varying the number of layers and the number of filters per layer, to find the best-performing configuration.
  • Report:
    • CS 298 Report
    • Presentation
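
The sketch below shows how the configurable CNN encoder and LSTM decoder mentioned under Software might be wired together in TensorFlow for training with teacher forcing. Every value in the config dictionary (image size, vocabulary size, layer counts, learning rate) is an assumption made for illustration, and the attention mechanism used in the reference work is omitted; this is not the project's final architecture.

    import tensorflow as tf

    # All values below are placeholders that show how the programs could be
    # made configurable; they are not the project's chosen hyperparameters.
    config = {
        "img_height": 64, "img_width": 256,
        "vocab_size": 500,          # number of LaTeX tokens
        "max_len": 50,              # output length = number of LSTM unrollings
        "conv_filters": [32, 64],   # one entry per convolutional layer
        "lstm_units": 256,
        "learning_rate": 0.01,
    }

    # CNN encoder: compresses an equation image into a feature vector.
    image = tf.keras.Input(shape=(config["img_height"], config["img_width"], 1))
    x = image
    for filters in config["conv_filters"]:
        x = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = tf.keras.layers.MaxPooling2D()(x)
    features = tf.keras.layers.Dense(config["lstm_units"], activation="relu")(
        tf.keras.layers.Flatten()(x))

    # LSTM decoder: the image features seed the initial hidden and cell states,
    # and the previous LaTeX tokens (teacher forcing) drive next-token prediction.
    tokens = tf.keras.Input(shape=(config["max_len"],))
    emb = tf.keras.layers.Embedding(config["vocab_size"], config["lstm_units"])(tokens)
    decoded = tf.keras.layers.LSTM(config["lstm_units"], return_sequences=True)(
        emb, initial_state=[features, features])
    probs = tf.keras.layers.Dense(config["vocab_size"], activation="softmax")(decoded)

    model = tf.keras.Model([image, tokens], probs)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=config["learning_rate"]),
        loss="sparse_categorical_crossentropy")
    model.summary()

The experiments listed above would then vary entries in config (learning rate, filter counts, LSTM unrollings) and compare the resulting validation accuracy.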

Innovations and Challenges:

  • According to recent publications, this problem has been attempted only a few times. In Image-to-Markup Generation with Coarse-to-Fine Attention [1], researchers at Harvard reached an accuracy of around 75 percent with one of their approaches, which underscores how challenging the problem is.
  • Unlike ordinary text images, where a given character has roughly the same size regardless of its position, the same character in a mathematical equation can be rendered at different sizes depending on where it appears (for example, as a subscript or an exponent). This two-dimensional layout makes the problem more challenging.
  • This problem has also been posted by the AI research company OpenAI on its list of problems to research.

References:

  1. Deng, Yuntian, et al. "Image-to-Markup Generation with Coarse-to-Fine Attention." International Conference on Machine Learning. 2017.
  2. Deng, Yuntian, Anssi Kanervisto, and Alexander M. Rush. "What You Get Is What You See: A Visual Markup Decompiler." arXiv preprint arXiv:1609.04938 (2016).
  3. Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
  4. Sun, Chen, et al. "Revisiting unreasonable effectiveness of data in deep learning era." arXiv preprint arXiv:1707.02968 (2017).