Exploring Data Generation Approaches

This deliverable explores data generation approaches for creating equation images from LaTeX.

Our project involves generating the correct LaTeX for an image of a rendered LaTeX equation.

To do so, we need to train our neural network model on a large dataset of images and their corresponding LaTeX equations.

For this deliverable, I mainly tried two approaches.

  • The first approach was creating a Portable Network Graphics (PNG) image file
  • The second approach was generating a Portable Document Format (PDF) file using PostScript

The PDF approach uses the PNG as its base file, embeds it in additional wrappers, and creates an equivalent PDF file. Since the PDF is derived from the PNG, using the PNG file directly seems more appropriate.

The following resource presents an approach that can be used for data generation. It covers the first approach, generating the data with the popular Python library Matplotlib.

The generated image is in PNG format. This image can be read into Python as a NumPy array.
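As a rough illustration of this approach, the sketch below renders one equation to PNG with Matplotlib and reads it back as a NumPy array. It is a minimal sketch and not necessarily the code in the linked resource: the function name render_equation, the dpi and fontsize values, and the sample equation are assumptions made for the example. It also relies on Matplotlib's built-in mathtext renderer, which supports only a subset of LaTeX, so a full LaTeX installation is not needed but some equations may fail to render.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np


def render_equation(latex, dpi=200, fontsize=20):
    """Render a mathtext string to PNG and return (png_bytes, image_array).

    render_equation is a hypothetical helper written for this example.
    """
    fig = plt.figure(figsize=(3, 1))
    # Wrapping the string in $...$ tells Matplotlib to typeset it as math.
    fig.text(0.5, 0.5, f"${latex}$", fontsize=fontsize,
             ha="center", va="center")

    # Save to an in-memory buffer; bbox_inches="tight" crops to the equation.
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi,
                bbox_inches="tight", pad_inches=0.1)
    plt.close(fig)
    png_bytes = buf.getvalue()

    # Decode the PNG back into a NumPy array of shape (height, width, channels).
    image = plt.imread(io.BytesIO(png_bytes), format="png")
    return png_bytes, image


png_bytes, image = render_equation(r"\frac{a}{b} + \sqrt{x}")
print(type(image), image.shape, image.dtype)
```

Each (image, LaTeX string) pair produced this way could serve as one training example. Writing png_bytes to a file would give the PNG on disk if the dataset needs to be stored.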

Related Resource: