
Teaching Machines to Code

Neural Markup Generation with Visual Attention


Abstract

We present a neural transducer model with visual attention that learns to generate LaTeX markup for a real-world math formula given its image. Applying sequence modeling and transduction techniques that have been very successful across modalities such as natural language, image, handwriting, speech, and audio, we construct an image-to-markup model that learns to produce syntactically and semantically correct LaTeX markup code over 150 words long. We achieve a BLEU score of 89%, a new state of the art for the Im2Latex problem. We also demonstrate with heat-map visualizations how attention helps in interpreting the model and can accurately pinpoint (localize) symbols on the image, despite the model having been trained without any bounding-box data.
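As a rough illustration of the heat-map visualization mentioned above (this is not the authors' plotting code), the sketch below overlays one decoding step's attention weights on a formula image. The function name, array shapes, and colormap choices are all assumptions:

```python
# A minimal sketch of an attention heat-map overlay, assuming a grayscale
# formula image and a coarse grid of attention weights per decoding step.
import numpy as np
import matplotlib.pyplot as plt

def show_attention(image, attn, token):
    """Overlay one decoding step's attention weights on the formula image.

    image: (H, W) grayscale array.
    attn:  (H', W') attention weights over the visual-feature grid.
    token: the LaTeX token emitted at this step.
    """
    H, W = image.shape
    plt.imshow(image, cmap='gray')
    # Stretch the coarse attention grid over the full image area;
    # bilinear interpolation smooths it into a heat map.
    plt.imshow(attn, cmap='jet', alpha=0.4,
               extent=(0, W, H, 0), interpolation='bilinear')
    plt.title('attention while emitting %r' % token)
    plt.axis('off')
    plt.show()
```

Calling this once per emitted token produces the kind of per-symbol localization shown in the paper.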

Paper

Slides, meetup presentation


Test Results

Two model variations are presented: I2L-STRIPS and I2L-NOPOOL (see Model Weights below).


Datasets

Formula Lists

Text files with normalized LaTeX formulas. Get one of these if you want to generate your own dataset starting from normalized formulas. Copy the file into the folder named 'step0' (see the preprocessing notebooks for details, and the sketch after the list below).

  1. I2L-140K Normalized Formula List (7MB Download)
  2. Im2latex-90k Normalized Formula List (4MB Download)
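A minimal sketch of staging a downloaded formula list for the preprocessing notebooks. The file name below is an assumption; use whichever list you actually downloaded and whatever name the notebooks expect:

```python
# Copy the downloaded formula list into the 'step0' folder that the
# preprocessing notebooks read from.
from pathlib import Path
import shutil

step0 = Path('step0')
step0.mkdir(exist_ok=True)

# Hypothetical file name; substitute the name of the list you downloaded.
shutil.copy('I2L-140K_normalized_formulas.txt', step0)
```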

The I2L-140K dataset is recommended over Im2latex-90k since it is the larger of the two (and a superset of Im2latex-90k) and yields better generalization.

Full Dataset

All preprocessed data: all images, as well as the DataFrames and numpy arrays produced at the end of the data-processing pipeline. A loading sketch follows the list below.

  1. I2L-140K (822MB Download)
  2. Im2latex-90k (542MB Download)
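A minimal sketch of inspecting the preprocessed artifacts after extracting one of the archives. The file names and directory layout here are assumptions, not the canonical paths; consult the preprocessing notebooks for those:

```python
# Load one of the pickled DataFrames and one of the numpy arrays
# produced by the data-processing pipeline, then print a quick summary.
import pandas as pd
import numpy as np

df = pd.read_pickle('data/df_train.pkl')   # hypothetical DataFrame path
arrays = np.load('data/padded_seqs.npy')   # hypothetical numpy array path

print(df.columns.tolist())
print(arrays.shape)
```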

Model Weights

Model checkpoints of the two-GPU version can be downloaded here. You will need to extract the weights if you are training on a different number of GPUs. Test-run args and hyperparameters are stored in the args.pkl and hyper.pkl files and are also printed to training.log (see the sketch after the list below).

  1. I2L-NOPOOL (651 MB)
  2. I2L-STRIPS (755 MB)
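A minimal sketch of inspecting the run arguments and hyperparameters shipped with a checkpoint. Only the file names args.pkl and hyper.pkl come from this page; their contents are not documented here, and unpickling may require the repository's code on your Python path if the pickles contain custom classes:

```python
# Inspect the test-run args and hyperparameters bundled with a checkpoint.
import pickle

with open('args.pkl', 'rb') as f:
    args = pickle.load(f)
with open('hyper.pkl', 'rb') as f:
    hyper = pickle.load(f)

print(args)
print(hyper)
```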

Hardware

We used two NVIDIA GeForce GTX 1080 Ti graphics cards.