Equation OCR Tutorial Part 3: Making an OCR for Equations using OpenCV and Tesseract


I’ll be doing a series on using OpenCV and Tesseract to read a scanned image of an equation, graph it, and give related data. I was surprised at how well the results turned out =)

I will be using OpenCV 2.4.2 and Tesseract OCR 3.02.02.

I have also made two tutorials on installing Tesseract and OpenCV for Vista x86 with Microsoft Visual Studio 2008 Express. For other systems, see the official sites for documentation on installing the libraries.

Parts

Equation OCR Part 1: Using contours to extract characters in OpenCV
Equation OCR Part 2: Training characters with Tesseract OCR
Equation OCR Part 3: Equation OCR

Tutorials

Installing OpenCV: http://blog.ayoungprogrammer.com/2012/10/tutorial-install-opencv-242-for-windows.html/

Installing Tesseract: http://blog.ayoungprogrammer.com/2012/11/tutorial-installing-tesseract-ocr-30202.html/

Official Links:

OpenCV : http://opencv.org/
Tesseract OCR: http://code.google.com/p/tesseract-ocr/

Overview:

The overall goal of the final program is to convert the image of an equation into a text equation that we can graph. We can break this project down into three parts: extracting characters from the text, training the OCR, and recognition, which converts images of equations into text.

Recognition

Recognition is easy once we have the training files we need for Tesseract. To initialize Tesseract for our language and set the page segmentation mode to single-character recognition:
tess_api.Init("", "mat", tesseract::OEM_DEFAULT);
tess_api.SetPageSegMode(static_cast<tesseract::PageSegMode>(10)); // 10 = PSM_SINGLE_CHAR
After extracting all the characters, we can run Tesseract on each single character to get the recognized character.
OpenCV uses a different data storage type from Tesseract, but we can easily pass the raw data from a Mat to Tesseract.
tess_api.TesseractRect(resizedPic.data, 1, resizedPic.step1(), 0, 0, resizedPic.cols, resizedPic.rows);
tess_api.SetImage(resizedPic.data, resizedPic.size().width, resizedPic.size().height, resizedPic.channels(), resizedPic.step1());
tess_api.Recognize(0);
const char* out = tess_api.GetUTF8Text();
The output should contain the recognized character. The strings returned by TesseractRect and GetUTF8Text are allocated by Tesseract and must be freed with delete[]. Since the characters have been sorted from left to right, we can just append the recognized characters to a string stream and output the final result.
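
As a rough sketch of that last step, assume each recognized character is stored together with the x-coordinate of its bounding box. The Glyph struct and assembleEquation function below are hypothetical names for illustration, not code from the original program. Whitespace is stripped as a precaution, since the text returned for a single character may carry trailing newlines.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: the text returned for one character and the
// left edge of that character's bounding box.
struct Glyph {
    int x;
    std::string text;
};

// Sort glyphs left to right, drop whitespace, and concatenate the rest.
std::string assembleEquation(std::vector<Glyph> glyphs) {
    std::stable_sort(glyphs.begin(), glyphs.end(),
                     [](const Glyph& a, const Glyph& b) { return a.x < b.x; });
    std::ostringstream eqn;
    for (const Glyph& g : glyphs) {
        for (unsigned char c : g.text) {
            if (!std::isspace(c)) eqn << static_cast<char>(c);
        }
    }
    return eqn.str();
}
```

If the characters are already sorted, as in the program described here, the sort is a no-op and only the concatenation matters.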

Exponents

In a polynomial there are variables (x), numbers, brackets and exponents. The exponents can easily be found by checking whether the bottom of a character reaches 2/3 of the way down to the bottom of the line. If it doesn’t, then it is probably superscript and we can put a ^ in front of the number to signify an exponent. The green line shows the 2/3 line to check. As you can see, all the standard characters that are not exponents go past the 2/3 line.
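
The 2/3 check can be sketched as follows. This is an illustration under stated assumptions, not the original code: CharBox, isExponent and withExponents are hypothetical names, y grows downward as in OpenCV images, and rowTop/rowHeight describe the bounding box of the whole equation. Restricting the ^ to digits is also an assumption, added because symbols such as '-' can end above the 2/3 line too.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a character's bounding box and recognized text.
// Image coordinates: y grows downward.
struct CharBox {
    int y;        // top edge
    int height;
    char symbol;  // recognized character
};

// True if the bottom of the box stays above the 2/3 line of the row.
bool isExponent(const CharBox& box, int rowTop, int rowHeight) {
    int bottom = box.y + box.height;
    // Compare 3*(bottom - rowTop) with 2*rowHeight to avoid integer division.
    return 3 * (bottom - rowTop) < 2 * rowHeight;
}

// Build the equation text, inserting '^' before each superscript digit.
// Boxes are expected in left-to-right order.
std::string withExponents(const std::vector<CharBox>& boxes,
                          int rowTop, int rowHeight) {
    std::string eqn;
    for (const CharBox& box : boxes) {
        bool digit = std::isdigit(static_cast<unsigned char>(box.symbol)) != 0;
        if (digit && isExponent(box, rowTop, rowHeight)) eqn += '^';
        eqn += box.symbol;
    }
    return eqn;
}
```

This sketch puts a ^ in front of every superscript digit, so a multi-digit exponent would need its digits grouped first.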

Wolfram

To send the equation to Wolfram Alpha I had to reverse-engineer the URL format they use, which was quite simple. All URLs begin with "http://www.wolframalpha.com/input/?i=". Numbers and letters map to themselves, but other characters map to percent-encoded hex codes:
if(eqn[i]=='+')url<<"%2B";
if(eqn[i]=='^')url<<"%5E";
if(eqn[i]=='=')url<<"%3D";
if(eqn[i]=='(')url<<"%28";
if(eqn[i]==')')url<<"%29";
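
The mapping can be generalized into a small helper. This is only a sketch: wolframUrl is a hypothetical name, and percent-encoding every non-alphanumeric character (rather than just the five cases listed) is an assumption about what the URL format tolerates.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Build a Wolfram Alpha query URL from an equation string.
// Letters and digits are copied as-is; every other byte is written
// as '%' followed by its two-digit uppercase hex code.
std::string wolframUrl(const std::string& eqn) {
    std::ostringstream url;
    url << "http://www.wolframalpha.com/input/?i=";
    const char* hex = "0123456789ABCDEF";
    for (unsigned char c : eqn) {
        if (std::isalnum(c)) {
            url << static_cast<char>(c);
        } else {
            url << '%' << hex[c >> 4] << hex[c & 0x0F];
        }
    }
    return url.str();
}
```

For example, wolframUrl("y=x^2+1") yields http://www.wolframalpha.com/input/?i=y%3Dx%5E2%2B1, matching the codes listed for '=', '^' and '+'.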

Extensions

The program can be extended to work for other functions such as log, sin, cos, etc. by doing some additional training for letters. It can also be extended to handle fraction bars, although that takes some more work. First look for "bars": shapes whose width is more than 3 times their height and that have other shapes both above and below them. Take the longest bar first, because you want to find the largest fraction first. Then recursively find fractions in the numerator and denominator, going from the largest fraction to the smallest, and append (numerator) / (denominator) to the string. However, there may be other terms that are not fractions to the left and right of the fraction, so you will need to re-sort by x-coordinate.
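
The fraction idea can be sketched recursively. Everything here is an illustrative assumption rather than code from the program: the Shape struct, the within helper, the parseShapes function, and the choice to test "above or below the bar" using each shape's horizontal centre. Symbols made of stacked bars, such as '=', would need special handling that this sketch omits.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical shape record: bounding box (y grows downward) plus the
// recognized text. `fraction` marks an already-assembled fraction.
struct Shape {
    int x, y, w, h;
    std::string text;
    bool fraction;
};

// True if the horizontal centre of `s` lies within the extent of `bar`.
static bool within(const Shape& s, const Shape& bar) {
    int cx = s.x + s.w / 2;
    return cx >= bar.x && cx <= bar.x + bar.w;
}

std::string parseShapes(std::vector<Shape> shapes) {
    // Find the widest bar that has shapes both above and below it.
    int best = -1;
    for (int i = 0; i < (int)shapes.size(); ++i) {
        const Shape& b = shapes[i];
        if (b.fraction || b.w <= 3 * b.h) continue;
        bool above = false, below = false;
        for (int j = 0; j < (int)shapes.size(); ++j) {
            if (j == i || !within(shapes[j], b)) continue;
            if (shapes[j].y + shapes[j].h <= b.y) above = true;
            if (shapes[j].y >= b.y + b.h) below = true;
        }
        if (above && below && (best < 0 || b.w > shapes[best].w)) best = i;
    }

    if (best < 0) {
        // No fraction left: re-sort by x and concatenate.
        std::sort(shapes.begin(), shapes.end(),
                  [](const Shape& a, const Shape& b) { return a.x < b.x; });
        std::string out;
        for (const Shape& s : shapes) out += s.text;
        return out;
    }

    // Split the shapes into numerator, denominator, and everything else.
    Shape bar = shapes[best];
    std::vector<Shape> num, den, rest;
    for (int i = 0; i < (int)shapes.size(); ++i) {
        if (i == best) continue;
        const Shape& s = shapes[i];
        if (within(s, bar) && s.y + s.h <= bar.y) num.push_back(s);
        else if (within(s, bar) && s.y >= bar.y + bar.h) den.push_back(s);
        else rest.push_back(s);
    }

    // Recurse into both halves, then treat the fraction as a single shape
    // placed at the bar's position so it sorts correctly among its neighbours.
    Shape frac = bar;
    frac.text = "(" + parseShapes(num) + ")/(" + parseShapes(den) + ")";
    frac.fraction = true;
    rest.push_back(frac);
    return parseShapes(rest);
}
```

A minus sign passes the width test but has nothing above and below it, so it falls through to the plain left-to-right concatenation.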

Conclusion

In finishing this tutorial, I hope you have learned how to use OCR and contour extraction, as I certainly have. If you release any extensions of the programs from my tutorials, I hope you will credit me and also send me a message. Thanks for reading!

Source code

18 Comments

  • Thiago Dalul
    May 27, 2013

    Hi.

    I have a picture with a person who wears shirt with several numbers.
    How to extract this numbers?

    Regards,
    Thiago

    • ayoungprogrammer
      May 27, 2013

      Can you provide some sample images?

  • Sachinthana Dassanayake
    June 2, 2013

    How can I use this for Android application development? Can you please help me out?

    • ayoungprogrammer
      June 2, 2013

      There are Android versions of both libraries which you can integrate.
      The Tesseract library for Android is called tess-two, which is the same.

  • Craig R
    October 31, 2013

    The biggest hurdle to OCR for me is the training step since I just want to add OCR to my project without much effort. Assuming I only care about English fonts generally found in MS Word as an example then is there a fairly comprehensive set of already existing, freely available, training images I can simply use with my OCR library of choice? It would be great if someone could share images they've already perfected instead of each of us repeating the same time consuming task!

  • PSPboy
    March 9, 2014

    Hi, if anyone is reading this I need help!

    I installed OpenCV and tesseract on Windows, but I do not know what to do with the source codes.

    I pasted them into C++ files in Visual Studio, and also linked the required libraries for opencv to work with them. So what is the procedure for executing it?

    Do I run the C++ files one by one and then use the tesseract command afterwards?

    Please Help,
    Ankur Jain

    • ayoungprogrammer
      March 9, 2014

      OpenCV and Tesseract are libraries you can use once you have linked them in your project. To execute, all you need to do is #include the appropriate headers and call whatever functions you need.

    • PSPboy
      March 9, 2014

      Okay so I should make each pastebin source file as a separate class in my project with different names? And then in my main I #include those classes, and then run the methods from the end of those classes from main? Which methods are they?

      My goal is to test if I am able to translate an equation image to text in windows, and then will try it on my Mac, which I will use to make it for iOS.

      Thanks in advance
      Ankur Jain

    • ayoungprogrammer
      March 9, 2014

      Each part in the tutorial is its own program. The first program allows you to extract training data from your test images. The second program automatically creates a training file and you have to run some commands to create the training set for Tesseract. The last program uses the training set that you made in part 2 to scan images and graph the equation.

      You can try using just the third program (the 3rd part) with the English training set that Tesseract provides, and it works ok.

  • Sibo Donald
    March 28, 2014

    Hi Michael, you massively helped me; however, in my project I am detecting and recognizing speed limit digits. Is it possible for the Tesseract OCR to recognize digits in the image provided in this link (https://www.facebook.com/photo.php?fbid=626528827402751&set=a.557730527615915.1073741829.100001369175018&type=1), or do I have to first fill the corners with the white black color?

    • ayoungprogrammer
      March 28, 2014

      Yes, you will have to fill in the corners, but you can do this easily with OpenCV floodFill.

  • OcrNewbie
    January 7, 2015

    Michael,
    Tried posting before; if it's redundant, I apologize. Came across your blog while trying to get a better extraction of text using Tesseract. Great tutorial!
    Here's the link to my question, and I would appreciate any direction you can provide.

    https://groups.google.com/forum/#!topic/tesseract-ocr/AcguXNGznJs
    Thanks

  • Alex I
    September 29, 2015

    Do you have any tips on how to read text from a photo of a phone screen e.g. http://i.ebayimg.com/00/s/MTIwOVg2ODQ=/z/PuMAAOSwKIpWBCXl/$_57.JPG ? I tried to use http://www.fmwconcepts.com/imagemagick/textcleaner/ textcleaner -g -f 25 -s 1 scr.jpg scr.jpg; tesseract scr.jpg o -auto-orient. The result is quite good, but do you think it might get better?

    • ayoungprogrammer
      September 29, 2015

      If you're using OCR to read phone information I would highly recommend making an app to get the data since it is much easier. You could try extracting the phone screen to isolate the screen and then running OCR on that. Good luck!

  • roandgzmn
    August 21, 2016

    Hi! Me and my group are working on our thesis. And we decided to develop an Algebraic Equation Solver using OCR. Do you mind if I ask about how to detect the exponents? Thank you in advance 🙂

    • ayoungprogrammer
      August 29, 2016

      Hi roandgzmn, as explained in the blog post, the exponents are detected by checking if they are positioned above the centre horizontal line of the equation. However, this will only work with equations that are flat and on a straight line (i.e., fractions will not work).
