Development of a document recognition system using OpenCV and Tesseract - Custom Web Development Blog

Automated entry of user data from printed documents is currently in great demand in enterprise automation solutions.

Such documents include:

  • Standard official state documents, such as passports, personal insurance policy numbers (SNILS), driver’s licenses, and birth certificates.
  • Printed documents that circulate in a company's document flow and are produced from company templates.

Our company focuses on developing software based on machine learning, computer vision, image processing, and optical character recognition. In this article, we describe our experience developing a text template recognition system that includes an Android mobile app and a template management server.

An Android smartphone serves as the tool for digitizing textual data in this system, for the following reasons:

  • It is widely used and available;
  • It can take photos of documents;
  • It has sufficient CPU performance for image pre-processing and text recognition.

The mobile app can ship with pre-installed templates of the documents to be recognized. New templates can be added later by syncing the app with the company server. The templates are saved directly on the smartphone, so no connection to the server is needed to use them.
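To make the idea of locally stored templates more concrete, here is a minimal Python sketch of how a template could be represented and kept on the device. The article does not describe the real template format, so everything here is an assumption made for illustration: the `Template` and `Field` classes, the JSON layout, the relative field coordinates, and the `save_template`, `load_templates`, and `sync_templates` helpers.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Field:
    """One text field; coordinates are fractions of the document size (assumption)."""
    name: str
    x: float
    y: float
    width: float
    height: float
    vertical: bool = False  # True if the text runs vertically


@dataclass
class Template:
    """Hypothetical template record, not the format used by the real system."""
    template_id: str
    title: str
    languages: list[str]
    fields: list[Field] = field(default_factory=list)


def save_template(template: Template, directory: Path) -> Path:
    """Write one template as a JSON file into the on-device template directory."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{template.template_id}.json"
    path.write_text(json.dumps(asdict(template), ensure_ascii=False), encoding="utf-8")
    return path


def load_templates(directory: Path) -> dict[str, Template]:
    """Load every stored template from disk; no network access is involved."""
    templates = {}
    for path in sorted(directory.glob("*.json")):
        raw = json.loads(path.read_text(encoding="utf-8"))
        raw["fields"] = [Field(**item) for item in raw["fields"]]
        templates[raw["template_id"]] = Template(**raw)
    return templates


def sync_templates(server_templates: list[Template], directory: Path) -> None:
    """Store templates received from the server next to the pre-installed ones."""
    for template in server_templates:
        save_template(template, directory)
```

In this sketch a sync simply writes the templates received from the server into the same directory as the bundled ones, so the list of document types offered to the user is whatever `load_templates` finds on disk.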

Importantly, image pre-processing and text recognition do not require a network connection and are performed directly on the device. This also means that the images are never transmitted over the network, which can be an important requirement for the confidentiality of user data.

The user data input process is as follows:

  • Templates are created in the template engine from pictures of sample documents. They are then uploaded to the company server.
  • If necessary, the mobile app syncs the templates with the server. The templates are then stored on the device and no connection to the server is required.
  • The user selects the type of document to be recognized from the list defined by the installed templates.
  • The app switches to the activity that uses the smartphone camera.
  • The user positions the document inside a frame on the screen and takes a picture. The app automatically finds the borders of the document in the picture, checks that they comply with the template, searches for the specified fields, and recognizes the text.
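The last step can be illustrated with a small geometric sketch in Python. It assumes that the four corners of the document have already been detected (in an OpenCV pipeline they could come from contour detection, for example) and that a template stores the expected width-to-height ratio and the fields as fractions of the document size. The function names, the tolerance value, and the field layout are hypothetical and are not taken from the real system.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # x, y, width, height in pixels


def order_corners(points: List[Point]) -> List[Point]:
    """Return corners as top-left, top-right, bottom-right, bottom-left.

    Image coordinates are assumed: x grows to the right, y grows downward.
    The sum/difference heuristic works for moderately tilted documents only.
    """
    if len(points) != 4:
        raise ValueError("exactly four corner points are required")
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    return [top_left, top_right, bottom_right, bottom_left]


def aspect_ratio(corners: List[Point]) -> float:
    """Estimate width / height from the averaged opposite sides."""
    tl, tr, br, bl = order_corners(corners)
    width = (math.dist(tl, tr) + math.dist(bl, br)) / 2
    height = (math.dist(tl, bl) + math.dist(tr, br)) / 2
    return width / height


def matches_template(corners: List[Point], expected_ratio: float,
                     tolerance: float = 0.1) -> bool:
    """Check that the detected outline has roughly the template's proportions."""
    ratio = aspect_ratio(corners)
    return abs(ratio - expected_ratio) / expected_ratio <= tolerance


def field_boxes(fields: Dict[str, Tuple[float, float, float, float]],
                width: int, height: int) -> Dict[str, Box]:
    """Convert relative field rectangles into pixel boxes of the rectified image."""
    return {
        name: (round(x * width), round(y * height), round(w * width), round(h * height))
        for name, (x, y, w, h) in fields.items()
    }


corners = [(620, 40), (20, 50), (30, 470), (610, 460)]  # detected in arbitrary order
if matches_template(corners, expected_ratio=1.42):
    print(field_boxes({"surname": (0.1, 0.2, 0.5, 0.05)}, 1000, 700))
```

Storing fields as fractions rather than pixels is one possible design choice: the same template then works for photos of any resolution once the document has been rectified to its own outline.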

The open source libraries OpenCV and Tesseract are used for image pre-processing and text recognition. Tesseract supports a variety of recognition languages, and the required ones can be included in the app.
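As an illustration of how the two libraries can work together, the following Python sketch crops one field, binarizes it with OpenCV, and passes it to Tesseract through the `pytesseract` wrapper. It is a desktop-style sketch under stated assumptions, not the code of the Android app: the `opencv-python` and `pytesseract` packages and the Tesseract language data are assumed to be installed, and the helper names, the rotation direction for vertical fields, and the choice of Otsu thresholding are our own illustrative choices.

```python
def tesseract_lang(languages):
    """Join language codes the way Tesseract expects, e.g. ['eng', 'rus'] -> 'eng+rus'."""
    if not languages:
        raise ValueError("at least one language code is required")
    return "+".join(languages)


def recognize_field(image, box, languages, vertical=False):
    """Crop one field from a BGR image, binarize it, and run Tesseract on it.

    `image` is a NumPy array as returned by cv2.imread, `box` is
    (x, y, width, height) in pixels.
    """
    # Imported here so that tesseract_lang() stays usable without these packages.
    import cv2
    import pytesseract

    x, y, w, h = box
    crop = image[y:y + h, x:x + w]
    if vertical:
        # Rotate vertical text so it reads left to right (direction is an assumption).
        crop = cv2.rotate(crop, cv2.ROTATE_90_CLOCKWISE)
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    # Otsu's method chooses the binarization threshold automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    text = pytesseract.image_to_string(
        binary,
        lang=tesseract_lang(languages),
        config="--psm 7",  # page segmentation mode 7: treat the image as a single text line
    )
    return text.strip()
```

A call such as `recognize_field(cv2.imread("passport.jpg"), (100, 140, 500, 35), ["eng", "rus"])` would then return the text of one field; the file name and box are placeholders.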

During development of the system, we identified several factors that impair recognition accuracy:

  • Bad lighting;
  • Document lamination, which causes glare;
  • Background, watermarks, document protection elements;
  • Large displacement of the recognized fields from their positions in the template;
  • Printed text that overlaps or lies close to graphic or textual elements of the document sample;
  • Deformation of the document or incorrect positioning of the document in the shot.

Some of these issues are hard to solve because the system must stay versatile: it has to work with newly created templates without any changes to the source code of the app. To improve image quality and recognition accuracy, the app will show recommendation messages to the user.
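A recommendation mechanism of this kind could be driven by simple image statistics. The Python sketch below works on a grayscale image given as rows of 0 to 255 pixel values and flags two of the factors listed above, bad lighting and glare. The thresholds, the message texts, and the function name are hypothetical values chosen for illustration, not those of the real app.

```python
from statistics import mean


def quality_recommendations(gray, dark_below=60, glare_above=250, max_glare_share=0.05):
    """Return user-facing hints for a grayscale image given as rows of 0..255 values.

    All thresholds are illustrative assumptions and would need tuning on real photos.
    """
    pixels = [value for row in gray for value in row]
    if not pixels:
        raise ValueError("empty image")

    messages = []
    # Bad lighting: the picture is too dark on average.
    if mean(pixels) < dark_below:
        messages.append("The picture is too dark. Please improve the lighting.")
    # Glare (for example from lamination): too many nearly saturated pixels.
    glare_share = sum(value >= glare_above for value in pixels) / len(pixels)
    if glare_share > max_glare_share:
        messages.append("Glare detected. Please tilt the document or move the light source.")
    return messages


print(quality_recommendations([[10, 12, 9, 11]] * 4))
print(quality_recommendations([[120, 120, 255, 255]] * 4))
```

Because checks like these look only at the image and not at any particular template, they stay compatible with the requirement that new templates work without code changes.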

The advantages of the system are:

  • Autonomy;
  • No special equipment (scanners, cameras, PCs) needed;
  • Support for a variety of recognition languages and their combinations;
  • Simple creation of document templates using a template engine;
  • New templates can be added without changing the app code;
  • Textual fields for recognition can be both horizontal and vertical.

In addition, the system can be extended with various ways of processing the recognized text:

  • Displaying data for editing;
  • Encryption and uploading data to the server;
  • Processing data with a decision-making system based on the recognition results.

Applications:

  • State institutions;
  • Banks, credit institutions;
  • CRM systems;
  • Educational institutions.

Conclusion

Computer vision with OpenCV and Tesseract offers great opportunities to extend the functionality of enterprise automation systems. As a computer vision software development company, we know how to build such solutions.

Reach out to us if you need assistance.