Tesseract

Posted by Vincenzo Rubano · one-minute read.

Initially developed by Hewlett-Packard (HP) and later by Google, Tesseract is an open source project that provides two different (yet related) things:

  • an OCR engine (called libtesseract), available as a library;
  • a command-line program (called tesseract) that performs a complete OCR process using the features provided by libtesseract.
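As a rough illustration of the second item, the sketch below calls the tesseract program from Python. It assumes the tesseract executable is installed and on the PATH, and it relies on the program's basic usage form (input image, output base, optional `-l` language), where the special output base `stdout` prints the recognized text instead of writing a file. The helper names `build_tesseract_command` and `recognize_text` are hypothetical and are not part of Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, language="eng"):
    """Return the argument list for a basic tesseract invocation.

    Passing "stdout" as the output base asks tesseract to print the
    recognized text instead of writing it to a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", language]


def recognize_text(image_path, language="eng"):
    """Run the tesseract program on an image and return the text.

    Returns None when the tesseract executable is not on the PATH.
    Raises subprocess.CalledProcessError if tesseract exits with an
    error (for example, when the image cannot be read).
    """
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, language),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `recognize_text("scan.png")` (a placeholder file name) would return the recognized text as a string, provided the image exists and the English language data is installed.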

Because it is written in C and C++, Tesseract can be used on many different platforms, and prebuilt packages are provided for the supported ones. Unsurprisingly, bindings that expose libtesseract to a wide variety of programming languages have been developed as well. Several graphical OCR applications that use tesseract "under the hood" to perform the OCR process are also available.
