Optical Character Recognition (OCR)

Quoting the corresponding Wikipedia article:

Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example: from a television broadcast).
Widely used as a form of data entry from printed paper data records – whether passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static-data, or any suitable documentation – it is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed on-line, and used in machine processes such as cognitive computing, machine translation, (extracted) text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.
Early versions needed to be trained with images of each character, and worked on one font at a time. Advanced systems capable of producing a high degree of recognition accuracy for most fonts are now common, and with support for a variety of digital image file format inputs.[2] Some systems are capable of reproducing formatted output that closely approximates the original page including images, columns, and other non-textual components.

OCR is invaluable for making previously inaccessible documents accessible: it can convert image-PDF documents and digitize paper documents that could not otherwise be made accessible. Blind and visually impaired people can use it directly, and institutions and content authors can use it as a starting point for making their documents more accessible.

IMPORTANT: However accurate it may be, OCR is by its nature imperfect. The process can introduce artifacts, mistakes and extraneous characters into the resulting document. The quality of the end result also depends on a number of factors, including the quality of the source image (e.g. resolution and brightness), its characteristics (e.g. text layout, color contrast), and the tools involved in the process.
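
One common way to quantify how many mistakes an OCR process introduced is to compare its output against a manually verified transcription of the same page. The following is a minimal sketch of that idea, not part of any tool listed here: it computes a character error rate from the Levenshtein edit distance. The function names `edit_distance` and `character_error_rate` are hypothetical and chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance divided by the length of the verified reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)
```

For example, `character_error_rate("invoice", "inv0ice")` counts one substituted character out of seven. A lower value means the OCR output is closer to the verified text.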

Resources

Mathpix

Vincenzo Rubano

· November 14, 2022

Mathpix is a cloud platform for authoring, editing and distributing scientific documents (e.g. journal articles, research papers), especially those containing math formulas. It provides a web service, a desktop application (available for Windows, macOS and Linux), and a mobile app (for both Android and iOS).

one minute reading

Tesseract User Manual

Vincenzo Rubano

· July 15, 2022

As the name implies, this is the User Manual for Tesseract, an open source project that provides both an OCR engine (available as a library) and a command line tool offering all the features expected of a standalone OCR solution.

one minute reading

Tesseract

Vincenzo Rubano

· July 14, 2022

Initially developed by Hewlett-Packard (HP), and later by Google, Tesseract is an open source project that provides two different (yet related) things: an OCR engine (called libtesseract), available as a framework; and a command line program (called tesseract) that performs a complete OCR process using the features provided by the “libtesseract” framework.

one minute reading
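
As a rough illustration of the command line program described above, the sketch below calls `tesseract` from Python and captures the recognized text. It assumes the program is installed and on the `PATH`, and that the `eng` language data is available. The helper names `build_tesseract_command` and `recognize_text` are hypothetical, not part of Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image_path: str, language: str = "eng") -> list[str]:
    """Build the argument list for one OCR run.

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing it to a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", language]


def recognize_text(image_path: str, language: str = "eng") -> str:
    """Run tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract command line program was not found")
    result = subprocess.run(
        build_tesseract_command(image_path, language),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract reports a failure
    )
    return result.stdout
```

Calling `recognize_text("scan.png")` (with a hypothetical image file) would return the text found in the image. Applications that need tighter integration can use the libtesseract engine directly instead of spawning a process.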

Add Live Text Interaction to Your App (WWDC 2022)

Vincenzo Rubano

· July 6, 2022

In this video from Apple's Worldwide Developers Conference (WWDC) 2022, you can learn how to bring Live Text support for still photos or paused video frames to your app.

one minute reading

Abbyy FineReader PDF

Vincenzo Rubano

· June 24, 2022

Described by its creators as “the smarter PDF solution”, Abbyy FineReader PDF is the de facto leading solution for digitizing documents, both scanned images and image-PDF files, by means of an OCR process.

one minute reading

InftyReader

Vincenzo Rubano

· June 21, 2022

Developed as part of the “Infty” project, InftyReader is a commercial desktop application for Microsoft Windows that digitizes scientific documents (including mathematical formulas). More specifically, it can recognize text content and math expressions from either scanned images or PDF documents by performing an Optical Character Recognition (OCR) process with a neural network trained specifically for this purpose.

one minute reading