What is OCR?

Optical character recognition (OCR) is the process of extracting text from an image of a page. A page image is an electronic picture of text and possibly other elements, such as headings and pictures. Page images can result from scanning a paper document or from opening an electronic image file. You may receive such files by e-mail or from a fax machine, or create them with your own scanner.

These images do not have editable text characters; they have many tiny dots (pixels) that together form a picture of text. The OCR process examines the text image and creates computer-editable text from it, so you do not have to retype the text manually.
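The difference between a picture of text and editable text can be illustrated with a toy sketch. It is not how OmniPage works internally; it only assumes that each character arrives as a tiny 3x5 grid of pixels and picks the stored template that differs in the fewest pixels. The names TEMPLATES, count_mismatches, recognize_glyph, and recognize_line are hypothetical and exist only for this illustration.

```python
# Toy illustration only: each glyph is a 3x5 grid of pixels
# ('#' = dark pixel, '.' = light pixel). Real OCR engines are far more elaborate.
TEMPLATES = {
    "O": ("###", "#.#", "#.#", "#.#", "###"),
    "C": ("###", "#..", "#..", "#..", "###"),
    "R": ("##.", "#.#", "##.", "#.#", "#.#"),
}


def count_mismatches(glyph, template):
    """Count the pixels that differ between a scanned glyph and a template."""
    return sum(
        pixel != expected
        for glyph_row, template_row in zip(glyph, template)
        for pixel, expected in zip(glyph_row, template_row)
    )


def recognize_glyph(glyph):
    """Return the character whose template differs from the glyph in the fewest pixels."""
    return min(TEMPLATES, key=lambda ch: count_mismatches(glyph, TEMPLATES[ch]))


def recognize_line(glyphs):
    """Turn a sequence of pixel grids into an editable string."""
    return "".join(recognize_glyph(g) for g in glyphs)


# A "page image": three pixel grids, one of them with a flipped pixel (scanner noise).
scanned = [
    ("###", "#.#", "#.#", "#.#", "###"),  # clean "O"
    ("###", "#..", "#..", "#..", "##."),  # "C" with one wrong pixel
    ("##.", "#.#", "##.", "#.#", "#.#"),  # clean "R"
]

print(recognize_line(scanned))  # prints "OCR"
```

The input is nothing but dots; the output is a string that can be edited, searched, and saved like any typed text.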

OCR takes an image (illustration: a picture of the heading "What is OCR?") and creates text (illustration: the editable heading "What is OCR?").

During OCR, OmniPage uses the settings selected in the OmniPage Toolbox to determine the text flow on a page, and creates ordered zones around areas of a page to identify what will be recognized as text or retained as a graphic. After OCR, you can save the resulting text to a variety of word-processing, page layout, and spreadsheet applications.
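The idea of ordered zones can be pictured with a small data-structure sketch. This is an assumption-laden illustration, not OmniPage's actual zone format: the Zone class, its fields, and text_in_reading_order are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """A hypothetical page zone; not OmniPage's real internal representation."""
    order: int          # position in the reading order
    kind: str           # "text" (to be recognized) or "graphic" (kept as a picture)
    box: tuple          # (left, top, right, bottom) in pixels
    content: str = ""   # recognized text; stays empty for graphic zones


def text_in_reading_order(zones):
    """Join the text zones in reading order and skip the graphic zones."""
    text_zones = sorted((z for z in zones if z.kind == "text"), key=lambda z: z.order)
    return "\n".join(z.content for z in text_zones)


# Zones as they might be found on a two-column page with a logo, listed out of order.
page = [
    Zone(order=3, kind="text", box=(320, 100, 600, 700), content="Right column text."),
    Zone(order=1, kind="graphic", box=(20, 20, 120, 80)),
    Zone(order=2, kind="text", box=(20, 100, 300, 700), content="Left column text."),
]

print(text_in_reading_order(page))
```

Here the zone order, not the position in the list, decides the text flow, and the logo zone contributes no text because it is retained as a graphic.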

The OCR Capabilities in OmniPage

In addition to text recognition, OmniPage can retain the following elements of a document during OCR.

Graphics

Photos, logos, and drawings are examples of graphics.

Text formatting

Font types, font sizes, and font styles (such as bold or italic) are examples of text character formatting. Spacing between paragraphs, indents, tabs, line spacing, and alignment are examples of paragraph formatting.

Page formatting

Column structure, paragraph placement, table handling, and locations of graphics are examples of page formatting.

Text Editor views

Recognition results are placed in the Text Editor, which offers three views and lets you choose how much formatting is displayed.

  • OmniPage only recognizes machine-printed characters such as laser-printed or typewritten text. However, it can retain handwritten text, such as a signature, as a graphic.
