Improving accuracy

These hints are designed to increase OCR accuracy in OmniPage.

Select settings that improve accuracy in the Options dialog box.

Choose Options in the Tools menu or click the Options button in the Standard toolbar. Then click the tab in the Options dialog box for the settings you want to change:

  • Select Accuracy under Optimize the OCR process for… in the OCR panel.

  • Adjust the Brightness and Contrast sliders in the Scanner panel.

  • Enhance images for OCR purposes using the SET tools.

  • If your only criterion is OCR accuracy, prefer black-and-white scanning for good-quality documents with crisp black text on a white background. Choose grayscale scanning if you are scanning pages with text on colored or shaded backgrounds, or for degraded documents with low or varied contrast.

  • Select Training File in the Proofing panel to use a character training file to help recognize special or stylized characters during OCR. See Training files for more information. This does not apply to Asian languages.

Use suitable recognition aids

  • If you have a long document and no suitable training file, do some training on a few typical pages. Turn on IntelliTrain in the Proofing panel of the Options dialog box, then recognize three or four pages and proofread the text. Inspect the quality of the training in the Edit Training dialog box, then save it to a file.

  • If you are getting poor results with a training file loaded, check its contents in the Edit Training dialog box. Make sure it is appropriate for the current document. If it is not, either unload it or edit its contents to remove training from poorly formed character shapes. Unsuitable training can yield worse results than no training at all.

  • If proofing is skipping too many unsuitable words and you have a user dictionary loaded, check its contents with the Edit User Dictionary dialog box. Delete any entries added in error, especially misspelt words.

Identify zones correctly

  • When processing pages manually, make sure zones are identified correctly before OCR.

  • When processing automatically, be sure your original layout setting is the best one for the document. Inspect the recognition results. If there are defects due to poor zoning on some pages, change the zone properties and/or locations and re-recognize those pages.

  • Make sure you do not have a zone template file loaded which is unsuitable for your current pages.

  • To retain handwritten text, such as a signature, identify it as a graphic zone.

Use high-quality images

  • In general, try to use original pages when you are scanning documents. Typeset, high-quality printed page images yield the best OCR accuracy. OCR accuracy may not be as good with lesser-quality pages.

  • With low-quality originals, sometimes a good-quality photocopy can yield better OCR results. This may be true on documents with low contrast or printed on thin paper. On the other hand, poor-quality photocopies with stripes, blotches or uneven brightness will usually give worse results.

  • Ask senders to select Fine or Best Mode when they send you a fax.

  • Page images should be free of notes, lines, or doodles. Anything that is not a printed character slows recognition, and any character distorted by a mark may be unrecognizable. Try not to include such marks in zones, or enclose them in an ignore zone.

  • Text in page images should be reasonably clean and crisp. Characters should be separated from each other and not blotched together or overlapping.

  • If you have influence over the styling used in documents you want to recognize, avoid underlining. Underlined text is difficult to recognize because the underline changes the shape of descenders on the letters q, g, y, p, and j.

  • If you are getting poor results from image files, check their quality and resolution by hovering the cursor over the thumbnails. The ideal resolution for OCR is 300 dpi. Images with less than 200 dpi or more than 400 dpi are liable to yield far lower accuracy. If you have the documents on paper, scan them again with better settings. If not, ask the people who supply your images to use 300 dpi.
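The resolution guidance above can also be applied to a batch of image files before they are loaded. The following Python snippet is a minimal sketch and not part of OmniPage: the function name assess_resolution and its return labels are hypothetical, and the thresholds are simply the 200, 300 and 400 dpi figures quoted above.

```python
from typing import Optional

# Thresholds taken from the guidance above; the names are illustrative only.
IDEAL_DPI = 300
MIN_DPI = 200
MAX_DPI = 400


def assess_resolution(dpi: Optional[float]) -> str:
    """Classify an image resolution against the OCR guidance.

    Hypothetical helper: returns one of "unknown", "too low",
    "too high", "ideal" or "acceptable".
    """
    if dpi is None or dpi <= 0:
        # Many image files carry no usable resolution metadata.
        return "unknown"
    if dpi < MIN_DPI:
        return "too low"
    if dpi > MAX_DPI:
        return "too high"
    if dpi == IDEAL_DPI:
        return "ideal"
    return "acceptable"


if __name__ == "__main__":
    for value in (150, 200, 300, 400, 600, None):
        print(value, "->", assess_resolution(value))
```

The dpi value has to come from the image file itself. One possible source, assuming the Pillow library is installed and the file format stores a resolution, is the "dpi" entry of an opened image's info dictionary. Files reported as "too low" or "too high" are candidates for rescanning at 300 dpi.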
