Text does not get recognized properly
Try these solutions if any part of the original document is not converted to text properly during OCR:
-
Look at the page image and ensure that all text areas are enclosed by text zones. If an area is not enclosed by a zone, it is generally ignored during OCR.
-
Make sure text zones are identified correctly. Reidentify zone types and contents, if necessary, and perform OCR on the document again. See Zone types and contents for more information.
-
Be sure you do not have an unsuitable template loaded by mistake. If zone borders cut through text, recognition is impaired.
-
Adjust the brightness and contrast sliders in the Scanner panel of the Options dialog box. You may need to experiment with different combinations of settings to get the desired results.
-
Enhance images for OCR purposes using the SET tools.
-
Check the resolution of the original image. Hover the cursor over a page thumbnail to see the resolution in a pop-up display. If the resolution is significantly above or below 300 dpi, recognition is likely to suffer.
-
Make sure the correct document languages are selected in the OCR panel of the Options dialog box. Only languages included in the document should be selected. In particular, setting an Asian language for non-Asian texts (and vice versa) is likely to produce unusable results.
-
If you turned on the option ‘Detect single language automatically’, automatic analysis assigns one language to each incoming page. Manually re-recognize multilingual pages and any pages that may have been assigned the wrong language.
-
Recognition results in Japanese, Korean and Chinese can be viewed and saved only if your system has East Asian language support.
-
Turn IntelliTrain on and make some proofing corrections. This is most likely to help with stylized fonts or uniformly degraded documents. If IntelliTrain was already running, try turning it off – on some types of degraded documents it may not be able to help.
-
Do some manual training, or edit existing training to remove unsuccessful training.
-
If you use True Page as the Text Editor formatting level or for export, recognized text is put into text boxes or frames. Some text may be hidden if a text box is too small. To view the text, place the cursor in the text box and use the arrow keys on your keyboard to scroll to the top, bottom, left, or right of the box.
-
Check the glass, mirrors, and lenses on your scanner for dust, smudges, or scratches. Clean if necessary.
-
OmniPage recognizes only machine-printed text, such as typewritten or laser-printed characters. It can handle dot-matrix characters, though accuracy may be lower on draft-quality text. It cannot read hand printing or handwriting. However, it can retain signatures or other handwritten text as a graphic.
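
The resolution check in the list above can also be done by hand when you know an image's pixel dimensions and the physical size of the original page. The following Python sketch is an illustration only and is not part of OmniPage: the function names are hypothetical, and the 25% tolerance is an assumed stand-in for "significantly above or below 300 dpi", which this topic does not quantify.

```python
def effective_dpi(pixels: int, inches: float) -> float:
    """Resolution along one axis: pixel count divided by physical length."""
    if inches <= 0:
        raise ValueError("physical size must be positive")
    return pixels / inches


def resolution_advice(dpi: float, target: float = 300.0,
                      tolerance: float = 0.25) -> str:
    """Classify a resolution relative to the 300 dpi guideline.

    The tolerance is an assumption made for this sketch; the
    documentation only says "significantly above or below".
    """
    low = target * (1 - tolerance)
    high = target * (1 + tolerance)
    if dpi < low:
        return "too low"
    if dpi > high:
        return "too high"
    return "ok"


if __name__ == "__main__":
    # A page 8.5 inches wide scanned to an image 2550 pixels wide.
    dpi = effective_dpi(2550, 8.5)
    print(dpi, resolution_advice(dpi))
```

If the result falls well outside the guideline, rescanning the original at or near 300 dpi is usually more effective than resampling the existing image.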