Options: OCR

Use this dialog box to select OCR settings. It can be displayed:

  • in OmniPage,  

  • from Microsoft Office applications and WordPerfect using Direct OCR and

  • in PaperPort (if installed).

To display this dialog box in OmniPage

  • Open the Options dialog box with the Options button in the Standard toolbar or from the Tools menu.

To display this dialog box for Direct OCR (in Microsoft Office applications and WordPerfect)

  • In WordPerfect or in earlier Microsoft Office versions, click the Acquire Text Settings button in an OmniPage toolbar (or use the File menu). In a Microsoft Office 2007 or 2010 application, use the Nuance OCR tab or the File menu.

To display this dialog box in PaperPort

  1. Right-click the Microsoft Word icon in the PaperPort Send To bar and select Send To Options…. The Send To Options dialog box appears.

  2. Choose OmniPage 18 in the Convert image to text (OCR) with selection box, then click the Settings… button. The Options dialog appears with the OCR panel open.

The OCR panel offers the following settings:

Layout description

Use these settings to influence the auto-zoning process.

Automatic:  In most cases, Automatic is suitable, leaving the program to make all zoning decisions. Choose Automatic if your document contains pages with different layouts. Choose Automatic for a page with multiple columns and a table, and for any pages with more than one table.

Single Column, no Table: Choose this setting if your pages contain only one column of text and no table. Business letters or pages from a book are normally like this. Choose it also for a page with words or numbers arranged in columns if you do not want these placed in a table or decolumnized or treated as separate columns.

Multiple Columns, no Table: Choose this setting if some of your pages contain text in columns and you want this decolumnized or kept in separate columns.

Single Column with Table: Choose this setting if your page contains only one column of text and a table.

Spreadsheet: Choose this setting if your whole page consists of a table which you want to export to a spreadsheet program, or have treated as a table. No flowing text zones will be detected.

Form: Choose this setting if your pages contain forms.

Legal Pleading: Choose this setting for legal documents.

Custom (User defined):  Choose this to describe the layout of the pages in a document precisely. Then click the Custom Layout button to specify settings in the Custom Layout dialog box to influence text flow, table detection and graphics detection during auto-zoning.  

Template: Use this to have zoning performed by a template that you specify.

Optimize the OCR process for…

Click Speed to optimize recognition for speed. With good-quality documents, Speed can still yield acceptable accuracy, but advanced formatting such as colored text and backgrounds or inverted text cannot be retained. Click Accuracy to optimize recognition for accuracy.

Languages and dictionaries

Languages in document

Select the language(s) that appear in the document you are going to process. These are the languages that OmniPage looks for during OCR. For faster and more robust recognition and more reliable proofing suggestions, select only the languages that are in the document.

The languages at the top of the list are your recent choices. Below them, the languages are listed in alphabetical order. Type a letter to jump to the languages that begin with it.

A dictionary icon next to a language denotes dictionary support. The dictionary is consulted to help in the OCR process, to offer suggestions during proofing and for automatic language detection.

Japanese, Korean and Chinese language settings call up a dedicated recognition engine. Only one of these languages should be selected at a time and not combined with any non-Asian language. Short embedded texts in English can be recognized without English being selected as a recognition language. See Asian language recognition.
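The selection rule above can be sketched as a small check. This is only an illustration of the rule as stated here: the function name and the language labels are hypothetical and are not part of OmniPage.

```python
# Hypothetical helper illustrating the rule; OmniPage exposes no such function.
# Labels are assumed; the four Asian languages are taken from this help page.
ASIAN_LANGUAGES = {"Japanese", "Korean", "Chinese Traditional", "Chinese Simplified"}


def check_language_selection(selected):
    """Return a list of problems with a language selection (empty if none)."""
    chosen = set(selected)
    asian = chosen & ASIAN_LANGUAGES
    non_asian = chosen - ASIAN_LANGUAGES
    problems = []
    if len(asian) > 1:
        problems.append("select only one Asian language at a time")
    if asian and non_asian:
        problems.append("do not combine an Asian language with a non-Asian language")
    return problems


print(check_language_selection(["English", "German"]))
print(check_language_selection(["Japanese", "English"]))
```

Note that selecting Japanese alone passes the check even when the document contains short English passages, since these can be recognized without English being selected.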

Detect single language automatically

This is designed for unattended processing when the language of incoming documents cannot be determined in advance. When it is enabled, no other language choice is possible. It can work with all languages with dictionary support that use a Latin-based alphabet, plus four Asian languages. Russian and Greek are excluded.

Three language groups are offered in the drop-down list below the checkmark:

  • Latin-alphabet languages (choose it to see the enabled languages)

  • Asian languages (Chinese Traditional, Chinese Simplified, Japanese and Korean)

  • Latin-alphabet and Asian (all of the above)

As pages arrive, their texts are analyzed and a single validated recognition language is assigned to each page. When this option is enabled, the Verify language choices option is not available.

Verify language choices

Selecting this checkmark invokes automatic language detection that warns of differences between a detected language and the language setting. It works at page-level and identifies four categories: Japanese, Chinese, Korean and non-Asian. It cannot distinguish between Traditional and Simplified Chinese or between non-Asian languages. The last category means Japanese, Chinese or Korean characters were not detected. Verification takes place during image pre-processing, so the required recognition language must be set before image loading. Detection is more robust with at least several lines of text and a minimum of embedded English text.
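The four-category comparison can be pictured with the following sketch. It is a simplified model of the behavior described above, not the program's actual logic, and all names in it are hypothetical.

```python
# Simplified model of page-level verification; names are hypothetical.
# Both Chinese variants map to one category because detection cannot tell
# them apart, and every other language falls into "non-Asian".
ASIAN_CATEGORY = {
    "Japanese": "Japanese",
    "Korean": "Korean",
    "Chinese Traditional": "Chinese",
    "Chinese Simplified": "Chinese",
}


def category_of(language):
    """Map a recognition language setting to one of the four categories."""
    return ASIAN_CATEGORY.get(language, "non-Asian")


def needs_warning(detected_category, language_setting):
    """True if the detected category differs from that of the setting."""
    return detected_category != category_of(language_setting)


print(needs_warning("Japanese", "English"))
print(needs_warning("Chinese", "Chinese Simplified"))
```

Under this model, a German page processed with an English setting would raise no warning, because both fall into the non-Asian category.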

Professional dictionaries

Click the checkbox next to a dictionary name to select it. Choices are: Dutch Legal, Dutch Medical, English Legal, English Medical, English Financial (in OmniPage Professional only), French Legal, French Medical, German Legal and German Medical. To deselect a professional dictionary, click it again.

User dictionary

Select a user dictionary if you wish. This is a personal dictionary to which you can add words. It supplements the program’s built-in dictionaries, both for assisting the OCR process and for making suggestions during proofreading. A user dictionary is useful to prevent the program from marking proper names or specialist terms in your documents as suspect. You can create and save as many user dictionaries as you wish.

Click the button to the right of the selection box to create, edit, add or remove a user dictionary. Select [none] to unload a user dictionary.

You can browse to network or other locations to load and save user dictionaries.

Any Microsoft Word user dictionaries detected on your system are also listed. A dictionary called Custom may appear – it is your default Word dictionary.

User dictionaries cannot be used with Asian recognition.

Fonts and characters

Font matching

Click the Font Matching… button to select which of the fonts on your system should be available for use in matching or representing the fonts in your documents. Font matching has no effect on Asian recognition; an Asian-enabled font is automatically set in these cases.

Additional characters

Enter accented letters here that you want validated for recognition in addition to those already validated by your language choice. Enter characters from your keyboard or from the character map.

Click the character map button to open the character map.

Reject character

Unrecognizable characters are represented in the Text Editor by a red reject character (a tilde, ~, by default). For example, if OmniPage could not recognize the J in REJECT, and ~ is the reject character, the string RE~ECT would appear in your document.

Type the character you want to use in the Reject character edit box. Try to choose a character that will not appear in your documents.

Click the character map button to open the character map.
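If recognized text is later exported as plain text, the reject character can be used to locate unrecognized characters for review. The sketch below assumes such an export and a hypothetical helper name. In plain text, a genuine occurrence of the character cannot be told apart from a reject, which is why a character that does not appear in your documents is the better choice.

```python
def find_rejects(text, reject_char="~"):
    """Return 1-based (line, column) pairs for each reject character found."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == reject_char:
                hits.append((line_no, col))
    return hits


# Sample text with one unrecognized character, as in the RE~ECT example.
sample = "PLEASE RE~ECT\nTHIS OFFER"
for line_no, col in find_rejects(sample):
    print(f"reject character at line {line_no}, column {col}")
```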

Character Map

Use the character map to copy and paste accented letters into the edit box. Grayed characters on the character map are not enabled for recognition, although they can be inserted into the edit box. Right-click on an empty area below the character map and use the context menu to display or hide character sets. Asian characters are not supported.
