Languages and alphabets
The program can read over 120 languages with multiple alphabets: Latin, Greek, Cyrillic, Chinese, Japanese and Korean. See the list in the OCR panel of the Options dialog box. A listing is also provided on the Nuance web site.
An icon marks each language that has dictionary support. These are currently: Catalan, Czech, Danish, Dutch, English, Esperanto, Finnish, French, German, Greek, Hungarian, Italian, Norwegian, Polish, Portuguese, Russian, Slovenian, Spanish and Swedish. These dictionaries are used, along with user dictionaries and professional dictionaries, to assist in the recognition process and to provide suggestions during proofing.
For a list of the available professional dictionaries, and an explanation of the options Detect single language automatically and Verify language choices, see the OCR panel of the Options dialog box.
Multiple-engine recognition is available for nearly all dictionary languages. Each running recognition engine’s dictionary is consulted during recognition and suggestions may be taken from any of them.
You can choose to have non-dictionary words underlined in the Text Editor. These words are also presented to you during proofing. Sometimes a word is not flagged as “non-dictionary” even though no dictionary contains it. This may happen if multiple recognition engines generate an identical result with high confidence, or if a “non-dictionary” word appears many times in a document.
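The flagging behavior described above can be pictured with a small sketch. This is only an illustration of the stated rules, not OmniPage's actual logic: the function name, the data shapes and both threshold values are assumptions, since this topic gives no figures for "high confidence" or "many times".

```python
from collections import Counter

# Hypothetical thresholds; the manual does not state actual values.
HIGH_CONFIDENCE = 0.9
MANY_OCCURRENCES = 5


def should_flag(word, engine_results, dictionaries, doc_counts):
    """Return True if `word` would be marked as non-dictionary.

    engine_results: list of (text, confidence) pairs, one per engine.
    dictionaries:   iterable of sets of known lowercase words.
    doc_counts:     Counter of word frequencies in the document.
    """
    # A word found in any consulted dictionary is never flagged.
    if any(word.lower() in d for d in dictionaries):
        return False

    # Exception 1: several engines return the identical result,
    # each with high confidence.
    texts = {text for text, _ in engine_results}
    all_agree = len(engine_results) > 1 and texts == {word}
    all_confident = all(conf >= HIGH_CONFIDENCE for _, conf in engine_results)
    if all_agree and all_confident:
        return False

    # Exception 2: the word appears many times in the document.
    if doc_counts[word] >= MANY_OCCURRENCES:
        return False

    return True
```

Under these assumptions, a word is flagged only when it is missing from every dictionary and neither exception applies.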
If you make a multiple language selection, all characters needed for the selected languages are validated for recognition. You can also validate characters individually, to supplement those validated by your language choice.
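As a rough sketch, the validated character set can be thought of as the union of the characters each selected language needs, plus any characters validated individually. The character tables below are hypothetical and heavily abbreviated (lowercase letters only), and `validated_characters` is an illustrative name, not a program function.

```python
# Hypothetical, abbreviated character sets for illustration only.
LANGUAGE_CHARS = {
    "English": set("abcdefghijklmnopqrstuvwxyz"),
    "German": set("abcdefghijklmnopqrstuvwxyzäöüß"),
    "French": set("abcdefghijklmnopqrstuvwxyzàâçéèêëîïôùûüÿœ"),
}


def validated_characters(languages, extra_chars=""):
    """Union of the characters for all selected languages,
    supplemented by individually validated characters."""
    chars = set(extra_chars)
    for lang in languages:
        chars |= LANGUAGE_CHARS[lang]
    return chars


# Selecting English and German, then adding the euro sign individually.
chars = validated_characters(["English", "German"], extra_chars="€")
```

In this sketch, `chars` contains "ß" (from German) and "€" (added individually), but not "é", because French was not selected.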
If you select more than one language with dictionary support, all dictionaries involved are consulted, so you may get suggestions in more than one language.
Dictionaries, proofing and training are not available for Japanese, Korean or Chinese, and these languages should not be combined with any others. See Asian language recognition.
The Latin alphabet is used for most of the supported languages. When you choose one or more languages for recognition, all the necessary accented letters are validated as acceptable OCR solutions.
The Greek alphabet is used for the Greek language. OmniPage supports recognition of characters needed for reading Ancient Greek. This is what Classical Greek text looks like:
This is what Modern Greek looks like:
Here are the supported characters:
When reading Greek, the letters of the English alphabet can still be recognized. You can read, edit and proof Greek texts even if your computer has no Greek font files or Code Page support, but Greek support is needed to handle the exported text correctly.
The following languages are written with the Cyrillic alphabet: Russian, Bulgarian, Byelorussian, Chechen, Kabardian, Macedonian, Moldavian, Serbian and Ukrainian.
Russian text looks like this:
When reading Cyrillic languages, the letters of the English alphabet can still be recognized, so OmniPage can handle words written with English letters that appear in the middle of Cyrillic texts.
You can read, edit and proof Cyrillic texts even if your computer has no Cyrillic font files or Code Page support, but Cyrillic support is needed to handle the exported text correctly.
The following table shows which Cyrillic characters are supported. Not all of these characters are validated for Russian or any other single language.
Asian language support (Japanese, Chinese, Korean) is detailed in a separate topic.