About Form Data Extraction


This topic relates only to OmniPage Professional.

Form data extraction lets you collect data from a set of filled forms for further processing in databases or spreadsheets. Form data extraction is done by a workflow step in OmniPage.

The layout and location of form elements are defined by a form template file. The forms to be processed must be filled in by computer or a similar device; handwritten forms are not supported. The output is a comma-separated values (CSV) file that can be opened as a table in a spreadsheet program. Each form element becomes a table column, and the data from each form is presented in a single row. The form elements are typically fillable fields, check boxes and option buttons.
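Because the output is one row per form and one column per form element, it can be consumed by any CSV reader. The sketch below uses Python's standard `csv` module. The column names, values and the way check boxes are written (`Yes`/`Off`) are invented for illustration and are not taken from OmniPage; inspect your own output file to see the actual headers and values.

```python
import csv
import io

# Hypothetical extraction output: one column per form element, one row per form.
# Column names and values are illustrative assumptions, not real OmniPage output.
SAMPLE_CSV = """Name,Date,Subscribe,PaymentMethod
Jane Doe,2023-05-01,Yes,Card
John Smith,2023-05-02,Off,Invoice
"""


def load_forms(csv_text):
    """Return one dict per processed form, keyed by form element (column) name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader)


forms = load_forms(SAMPLE_CSV)
for form in forms:
    # Each dict holds the data extracted from a single filled form.
    print(form["Name"], form["Subscribe"])
```

For a real output file, you would open it with `open(path, newline="")` (adding an `encoding=` argument that matches the file) and pass the file object to `csv.DictReader` directly.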

Form template files

The form template must be an active, non-image PDF form that correctly presents the form elements found in the forms to be processed. It can be either filled or unfilled. It can be a multi-page form, and a page range can be specified to exclude non-form pages, such as pages containing filling instructions. See below for how a page range is interpreted in different types of processing.

If you need to make a template file yourself, you can use OmniPage Professional or a PDF editor such as Nuance PDF Converter Professional. See About form creation, and save the form as a PDF file.

Setting up the workflow

The Workflow Assistant must be used. The workflow typically contains three steps: a Load Files step, the Extract Form Data step and a saving step. See Extract Form Data in Workflow Assistant.

Processing filled PDF files

This includes all PDF flavors except PDF Image. The PDF files can be either static or active. In this case each form must be located in a separate PDF file. If a page range is chosen for the template, the same page range is applied to all PDF files being processed.

Processing filled forms saved as image files

This includes all image file formats supported by OmniPage, including PDF Image. In this case each form to be processed must contain exactly the number of pages defined by the template file page range. For example, if the template page range is 2-4, each form to be processed must contain three pages.

Each form to be processed can be in a separate file, but it is also possible to process several forms stored in a single multi-page image file. In the above example, pages 1-3, 4-6, 7-9, and so on, are treated as separate forms.
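The grouping rule for a multi-page image file is simple arithmetic: the length of the template page range fixes how many consecutive pages make up one form. The sketch below illustrates that rule; `split_into_forms` is a hypothetical helper, not part of OmniPage. It also assumes that a page count which is not an exact multiple of the form length should be rejected. This topic does not say how OmniPage itself handles leftover pages.

```python
def split_into_forms(total_pages, range_start, range_end):
    """Group the pages of a multi-page image file into forms.

    range_start and range_end are the 1-based, inclusive template page range
    (e.g. 2 and 4). Only its length matters: each form in the image file is
    assumed to occupy exactly that many consecutive pages, starting at page 1.
    """
    if range_start < 1 or range_end < range_start:
        raise ValueError("invalid template page range")
    pages_per_form = range_end - range_start + 1
    # Assumption of this sketch: incomplete trailing forms are treated as an error.
    if total_pages % pages_per_form != 0:
        raise ValueError(
            f"{total_pages} pages is not a multiple of {pages_per_form} pages per form"
        )
    # Return (first_page, last_page) for each form in the file.
    return [
        (first, first + pages_per_form - 1)
        for first in range(1, total_pages + 1, pages_per_form)
    ]


print(split_into_forms(9, 2, 4))  # [(1, 3), (4, 6), (7, 9)]
```

With a template page range of 2-4 and a nine-page image file, this reproduces the grouping described above.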

Processing filled scanned forms

Filled forms are best scanned using an automatic document feeder (ADF). Scan only the filled form pages that are specified in the form template (including its page range, if any). When scanning a stack of multi-page forms in this way, separator sheets are not needed.
