Flexible Extraction


Flexible Extraction is a type of task in Hyperscience that involves human intervention to validate or correct data extraction for Structured documents. This task is used when automatic extraction isn’t fully reliable, allowing you to transcribe or adjust specific fields to ensure accuracy. Once a document is matched to the correct layout, Flexible Extraction automatically extracts the relevant data based on the fields defined for that layout.

Flexible Extraction Supervision task

A Flexible Extraction task in Hyperscience is a Supervision task in which a human reviewer validates or corrects data extracted from Structured documents. In Flexible Extraction tasks, keyers either transcribe the fields they see in a document or validate the content of fields marked for review. As in Field Transcription tasks, each field may have data-type validation rules that prevent keyers from entering the wrong type of data.

The system generates Flexible Extraction tasks in the following situations:

  • When a Structured page is initially marked as No Layout Found but is later manually matched to a layout. Because the page skips the typical Transcription Supervision process, no fields are marked as required for review, and the keyer should transcribe whatever fields they see on the page.

  • When the rules in a Custom Code Block flag a document for further review by a keyer. The rules in the code may require that the entire document be reviewed or only certain fields. The system will mark the fields that the keyer should review, and the keyer won’t be able to complete the task until they’ve entered responses for all these fields.

If a Structured page was matched to the wrong layout before it was sent to Flexible Extraction, you will not be able to assign it to a different layout. However, you can indicate that it was matched to the wrong layout by clicking Mark Layout Variation Incorrect at the top of the right-hand sidebar.  

Learn how to navigate the interface and complete a Flexible Extraction task in the next sections of this article.

Flexible Extraction Interface

To open a Flexible Extraction task, go to the Tasks section and click Perform Tasks under the Supervision Task Type table.

Flexible Extraction allows you to:

  • Confirm that the extracted data is accurate.

  • Transcribe the fields that are missing or need to be reviewed.

Left Panel - Document Preview

In the left panel, you’ll see every page of the document sent to Flexible Extraction. Select a page to start reviewing and move through the document page by page.

Page Order

Pages in this panel appear in the order they were classified during Document Classification.

  • To see the page more clearly, hide the panel by clicking its button.

Middle Panel - Page Preview

The middle panel shows all documents you’ve grouped.

In this panel, you can:

  • Move between pages using the arrows.

  • Rotate the page by clicking the button.

  • Zoom in or out for a better view when transcribing fields.

Right Panel - Transcribe Fields

The right panel displays all fields available for transcription.

Field visibility depends on the layout and fields tied to each page. You can narrow down results by enabling these options:

  • Show fields from manually matched pages only - displays only the fields from the manually classified pages.

  • Show fields from selected variation only - displays only the fields from the variation you're currently working on. This makes it easier to focus on what matters, especially in complex documents with many variations within the layout group.

  • Click each field to type in its value. You can also:

    • Add a new occurrence of a field.

    • Mark a field as illegible.

Transcribing fields

Data Types

Make sure each field in your layout has the correct data type, and transcribe exactly what you see. The system won’t accept characters that don’t match the defined data type. For more details, see Default Data Types.

  • The Document Details section allows you to:

    • View the submission ID for the document and the current layout variation.

    • Mark Layout Incorrect - send the document for reprocessing if it was misclassified.

    • Reject Document - this will dismiss any further Supervision tasks for this document.

Once you’ve reviewed and transcribed all fields, click Complete Tasks.

Submission JSON Output

Once the task is completed, you can check the results on the Submission Output Page. To learn more, see Submission Output Page.

Page Numbering in JSON Output

The submission's JSON output reflects the page order determined during Document Classification, and only manually matched pages are included in Flexible Extraction.

Layout Page Number and Layout Variation Page Number

In Flexible Extraction, both layout_page_number and layout_variation_page_number always return a value of 1. This is expected behavior.

To track the actual page order, use the document_page_number field, which reflects the page sequence across both Manual Classification and Flexible Extraction.
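The following Python sketch shows why document_page_number, not layout_page_number, is the key to sort on when post-processing Flexible Extraction results. It assumes a hypothetical, simplified JSON shape: a flat list of entries with made-up "field" and "value" keys. Only the three page-number key names come from this article; the real submission output is structured differently, so adapt the lookups to your actual payload.

```python
import json

# Hypothetical, simplified excerpt of Flexible Extraction output.
# Only the page-number key names come from the documentation; the flat
# list shape and the "field"/"value" keys are illustrative assumptions.
SAMPLE_OUTPUT = """
[
  {"field": "invoice_total", "value": "120.00",
   "layout_page_number": 1, "layout_variation_page_number": 1,
   "document_page_number": 3},
  {"field": "invoice_number", "value": "INV-001",
   "layout_page_number": 1, "layout_variation_page_number": 1,
   "document_page_number": 1},
  {"field": "due_date", "value": "2024-05-01",
   "layout_page_number": 1, "layout_variation_page_number": 1,
   "document_page_number": 2}
]
"""


def fields_in_page_order(entries):
    """Return field names ordered by their page in the document.

    In Flexible Extraction, layout_page_number and
    layout_variation_page_number are always 1, so sorting on them would
    not change the order. document_page_number carries the real sequence.
    """
    ordered = sorted(entries, key=lambda entry: entry["document_page_number"])
    return [entry["field"] for entry in ordered]


if __name__ == "__main__":
    entries = json.loads(SAMPLE_OUTPUT)
    print(fields_in_page_order(entries))
```

Because Python's sort is stable, entries that share a document_page_number keep their original relative order.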