Document Eligibility Filtering

Accessing this feature

Your access to the feature described in this article depends on your license package and pricing plan.

To learn which features are available to your organization and how to add more, contact your Hyperscience representative.

Document Eligibility Filtering shows whether a document can be used for training, based on internal system checks and machine learning criteria. It also explains why certain documents were excluded from the training set.

This feature helps you identify which documents cannot be used for training and why, so you can resolve the issues and improve overall model performance.

Using Document Eligibility Filtering

Before using Document Eligibility Filtering, make sure that you’ve:

  • uploaded the required number of documents for training in Training Data Management, and

  • analyzed the data.

  1. Go to Models > Identification tab and click on the name of your layout to access the Training Data Management tools.

  2. Find the model you want to manage in the respective tab and scroll to the Training Data Health card.

The Training Data Health card shows information about the quality of your training data. The bar indicates how many documents you need to meet the minimum required for training. 

All documents with the Training Status Ready to Annotate or Never appear as ineligible for training. To change this:

  • Annotate documents with the status Ready to Annotate.

  • Change the Training Status of documents from Never to Always or Auto, depending on your use case.

  3. To see ineligibility details for the training set, click See Ineligibility details >> in the Field Identification Model or Table Identification Model card.
    The right-hand sidebar displays ineligibility details, including the reasons documents are excluded from training and the number of documents affected by each reason.

    Always make sure to reanalyze your data to see updated information on your training dataset.

    If documents have been added, removed, or modified since the last analysis, the ineligibility details may be outdated.

Based on the analysis results, a yellow indicator may appear on the left side of a document’s record in the Training Data card. Hover over the indicator to check whether the document contains an anomaly or is ineligible for training.

Expand the Filter section and use the Training Eligibility filter to select an ineligibility reason from the drop-down list. Click Apply Filters to view the results.

  4. To view ineligibility details for a particular document, click its ID in the Training Data table.
    Ineligibility information is displayed in the right-hand sidebar.
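
To make the relationship between the sidebar's per-reason counts and the Training Eligibility filter concrete, here is a minimal Python sketch. It is illustrative only: the `records` list, its field names, and the document IDs are hypothetical and do not represent a Hyperscience export format or API. It also assumes that a single document can carry more than one ineligibility reason.

```python
from collections import Counter

# Hypothetical records: a document ID plus the ineligibility reasons found by analysis.
# An empty list means the document is eligible for training.
records = [
    {"id": "doc-1", "reasons": []},
    {"id": "doc-2", "reasons": ["Ineligible status"]},
    {"id": "doc-3", "reasons": ["Overlapping bounding boxes", "Ineligible status"]},
    {"id": "doc-4", "reasons": ["Overlapping bounding boxes"]},
]


def count_by_reason(records):
    """Count the documents affected by each reason, similar to the sidebar summary."""
    counts = Counter()
    for record in records:
        # Use a set so each document is counted at most once per reason.
        counts.update(set(record["reasons"]))
    return counts


def filter_by_reason(records, reason):
    """Return the IDs of documents with the given reason, similar to the filter."""
    return [r["id"] for r in records if reason in r["reasons"]]


counts = count_by_reason(records)
overlapping = filter_by_reason(records, "Overlapping bounding boxes")
```

With the sample data, `counts` reports two documents for each of the two reasons, and `overlapping` contains `doc-3` and `doc-4`.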

Ineligibility reasons

Reason

Description

Ineligible status

A document will always be ineligible for training if its Training Status is Ready to Annotate or Never.

Incompatible layout version

The information about the layout is incompatible with the documents provided for training.

Overlapping bounding boxes

If the annotated bounding boxes of two or more fields overlap, the document is ineligible for training.

Consecutive page breaks

A field annotated with multiple bounding boxes spans more than two consecutive pages.

Example:

  • A field starts on Page 1, continues on Page 2, and continues again on Page 3.

  • Pages 1 and 2 are two consecutive pages, but Pages 1 and 3 are not. Because the field’s bounding boxes extend beyond two consecutive pages, the document is ineligible for training.

Max Pages per doc exceeded

The document has more than the maximum number of pages per document, as defined in the system. The default maximum is 100 pages per document. For more information about the default value, contact the Support team.

Max Segments per page exceeded

The document has more than the maximum number of text segments per page, as defined in the system. The default maximum is 2000.

Max total pages exceeded

The training set contains more than the maximum number of total pages, as defined in the system. The default maximum is 5000. Contact the Support team for more information and assistance. 

Max Segments exceeded

The training set contains more than the maximum number of total text segments, as defined in the system. The default maximum is 20,000,000. For more information, contact the Support team.

Unexpected Multiple Occurrences

A field has multiple annotated occurrences even though the Multiple Occurrences checkbox is not selected for it in the Layout Editor. Keyers can annotate multiple occurrences in Supervision and QA regardless of that setting.

In Training Data Management, these documents appear as ineligible for training after Training Data Analysis.
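
Several of the reasons listed above are simple rules that can be sketched in code. The following Python example is a rough illustration, not the system's actual implementation. The `Box` class, the function names, and the input shapes are hypothetical. The limits are the defaults quoted above and may differ in your system. The overlap test assumes axis-aligned boxes on the same page, and the page-span test is one reading of the "more than two consecutive pages" rule.

```python
from dataclasses import dataclass

# Defaults quoted in this article; the real values are system settings and may differ.
MAX_PAGES_PER_DOC = 100
MAX_SEGMENTS_PER_PAGE = 2000
INELIGIBLE_STATUSES = {"Ready to Annotate", "Never"}


@dataclass(frozen=True)
class Box:
    """Hypothetical annotated bounding box: field name, 1-based page, corner coordinates."""
    field: str
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


def boxes_overlap(a, b):
    """True if two boxes on the same page share any area (touching edges do not count)."""
    return (a.page == b.page
            and a.x0 < b.x1 and b.x0 < a.x1
            and a.y0 < b.y1 and b.y0 < a.y1)


def has_overlapping_fields(boxes):
    """True if boxes that belong to two different fields overlap."""
    return any(
        a.field != b.field and boxes_overlap(a, b)
        for i, a in enumerate(boxes)
        for b in boxes[i + 1:]
    )


def spans_too_many_pages(boxes, max_span=2):
    """True if any field's boxes extend over more than `max_span` consecutive pages."""
    pages_by_field = {}
    for box in boxes:
        pages_by_field.setdefault(box.field, []).append(box.page)
    return any(max(p) - min(p) + 1 > max_span for p in pages_by_field.values())


def ineligibility_reasons(status, segments_per_page, boxes):
    """Collect the per-document reasons that apply; an empty list means none were found."""
    reasons = []
    if status in INELIGIBLE_STATUSES:
        reasons.append("Ineligible status")
    if has_overlapping_fields(boxes):
        reasons.append("Overlapping bounding boxes")
    if spans_too_many_pages(boxes):
        reasons.append("Consecutive page breaks")
    if len(segments_per_page) > MAX_PAGES_PER_DOC:
        reasons.append("Max Pages per doc exceeded")
    if any(n > MAX_SEGMENTS_PER_PAGE for n in segments_per_page):
        reasons.append("Max Segments per page exceeded")
    return reasons


# A three-page document: "address" runs across pages 1 to 3 and overlaps "name" on page 1.
boxes = [
    Box("address", 1, 0, 0, 10, 10),
    Box("address", 2, 0, 0, 10, 10),
    Box("address", 3, 0, 0, 10, 10),
    Box("name", 1, 5, 5, 15, 15),
]
reasons = ineligibility_reasons("Always", [120, 80, 95], boxes)
```

For this sample document, `reasons` contains "Overlapping bounding boxes" and "Consecutive page breaks". The sketch leaves out the training-set-wide limits (total pages and total segments), the layout version check, and the Multiple Occurrences check, because they depend on data beyond a single document's annotations.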