Accuracy


Measuring the accuracy of trained models is crucial to the success of your use case. In this article, you will learn what accuracy is, the different types of accuracy, and how accuracy is determined.

What is accuracy?

Accuracy helps you understand how often the system correctly predicts values compared to the actual values that reached consensus during QA. It measures the effectiveness of your model based on the proportion of correct predictions out of all predictions made.
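As a rough sketch, the proportion-of-correct-predictions definition can be expressed in a few lines of Python. The function and variable names below are illustrative only, not part of any product API; they assume exact-match comparison against the QA consensus values.

```python
def accuracy(predictions, consensus_values):
    """Fraction of predictions that match the QA consensus value.

    Illustrative sketch: real scoring rules (normalization, partial
    matches, sampling) are more involved than exact equality.
    """
    if not predictions:
        return 0.0
    correct = sum(
        1 for pred, truth in zip(predictions, consensus_values) if pred == truth
    )
    return correct / len(predictions)

# Three of four predictions match the consensus -> 0.75
print(accuracy(["A", "B", "C", "D"], ["A", "B", "X", "D"]))  # 0.75
```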

Accuracy may be impacted by factors like imbalanced datasets (insufficient examples of documents with similar visual layouts) or inconsistencies in annotations. That’s why it relies on Quality Assurance tasks (QA). When QA tasks are enabled, humans can provide feedback to the machine, allowing it to improve accuracy over time by understanding the true content of each piece of data. Learn more in our What is Quality Assurance? article.

Types of accuracy

Machine Accuracy

Machine accuracy indicates how accurately a specific model predicts the correct value for a given task. This metric varies depending on the type of task the model is performing: 

  • Classification — reflects the model’s capability to correctly predict the layout to which a page belongs.

  • Identification — assesses the model’s ability to predict the precise positions of tables or fields within a document.

  • VLM field extraction (ORCA VLMs) — indicates the correctness of the extracted transcription only.

  • Transcription — measures the model’s ability to accurately transcribe text.

Machine Accuracy is computed on confident predictions that have been sampled for QA and reached consensus for the correct value.

Manual Accuracy

Manual accuracy refers to the accuracy of a task that relies on human input. It involves assessing the correctness of human-generated decisions in comparison to the ground truth of your data. 

  • Classification — reflects the percentage of data keyers’ correct decisions about the layout to which a page belongs.

  • Identification — assesses data keyers’ precision in determining the positions of tables or fields within a document.

  • VLM field extraction — measures how accurately human reviewers validate or correct the extracted values compared to QA consensus. For ORCA VLMs, Manual Accuracy is based entirely on transcription.

  • Transcription — measures data keyers’ accuracy when transcribing text.

Manual Accuracy reflects data keyers’ input from both the submission and the QA task.

How accuracy differs across model types

The way accuracy is calculated depends on how a model performs its task.

Identification and Transcription models

For Identification models, accuracy is determined by two components:

  • Identification — locating the field or table correctly.

  • Transcription — extracting the correct value.

To be considered accurate, both the location and the transcription must be correct.
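The both-must-be-correct rule can be sketched as follows. This is illustrative Python, not product code; the boolean inputs stand in for the outcome of each component's QA comparison.

```python
def field_is_accurate(location_correct: bool, transcription_correct: bool) -> bool:
    """A field counts as accurate only if both components are correct."""
    return location_correct and transcription_correct

# Hypothetical QA outcomes for three fields:
results = [
    (True, True),   # right location, right value -> accurate
    (True, False),  # right location, wrong value -> not accurate
    (False, True),  # wrong location, right value -> not accurate
]
acc = sum(field_is_accurate(loc, txt) for loc, txt in results) / len(results)
print(acc)  # only 1 of 3 fields is fully correct
```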

ORCA VLMs

ORCA VLMs availability

Starting with v42.3, accuracy reporting is available for ORCA VLMs. Learn how to enable it in Installing ORCA VLMs.

For ORCA VLMs, accuracy is calculated differently. ORCA VLM performs end-to-end extraction and does not separate identification from transcription. As a result:

  • Accuracy for ORCA VLMs is based entirely on transcription.

  • There is no separate identification accuracy component.

This means ORCA VLM accuracy is simpler to interpret, as it reflects only the correctness of the extracted value.

Determining accuracy 

To determine the accuracy of the machine or a data keyer, the system requires QA consensus. Consensus means that two identified locations or transcriptions for a field or table must match. If the machine’s prediction or the data keyer’s value matches the consensus value reached in QA, it is considered accurate. Learn more about consensus in Scoring Field Identification Accuracy and Scoring Transcription Accuracy.
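A minimal sketch of consensus-based scoring, assuming exact-match comparison and two QA answers per field. All names here are hypothetical, chosen for illustration only.

```python
def qa_consensus(answers):
    """Return the consensus value if the two QA answers match, else None."""
    first, second = answers
    return first if first == second else None

def is_accurate(value, qa_answers):
    """A prediction (or keyed value) is accurate if it matches the consensus."""
    consensus = qa_consensus(qa_answers)
    return consensus is not None and value == consensus

print(is_accurate("INV-1042", ("INV-1042", "INV-1042")))  # True: matches consensus
print(is_accurate("INV-1042", ("INV-1042", "INV-1043")))  # False: no consensus reached
```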

Accuracy and automation trade off against each other. To achieve higher accuracy, the model requires a higher level of certainty before relying entirely on machine transcription, so more fields are routed to human checking and automation decreases.
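To illustrate the tradeoff, here is a hypothetical confidence threshold applied to a handful of fields. The confidences and threshold values are made up for the example; they do not reflect any recommended settings.

```python
# Hypothetical per-field model confidences for one document.
fields = [0.99, 0.95, 0.90, 0.85, 0.70]

def automation_rate(confidences, threshold):
    """Share of fields automated (i.e., not routed to human review)."""
    automated = [c for c in confidences if c >= threshold]
    return len(automated) / len(confidences)

# A lower threshold automates more fields but accepts riskier predictions;
# a higher threshold sends more fields to humans, favoring accuracy.
print(automation_rate(fields, 0.80))  # 0.8
print(automation_rate(fields, 0.95))  # 0.4
```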

Accuracy reports

You can find the accuracy data in the reports described below.

Machine Accuracy vs Manual Accuracy report

The chart comparing Manual Accuracy and Machine Accuracy shows metrics for both data keyers and the machine over a chosen period. Note that the chart displays accuracy based on occurrences rather than individual fields.

Learn more about the report in Manual Accuracy vs Machine Accuracy.

Document Output Accuracy report

Document Output Accuracy is determined by the final transcription of a specific field or cell extracted during submission processing, regardless of whether it was performed by a human or a machine. This report focuses on the correctness of the transcribed content.

  • In Structured documents, this value represents the Transcription Accuracy of the output.

  • In Semi-structured documents:

    • If a field/cell is sampled for ID QA, and it's determined that the location differs from the one extracted during submission, it won't be included in the report.

    • However, if a field/cell is sampled for Transcription QA, and it's found that the transcription differs from the one extracted during submission, it will be included in the report.
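The inclusion rules above for semi-structured documents can be sketched in a few lines. This is illustrative Python with hypothetical field names, not a description of the actual report pipeline.

```python
# Hypothetical QA sample records for fields/cells in a semi-structured document.
records = [
    # Sampled for ID QA and the location differs -> excluded from the report.
    {"qa": "identification", "location_matches": False},
    # Sampled for Transcription QA; included whether or not the value matches.
    {"qa": "transcription", "transcription_matches": False},
    {"qa": "transcription", "transcription_matches": True},
]

# Only Transcription QA samples contribute to Document Output Accuracy here.
included = [r for r in records if r["qa"] == "transcription"]
output_accuracy = sum(r["transcription_matches"] for r in included) / len(included)
print(output_accuracy)  # 0.5: one correct value out of two included samples
```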

For fields, the chart shows accuracy on an occurrence level.

Below the chart, you'll see the Field Accuracy percentage and the Table Cell Accuracy percentage, which reflect the average accuracy for the chosen date range. Learn more in Document Output Accuracy.

For ORCA VLMs, Document Output Accuracy reflects the correctness of the final extracted values only.

Because ORCA does not separate identification and transcription:

  • All accuracy calculations are based on the final transcription output.

  • There is no exclusion of fields based on incorrect field location.

These factors make Document Output Accuracy more straightforward to interpret for ORCA than for other types of models, as it directly represents the correctness of the extracted content.

Accuracy and automation tradeoff

For ORCA VLMs, increasing confidence thresholds improves accuracy but reduces automation, as more fields are routed to human supervision.