[v42.3] TDM for ORCA VLMs

Training Data Management (TDM) is where you prepare and manage the data used to train your models. In TDM, you review and annotate documents, build your training dataset, and improve model performance for your specific use case.

TDM for ORCA VLMs availability

In v42.3 and later, TDM allows you to prepare, annotate, and manage training data for specialized ORCA VLM models.

Prerequisites

Before you start, ensure the following requirements are met:

  • The ORCA base model is installed. Follow the steps in [v42.3] Installing ORCA VLMs to install and configure the base model in your instance.

  • A Semi-structured layout with fields is locked and associated with the flow that is using ORCA. Learn more in Creating Semi-structured Layouts.

  • A model definition exists for the layout. The model definition links the layout to the training configuration and enables model training. Learn more in [v42.3] Model Definitions.

Learn how to navigate TDM for ORCA VLMs and understand its key sections below.

Model details page

The Model details page is where you manage and evaluate your model. It includes three main tabs:

  • Overview

  • Training Data

  • History

Overview

The Overview tab provides a summary of the deployed model, the candidate model, the health of the training data, and projected automation.

The sections below explain the key information shown in the Overview tab.

Pre-trained model

Before you train a specialized model, the Model summary card will display a pre-trained candidate. This model is the ORCA base model. To learn more, see [v42.3] ORCA (Optical Reasoning and Cognition Agent) VLMs.

Model summary card

The Model summary card provides insights into all models trained for a specific model definition. See the table below to learn about the displayed details:

| Field | Description | Notes |
| --- | --- | --- |
| State | Shows the model’s state. | A Live model is a deployed model. A Candidate model is an undeployed model; it allows you to review model performance and decide whether the model should be deployed or retrained with additional data. An Archived model is a model that is no longer in use. An Inactive model is a candidate or an undeployed model. |
| Projected automation | Displays the performance of the model that’s currently live. | The predicted automation based on the desired target accuracy. The projection is derived from the model’s training data. The system automatically ensures that the same data is not used for both projections and training. |
| Test Target Accuracy | The accuracy percentage used to calculate the projected automation. | Indicates the desired overall system accuracy. |
| Trained | Date the model was trained. | |
| Layout version | The layout version used for this model definition. | Always use the latest locked layout version. |

Training data card

The Training data card displays information about your dataset. See the table below for more information:

| Field | Description | Notes |
| --- | --- | --- |
| Training status | Indicates the status of your model based on the training data. | Possible values: Ready to train; Reqs not met; Training failed (the model training process failed); Training in progress (the model is currently training). |
| Total documents | The number of annotated documents for this model. | |

Projected Automation

The Projected Automation chart displays the performance of the model that’s currently live.

  • Expand it by clicking the arrow.

The chart displays how the target accuracy affects the automation. The lower the accuracy, the higher the automation, and vice versa.
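The trade-off between target accuracy and automation can be illustrated with a minimal Python sketch. This article does not describe how the product computes the projection, so everything here is an assumption for illustration: the function name `projected_automation`, the idea that each held-out prediction has a confidence score and a correctness flag, and the sample data are all hypothetical.

```python
from typing import List, Tuple


def projected_automation(
    predictions: List[Tuple[float, bool]], target_accuracy: float
) -> float:
    """Return the largest fraction of predictions that could be automated
    while the automated subset still meets target_accuracy.

    Hypothetical illustration only, not the product's actual method.
    Each prediction is (confidence, is_correct) from held-out data.
    """
    if not predictions:
        return 0.0
    # Automate the most confident predictions first.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    best_count = 0
    correct = 0
    for count, (_, is_correct) in enumerate(ranked, start=1):
        correct += is_correct
        # Keep the largest prefix whose accuracy meets the target.
        if correct / count >= target_accuracy:
            best_count = count
    return best_count / len(ranked)


# Made-up held-out results: (confidence, was the prediction correct?)
sample = [
    (0.99, True), (0.95, True), (0.90, True),
    (0.80, False), (0.70, True), (0.60, False),
]

for target in (1.0, 0.8, 0.6):
    print(f"target={target:.0%} automation={projected_automation(sample, target):.0%}")
```

In this made-up sample, lowering the target accuracy lets more predictions pass without review, which is the same inverse relationship the chart shows.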

Margin of Error

The Margin of Error (MoE) indicates the allowable range of inaccuracy in the system's results. It shows you how much the output can differ from the true value while still being acceptable. A smaller margin of error means the system is more accurate. To learn more, see Accuracy and Automation.
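For intuition, the sketch below uses the standard normal-approximation formula for the margin of error of a measured proportion, z * sqrt(p * (1 - p) / n). This article does not state how the product calculates its MoE, so treat the formula choice, the function name `margin_of_error`, and the default z value of 1.96 (a 95% confidence level) as assumptions.

```python
import math


def margin_of_error(accuracy: float, sample_size: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for an accuracy estimate.

    Assumption: accuracy is a proportion in [0, 1] measured on
    sample_size independent samples. This is a textbook formula,
    not necessarily the calculation the product performs.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return z * math.sqrt(accuracy * (1.0 - accuracy) / sample_size)


# The same measured accuracy with more samples gives a smaller margin.
for n in (100, 400):
    print(f"n={n} margin={margin_of_error(0.95, n):.4f}")
```

Under this formula, quadrupling the number of evaluated samples halves the margin of error, which is one reason a larger annotated dataset tends to give a tighter estimate.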

Training data

The Training Data tab lists all training documents and allows you to annotate and manage them.

VLM annotation experience

While the ORCA VLM works out of the box, you can improve its performance by training it on your specific data, using annotated documents.

Pre-populated fields

Some fields may be pre-populated during annotation. This behavior is intended to assist the keyer and reduce manual typing. The keyer should review each value and correct it if necessary to ensure it matches the text in the document exactly. The reviewed values are then used as the ground truth for model training.

History

The History tab displays all models trained for that model definition. This tab allows you to deploy, undeploy, and reject your models. You can also see detailed information for each model.

Base model entry in the History tab

This entry represents the default ORCA-powered extraction configuration for a given layout, before custom training.

The History tab displays the following columns:

| Column | Description | Notes |
| --- | --- | --- |
| Name | Model name | |
| State | Model state | A Live model is a deployed model. A Candidate model is an undeployed model; it allows you to review model performance and decide whether the model should be deployed or retrained with additional data. An Archived model is a model that is no longer in use. An Inactive model is a candidate or an undeployed model. |
| Compatibility | Compatibility of the most recently live model for this definition. | Orange indicates the current application version. Green indicates one version ahead of the current application version. Blue indicates two versions ahead of the current application version. Learn more about compatibility in our Model Compatibility Logic article. |
| Layout version | The layout version for this model. | |
| Source | Where the model was trained. | Internal: trained in the current instance. Upload: trained in another instance and uploaded to the current one. Base: the ORCA base model. |
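The color legend for the Compatibility column can be read as a simple lookup from a version offset to a color. The Python sketch below only restates that legend; the names `COMPATIBILITY_COLORS` and `compatibility_color`, and the idea of expressing versions as an integer offset, are hypothetical and not part of the product.

```python
from typing import Optional

# Legend from the Compatibility column description.
# Key: how many versions ahead of the current application version
# (0 means the current application version itself).
COMPATIBILITY_COLORS = {0: "orange", 1: "green", 2: "blue"}


def compatibility_color(versions_ahead: int) -> Optional[str]:
    """Return the legend color for a version offset.

    Returns None for offsets the legend in this article does not cover.
    """
    return COMPATIBILITY_COLORS.get(versions_ahead)


for offset in range(4):
    print(offset, compatibility_color(offset))
```

For the full compatibility rules, refer to the Model Compatibility Logic article rather than this sketch.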

During training, the system uses the ORCA base model and the annotated documents to learn patterns specific to your use case. The training process produces a candidate model, which can then be evaluated and deployed.

Training does not modify the ORCA base model. Instead, it creates a model tailored to the layout and dataset used for training.

Training results

After training completes:

  • A candidate model appears in the Overview tab.

  • The system calculates projected automation.

  • The candidate model can be reviewed and deployed.

You can retrain the model by adding more annotated documents and running training again.

Deploying the candidate model

To start using the trained model:

  • Open the Overview tab.

  • Review the candidate model summary.

  • Click Deploy to promote the candidate model to Live.

After deployment, the new model is used for document processing.

Next steps

To train a specialized model for your use case on top of the ORCA base model and evaluate it, follow the instructions in [v42.3] Training a Specialized Model.