[v42.3] TDM for ORCA VLMs

Training Data Management (TDM) is where you prepare and manage the data used to train your models. In TDM, you review and annotate documents, build your training dataset, and improve model performance for your specific use case.

TDM for ORCA VLMs availability
In v42.3 and later, TDM allows you to prepare, annotate, and manage training data for specialized ORCA VLMs.

Prerequisites

Before you start, ensure the following requirements are met:

The ORCA base model is installed. Follow the steps in [v42.3] Installing ORCA VLMs to install and configure the base model in your instance.
A Semi-structured layout with fields is locked and associated with the flow that is using ORCA. Learn more in Creating Semi-structured Layouts.
A model definition exists for the layout. The model definition links the layout to the training configuration and enables model training. Learn more in [v42.3] Model Definitions.

Learn how to navigate TDM for ORCA VLMs and understand its key sections below.

Model details page

The Model details page is where you manage and evaluate your model. It includes three main tabs:

Overview
Training Data
History

Overview

The Overview tab provides a summary of the deployed model, the candidate model, the health of the training data, and projected automation.

See how to use the Overview tab in the interactive demo below:

The sections below explain the key information shown in the Overview tab.

Pre-trained model
Before you train a specialized model, the Model summary card will display a pre-trained candidate. This model is the ORCA base model. To learn more, see [v42.3] ORCA (Optical Reasoning and Cognition Agent) VLMs.

Model summary card

The Model summary card provides insights into all models trained for a specific model definition. See the table below to learn about the displayed details:

Field	Description	Notes
State	Shows the model’s state.	A Live model is a deployed model. A Candidate model is an undeployed model. It allows you to review model performance and decide whether the model should be deployed or retrained with additional data. An Inactive model is a candidate or an undeployed model.
Projected automation	Displays the performance of the model that’s currently live.	The predicted automation based on the desired target accuracy. The projection is derived from the model’s training data. The system automatically ensures that the same data is not used for both projections and training.
Test Target Accuracy	The accuracy percentage used to calculate the projected automation.	Indicates the desired overall system accuracy.
Trained	Date the model was trained.
Layout version	The layout version used for this model definition.	Always use the latest locked layout version.

Training data card

The Training data card displays information about your dataset. See the table below for more information:

Field	Description	Notes
Training status	Indicates the status of your model based on the training data.	One of the following: Ready to train. Reqs not met: to meet the requirements for training a specialized model on top of a base model, you must install the base model (see [v42.3] Installing ORCA VLMs) and annotate at least 120 documents (see [v42.3] Training a Specialized Model). Training failed: the model training process failed. Training in progress: the model is currently training.
Total documents	The number of annotated documents for this model.
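
The training requirements listed in the table can be summarized as a simple check. The following is a minimal sketch, not the product's implementation: the function name `training_status`, its parameters, and the constant `MIN_ANNOTATED_DOCS` are hypothetical. Only the 120-document minimum and the status labels come from the table.

```python
MIN_ANNOTATED_DOCS = 120  # minimum number of annotated documents stated in the table


def training_status(base_model_installed: bool, annotated_docs: int) -> str:
    """Return the status label a dataset would get under the stated requirements.

    Hypothetical helper for illustration; TDM evaluates this for you in the UI.
    """
    if base_model_installed and annotated_docs >= MIN_ANNOTATED_DOCS:
        return "Ready to train"
    return "Reqs not met"
```

For example, an instance with the base model installed and 119 annotated documents would still show Reqs not met under this sketch, and would switch to Ready to train once one more document is annotated.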

Projected Automation

The Projected Automation chart compares the performance of the currently live model with that of the candidate model.

Expand it by clicking the arrow.

The chart shows how the target accuracy affects projected automation: the lower the target accuracy, the higher the automation, and vice versa.
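
To illustrate why a lower target accuracy yields higher automation, here is a minimal sketch under stated assumptions. It assumes each held-out prediction carries a confidence score and a correctness flag, and that the most confident predictions are automated first. This is an illustrative model only, not a description of how TDM computes the projection. The function name `projected_automation` and the sample data are hypothetical.

```python
from typing import List, Tuple


def projected_automation(
    predictions: List[Tuple[float, bool]], target_accuracy: float
) -> float:
    """Largest share of predictions that can be automated while the
    automated subset stays at or above target_accuracy.

    predictions: (confidence, is_correct) pairs from held-out documents
    (an assumed input format, chosen for illustration).
    """
    # Consider the most confident predictions first.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    best = 0.0
    correct = 0
    for count, (_, is_correct) in enumerate(ranked, start=1):
        correct += is_correct
        # Keep the largest prefix whose accuracy still meets the target.
        if correct / count >= target_accuracy:
            best = count / len(ranked)
    return best


# Hypothetical held-out predictions: (confidence, is_correct)
sample = [(0.99, True), (0.95, True), (0.90, True), (0.80, False),
          (0.70, True), (0.60, False), (0.50, True), (0.40, False)]
```

With this sample, a target accuracy of 1.0 automates 3 of 8 predictions (0.375), while a target of 0.7 automates 7 of 8 (0.875), which mirrors the shape of the trade-off the chart displays.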

Margin of Error
The Margin of Error (MoE) indicates the allowable range of inaccuracy in the system's results. It shows you how much the output can differ from the true value while still being acceptable. A smaller margin of error means the system is more accurate. To learn more, see Accuracy and Automation.

Training data

The Training Data tab lists all training documents and allows you to annotate and manage them. The interactive demo below will show you how to use this tab:

VLM annotation experience

While the ORCA VLM works out of the box, you can improve its performance by training it on your specific data, using annotated documents. Learn how to annotate documents in the demo below:

Automatically transcribed fields
Field values are automatically transcribed during annotation to reduce manual typing. Review each value and correct it as needed to ensure it matches the document text exactly. The reviewed values are then used for model training.


History

The History tab displays all models trained for that model definition. This tab allows you to deploy, undeploy, and reject your models. You can also see detailed information for each model.

Base model entry in the History tab
This entry represents the default ORCA-powered extraction configuration for a given layout, before custom training.

Learn how to navigate the History tab from the interactive demo below:

Column	Description	Notes
Name	Model name
State	Model state	A Live model is a deployed model. A Candidate model is an undeployed model. It allows you to review model performance and decide whether the model should be deployed or retrained with additional data. An Archived model is a model that is no longer in use. An Inactive model is a candidate or an undeployed model.
Compatibility	Compatibility of the most recently live model for this definition.	Orange indicates the current application version. Green indicates one version ahead of the current application version. Blue indicates two versions ahead of the current application version. Learn more about compatibility in our Model Compatibility Logic article.
Layout version	The layout version for this model.
Source	Where the model was trained.	Internal: trained in the current instance. Upload: trained in another instance and uploaded to the current one. Base: the ORCA base model.
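
The color coding in the Compatibility column can be read as a lookup from version offset to color. The snippet below is a small sketch of that reading. The names `COMPATIBILITY_COLORS` and `compatibility_color` are hypothetical, and offsets beyond two versions ahead are not described in this article, so the sketch returns `None` for them rather than guessing.

```python
from typing import Optional

# Colors as described in the Compatibility row of the table:
# key = number of versions ahead of the current application version.
COMPATIBILITY_COLORS = {0: "Orange", 1: "Green", 2: "Blue"}


def compatibility_color(versions_ahead: int) -> Optional[str]:
    """Return the color used for a given version offset, or None when the
    article does not describe that offset."""
    return COMPATIBILITY_COLORS.get(versions_ahead)
```

For the authoritative rules, refer to the Model Compatibility Logic article.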

During training, the system uses the ORCA base model and the annotated documents to learn patterns specific to your use case. The training process produces a candidate model, which can then be evaluated and deployed.

Training does not modify the ORCA base model. Instead, it creates a separate model tailored to the layout and dataset used for training.

Training results

After training completes:

A candidate model appears in the Overview tab.
The system calculates projected automation.
The candidate model can be reviewed and deployed.

You can retrain the model by adding more annotated documents and running training again.

Deploying the candidate model

To start using the trained model:

Open the Overview tab.
Review the candidate model summary.
Select Deploy from the Actions drop-down to promote the candidate model to Live.

Deploying a model from the History tab
You can also deploy your model from the History tab, as shown in the walkthrough above.

After deployment, the new model will be used for document processing.

Next steps

To train a specialized model for your use case on top of the ORCA base model and evaluate it, follow the instructions in [v42.3] Training a Specialized Model.
