TDM for ORCA VLMs

Training Data Management (TDM) is where you prepare and manage the data used to train your models. In TDM, you review and annotate documents, build your training dataset, and improve model performance for your specific use case.

In this article, you’ll learn how TDM allows you to prepare, annotate, and manage training data for specialized ORCA VLMs.

Prerequisites

Before you start, ensure the following requirements are met:

The ORCA base model is installed. Follow the steps in Installing ORCA VLMs to install and configure the base model in your instance.
A Semi-structured layout with fields is locked and associated with the flow that is using ORCA. Learn more in Creating Semi-structured Layouts.
A model definition exists for the layout. The model definition links the layout to the training configuration and enables model training. Learn more in Model Definitions.

Learn how to navigate TDM for ORCA VLMs and understand its key sections below.

Model details page

The Model details page is where you manage and evaluate your model. It includes three main tabs:

Overview
Training Data
History

Overview

The Overview tab provides a summary of the deployed model, the candidate model, the health of the training data, and projected automation.

See how to use the Overview tab in the interactive demo below:

The sections below explain the key information shown in the Overview tab.

Pre-trained model
Before you train a specialized model, the Model summary card will display a pre-trained candidate. This model is the ORCA base model. To learn more, see ORCA (Optical Reasoning and Cognition Agent) VLMs.

Model summary card

The Model summary card provides insights into all models trained for a specific model definition. See the table below to learn about the displayed details:

Field	Description	Notes
State	Shows the model’s state.	A Live model is a deployed model. A Candidate model is an undeployed model. It allows you to review model performance and decide whether the model should be deployed or retrained with additional data. An Inactive model is a candidate or an undeployed model.
Projected automation	Displays the performance of the model that’s currently live.	The predicted automation based on the desired target accuracy. The projection is derived from the model’s training data. The system automatically ensures that the same data is not used for both projections and training.
Test Target Accuracy	The accuracy percentage used to calculate the projected automation.	Indicates the desired overall system accuracy.
Trained	Date the model was trained.
Layout version	The layout version used for this model definition.	Always use the latest locked layout version.

Training data card

The Training data card displays information about your dataset. See the table below for more information:

Field	Description	Notes
Training status	Indicates the status of your model based on the training data.	Possible statuses: Ready to train; Reqs not met; Training failed (the model training process failed); Training in progress (the model is currently training). To meet the requirements for training a specialized model on top of a base model, you must: (1) Install the base model. Learn how to do this in Installing ORCA VLMs. (2) Annotate at least 120 documents to start training. Learn how to train a specialized model in our Training a Specialized Model article.
Total documents	The number of annotated documents for this model.
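
The two training requirements listed in the table can be expressed as a simple check. The sketch below is illustrative only: the `TrainingInputs` class and both function names are hypothetical and not part of the product. It assumes the only inputs are whether the base model is installed and how many documents are annotated, and it covers only the Ready to train and Reqs not met statuses.

```python
from dataclasses import dataclass

# Minimum number of annotated documents, as stated in the Training data card table.
MIN_ANNOTATED_DOCS = 120


@dataclass
class TrainingInputs:
    """Hypothetical snapshot of the prerequisites for one model definition."""
    base_model_installed: bool
    annotated_documents: int


def unmet_requirements(inputs: TrainingInputs) -> list[str]:
    """List the requirements that are still missing (empty when all are met)."""
    missing = []
    if not inputs.base_model_installed:
        missing.append("Install the ORCA base model")
    if inputs.annotated_documents < MIN_ANNOTATED_DOCS:
        shortfall = MIN_ANNOTATED_DOCS - inputs.annotated_documents
        missing.append(f"Annotate {shortfall} more document(s)")
    return missing


def training_status(inputs: TrainingInputs) -> str:
    """Map the inputs to one of the two pre-training statuses."""
    return "Ready to train" if not unmet_requirements(inputs) else "Reqs not met"


if __name__ == "__main__":
    example = TrainingInputs(base_model_installed=True, annotated_documents=95)
    print(training_status(example))
    print(unmet_requirements(example))
```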

Projected Automation

The Projected Automation chart displays the performance of the currently live model, compared to the candidate one.

Expand it by clicking the arrow.

The chart displays how the target accuracy affects the automation. The lower the accuracy, the higher the automation, and vice versa.
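
The trade-off between target accuracy and automation can be illustrated with a small sketch. This is not the product's projection algorithm, which this article does not describe. It assumes each held-out prediction carries a confidence score and a correctness flag, and it automates the largest group of highest-confidence predictions whose accuracy still meets the target. The function name and the sample data are hypothetical.

```python
def projected_automation(predictions, target_accuracy):
    """Return the fraction of predictions that could be automated.

    predictions: list of (confidence, is_correct) pairs from held-out data.
    The predictions are ranked by confidence, and the largest top-ranked
    group whose accuracy is at least target_accuracy is treated as automated.
    """
    if not predictions:
        return 0.0
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    best_count = 0
    correct = 0
    for count, (_, is_correct) in enumerate(ranked, start=1):
        correct += 1 if is_correct else 0
        if correct / count >= target_accuracy:
            best_count = count
    return best_count / len(ranked)


# Hypothetical held-out predictions: (confidence, was the prediction correct?)
sample = [
    (0.99, True), (0.97, True), (0.95, True), (0.90, True), (0.85, False),
    (0.80, True), (0.70, True), (0.60, False), (0.50, True), (0.40, False),
]

if __name__ == "__main__":
    for target in (1.0, 0.85, 0.75):
        print(target, projected_automation(sample, target))
```

With this sample, lowering the target from 1.0 to 0.85 to 0.75 raises the automated share from 0.4 to 0.7 to 0.9, which mirrors the shape of the chart.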

Margin of Error
The Margin of Error (MoE) indicates the allowable range of inaccuracy in the system's results. It shows you how much the output can differ from the true value while still being acceptable. A smaller margin of error means the system is more accurate. To learn more, see Accuracy and Automation.
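
This article does not state how the MoE is calculated. As a rough intuition only, the sketch below uses the textbook normal-approximation margin of error for a proportion, which is an assumption and may differ from what the product does. The function name, the default `z` value (1.96, roughly a 95% confidence level), and the sample sizes are all hypothetical.

```python
import math


def margin_of_error(accuracy, sample_size, z=1.96):
    """Textbook margin of error for a measured proportion.

    accuracy: observed accuracy between 0 and 1.
    sample_size: number of evaluated items the accuracy was measured on.
    z: z-score for the desired confidence level (1.96 is about 95%).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return z * math.sqrt(accuracy * (1.0 - accuracy) / sample_size)


if __name__ == "__main__":
    # The same measured accuracy is less uncertain when it comes from more items.
    for n in (100, 400, 1600):
        print(n, round(margin_of_error(0.95, n), 4))
```

Under this assumption, evaluating on more data narrows the margin, so the measured accuracy can be trusted more closely.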

Training Data

The Training Data tab lists all training documents and allows you to annotate and manage them. The interactive demo below will show you how to use this tab:

Tagging documents
You can organize and manage the training documents more efficiently in TDM for ORCA VLMs by adding tags. This feature allows you to add, filter, import, and export tags for documents, making it easier to categorize and find the information you need.
It provides the following key capabilities:
Manual tagging — Hover over the Tags cell in the Training Data table to reveal a + button. Click it to open the drop-down list with all existing tags. From there, you can select an existing tag or create a new tag.
Import tags — If the training data contains tags, they will be automatically imported.
Special-character handling — Tags cannot contain “;” or spaces (spaces are replaced with underscores).
Unused tags — Unassigned tags are automatically deleted.
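
If you prepare tags outside TDM before importing training data, the rules above can be mirrored in a small helper. The sketch below is an illustration, not the product's implementation: the function names are hypothetical, and trimming surrounding whitespace and rejecting empty tags are assumptions that the article does not state.

```python
def normalize_tag(raw: str) -> str:
    """Apply the documented tag rules to a single tag.

    Rejects tags containing ';' and replaces spaces with underscores.
    """
    if ";" in raw:
        raise ValueError("Tags cannot contain ';'")
    tag = raw.strip()  # assumption: surrounding whitespace is not meaningful
    if not tag:
        raise ValueError("Tag is empty")  # assumption: empty tags are invalid
    return tag.replace(" ", "_")


def used_tags(assignments: dict[str, set[str]]) -> set[str]:
    """Return the tags assigned to at least one document.

    assignments maps a document identifier to its set of tags. Any tag
    outside the returned set is unassigned and would be deleted.
    """
    return set().union(*assignments.values())


if __name__ == "__main__":
    print(normalize_tag("Q3 invoices"))
    print(used_tags({"doc-1": {"Q3_invoices"}, "doc-2": set()}))
```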

VLM annotation

VLM annotation experience

While the ORCA VLM works out of the box, you can improve its performance by training it on your specific data, using annotated documents. Learn how to annotate documents in the demo below:

Automatically transcribed fields
Some fields may be pre-populated during annotation. This behavior is intended to assist the keyer and reduce manual typing. The keyer should review each value and correct it if necessary to ensure it matches the text in the document exactly. The reviewed values are then used for model training.

History

The History tab lists all models trained for the selected model definition. From this tab, you can deploy, undeploy, or reject models, and view detailed information for each one. Starting in v43, you can also rename specialized models.

Base model entry in the History page
This entry represents the default ORCA-powered extraction configuration for a given layout, before custom training.

Learn how to navigate the History tab from the interactive demo below:

Column	Description	Notes
Name	Model name
State	Model state	A Live model is a deployed model. A Candidate model is an undeployed model. It allows you to review model performance and decide whether the model should be deployed or retrained with additional data. An Archived model is a model that is no longer in use. An Inactive model is a candidate or an undeployed model.
Compatibility	Compatibility of the most recently live model for this definition.	Orange indicates the current application version. Green indicates one version ahead of the current application version. Blue indicates two versions ahead of the current application version. Learn more about compatibility in our Model Compatibility Logic article.
Layout version	The layout version for this model.
Source	Where the model was trained.	Internal — trained in the current instance. Upload — trained in another instance and uploaded to the current one. Base — ORCA base model

During training, the system uses the ORCA base model and the annotated documents to learn patterns specific to your use case. The training process produces a candidate model, which can then be evaluated and deployed.

Training does not modify the base ORCA model. Instead, it creates a model tailored to the layout and dataset used for training. You can rename the specialized model, as shown in the walkthrough above.

Training results

After training completes:

A candidate model appears in the Overview tab.
The system calculates projected automation.
The candidate model can be reviewed and deployed.

You can retrain the model by adding more annotated documents and running training again.

Deploying the candidate model

To start using the trained model:

Open the Overview tab.
Review the candidate model summary.
Click Deploy from the Actions drop-down to promote the candidate model to Live.

Deploying a model from the History tab
You can also deploy your model from the History tab, as shown in the walkthrough above.

After deployment, the new model will be used for document processing.

Next steps

Train a specialized model for your specific use case on top of the ORCA base model, then evaluate it by following the instructions in Training a Specialized Model.
