Training Data Management (TDM) is where you prepare and manage the data used to train your models. In TDM, you review and annotate documents, build your training dataset, and improve model performance for your specific use case.
TDM for ORCA VLMs availability
In v42.3 and later, TDM allows you to prepare, annotate, and manage training data for specialized ORCA VLMs.
Prerequisites
Before you start, ensure the following requirements are met:
The ORCA base model is installed. Follow the steps in [v42.3] Installing ORCA VLMs to install and configure the base model in your instance.
A Semi-structured layout with fields is locked and associated with the flow that is using ORCA. Learn more in Creating Semi-structured Layouts.
A model definition exists for the layout. The model definition links the layout to the training configuration and enables model training. Learn more in [v42.3] Model Definitions.
Learn how to navigate TDM for ORCA VLMs and understand its key sections below.
Model details page
The sections below explain the key information shown in the Overview tab.
Pre-trained model
Before you train a specialized model, the Model summary card will display a pre-trained candidate. This model is the ORCA base model. To learn more, see [v42.3] ORCA (Optical Reasoning and Cognition Agent) VLMs.
Model summary card
The Model summary card provides insights into all models trained for a specific model definition. See the table below to learn about the displayed details:
| Field | Description | Notes |
|---|---|---|
| State | Shows the model’s state. | |
| Projected automation | Displays the performance of the model that’s currently live. | The predicted automation based on the desired target accuracy. The projection is derived from the model’s training data. The system automatically ensures that the same data is not used for both projections and training. |
| Test Target Accuracy | The accuracy percentage used to calculate the projected automation. | Indicates the desired overall system accuracy. |
| Trained | Date the model was trained. | |
| Layout version | The layout version used for this model definition. | Always use the latest locked layout version. |
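The note on Projected automation says that the same data is never used for both projections and training. How the system does this is not documented here. As a rough illustration of the idea only, the Python sketch below assigns each document to exactly one of two sets based on a hash of its ID. The function names, the document IDs, and the 20% holdout fraction are all assumptions made for this example.

```python
import hashlib


def assign_split(document_id, holdout_fraction=0.2):
    """Deterministically place a document in exactly one set.

    Hypothetical helper: the hashing scheme and the default fraction
    are assumptions for this sketch, not product behavior.
    """
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "evaluation" if bucket < holdout_fraction else "training"


def split_documents(document_ids, holdout_fraction=0.2):
    """Group document IDs into disjoint training and evaluation lists."""
    splits = {"training": [], "evaluation": []}
    for document_id in document_ids:
        splits[assign_split(document_id, holdout_fraction)].append(document_id)
    return splits


# Made-up document IDs, used only to exercise the sketch.
example_ids = [f"doc-{i}" for i in range(1000)]
example_splits = split_documents(example_ids)
print(len(example_splits["training"]), len(example_splits["evaluation"]))
```

Because the assignment depends only on the document ID, a document stays in the same set every time the split is recomputed, so it cannot end up in both sets.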
Training data card
The Training data card displays information about your dataset. See the table below for more information:
| Field | Description | Notes |
|---|---|---|
| Training status | Indicates the status of your model based on the training data. | |
| Total documents | The number of annotated documents for this model. | |
Projected Automation
The Projected Automation chart displays the performance of the model that’s currently live.
Expand it by clicking the arrow.
The chart displays how the target accuracy affects the automation. The lower the accuracy, the higher the automation, and vice versa.
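To make this trade-off concrete, the Python sketch below shows one simplified way such a curve could be computed. It is an illustration only, not the product's actual calculation. The `FieldPrediction` class, the `projected_automation` function, the confidence scores, and the sample data are all hypothetical. For simplicity, accuracy is measured only on the automated predictions.

```python
from dataclasses import dataclass


@dataclass
class FieldPrediction:
    confidence: float  # hypothetical model confidence in [0, 1]
    correct: bool      # whether the prediction matched the annotation


def projected_automation(predictions, target_accuracy):
    """Return the largest share of predictions that can be automated
    while the automated subset still meets target_accuracy."""
    ranked = sorted(predictions, key=lambda p: p.confidence, reverse=True)
    best = 0.0
    correct_so_far = 0
    for count, prediction in enumerate(ranked, start=1):
        correct_so_far += prediction.correct
        # Automating the top `count` predictions is allowed only if
        # their accuracy reaches the target.
        if correct_so_far / count >= target_accuracy:
            best = count / len(ranked)
    return best


# Made-up held-out predictions, ordered by confidence for readability.
SAMPLE = [
    FieldPrediction(0.99, True), FieldPrediction(0.97, True),
    FieldPrediction(0.95, True), FieldPrediction(0.90, True),
    FieldPrediction(0.85, False), FieldPrediction(0.80, True),
    FieldPrediction(0.70, False), FieldPrediction(0.60, True),
    FieldPrediction(0.50, False), FieldPrediction(0.40, False),
]

for target in (0.95, 0.82, 0.50):
    print(f"target {target:.2f} -> automation {projected_automation(SAMPLE, target):.0%}")
```

With this sample data, a target of 0.95 allows 40% automation, 0.82 allows 60%, and 0.50 allows 100%. This matches the inverse relationship shown in the chart.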
Margin of Error
The Margin of Error (MoE) indicates the allowable range of inaccuracy in the system's results. It shows you how much the output can differ from the true value while still being acceptable. A smaller margin of error means the system is more accurate. To learn more, see Accuracy and Automation.
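This article does not state how the MoE is calculated. As a point of reference, the Python sketch below uses the textbook normal-approximation formula for the margin of error of a proportion, z * sqrt(p * (1 - p) / n). Treat it as an assumption-based illustration rather than the product's formula. The function name, the 95% confidence multiplier (z = 1.96), and the sample numbers are chosen for this example.

```python
import math


def margin_of_error(observed_accuracy, sample_size, z=1.96):
    """Half-width of a normal-approximation confidence interval
    for a proportion (assumed formula, not confirmed product logic)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = observed_accuracy
    return z * math.sqrt(p * (1.0 - p) / sample_size)


# Made-up example: 95% observed accuracy on evaluation sets of two sizes.
for n in (400, 1600):
    print(f"n={n}: MoE = {margin_of_error(0.95, n):.4f}")
```

Under this formula, quadrupling the number of evaluated samples halves the margin of error. A larger set of annotated documents therefore generally gives a tighter estimate.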
Training data
The Training Data tab lists all training documents and allows you to annotate and manage them. The interactive demo below will show you how to use this tab:
VLM annotation experience
While the ORCA VLM works out of the box, you can improve its performance by training it on your specific data, using annotated documents. Learn how to annotate documents in the demo below:
VLM annotation
History
The History tab displays all models trained for that model definition. This tab allows you to deploy, undeploy, and reject your models. You can also see detailed information for each model.
Base model entry in the History page
This entry represents the default ORCA-powered extraction configuration for a given layout, before custom training.
Learn how to navigate the History tab from the interactive demo below:
| Column | Description | Notes |
|---|---|---|
| Name | Model name | |
| State | Model state | |
| Compatibility | Compatibility of the most recently live model for this definition. | Learn more about compatibility in our Model Compatibility Logic article. |
| Layout version | The layout version for this model. | |
| Source | Where the model was trained. | |
During training, the system uses the ORCA base model and the annotated documents to learn patterns specific to your use case. The training process produces a candidate model, which can then be evaluated and deployed.
Training does not modify the base ORCA model.
Instead, it creates a model tailored to the layout and dataset used for training.
Training results
After training completes:
A candidate model appears in the Overview tab.
The system calculates projected automation.
The candidate model can be reviewed and deployed.
You can retrain the model by adding more annotated documents and running training again.
Deploying the candidate model
To start using the trained model:
Open the Overview tab.
Review the candidate model summary.
Click Deploy to promote the candidate model to Live.
After deployment, the new model will be used for document processing.
Next steps
To train a specialized model on top of the ORCA base model and evaluate it for your use case, follow the instructions in [v42.3] Training a Specialized Model.