[v42.3] Model Definitions


This feature is available in v42.3 and later.

Building an effective document-processing solution requires understanding how the components involved in model training work together. The Model Definitions table is designed to help manage these components. Model definitions separate configuration from the underlying models, providing a structured way to manage a model’s lifecycle. As training progresses, new model versions are created incrementally within the same definition, using updated training data.

Model definitions availability

Beginning in v42.3, model definitions are available for VLMs, and the same framework will be used for other model types in future releases. To learn more about ORCA VLMs, see our [v42.3] ORCA (Optical Reasoning and Cognition Agent) VLMs article.

In this article, you will learn how to:

  • Understand what each row in the model definitions table represents.

  • Create a model definition.

Understanding model definitions

A model definition is the configuration layer that represents a specific combination of scope (what the model operates on), task (what the model does), and model type (the model architecture used). Each row in the table acts as the control layer for models used in a specific task and scope and serves as the container that manages all models trained for that configuration. This approach allows you to retrain, evaluate, and deploy new model versions while maintaining a stable reference for the system.
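The relationship between a definition and its models can be pictured as a small data structure. The sketch below is purely illustrative: `DefinitionKey`, `ModelDefinition`, and `add_version` are hypothetical names, not part of any Hyperscience API. It shows a definition identified by its scope, task, and model type, holding a growing history of trained models while its identity stays fixed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DefinitionKey:
    """Hypothetical: the (scope, task, type) combination that identifies a definition."""
    scope: str       # what the model operates on, e.g. a layout's fields
    task: str        # what the model does, e.g. field extraction
    model_type: str  # the model family, e.g. "VLM"


@dataclass
class ModelDefinition:
    """Hypothetical container for every model trained for one configuration."""
    key: DefinitionKey
    history: List[str] = field(default_factory=list)

    def add_version(self, version: str) -> None:
        # Retraining appends a new version; the key (the stable reference) never changes.
        self.history.append(version)


definition = ModelDefinition(DefinitionKey("Invoice layout fields", "Field extraction", "VLM"))
definition.add_version("model-1")
definition.add_version("model-2")
print(definition.key.model_type, definition.history)  # VLM ['model-1', 'model-2']
```

Making the key immutable (`frozen=True`) mirrors the idea that the definition is a stable reference: new versions change the history, never the identity.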

Each row in the model definitions table represents a model definition associated with a specific layout, and each column displays key information about the model’s configuration, training status, deployment state, and version compatibility.

Column

Description

Notes and examples

Scope

The data or objects the model operates on (for example, a layout or set of fields).

For example, ORCA VLMs extract fields from documents, such as invoices. In this case, the scope of ORCA VLMs is field processing.

Task

The type of problem the model is trained to solve.

For example, the task that ORCA VLMs perform is field extraction: extracting data (such as fields) from documents based on the layout configuration.

Type

The model family used for this task and scope.

For example, VLM.

Compatibility

The compatibility of the most recently Live model for this definition.

  • Orange indicates the current application version.

  • Green indicates one version ahead of the current application version.

  • Blue indicates two versions ahead of the current application version.

Learn more about compatibility in our Model Compatibility Logic article.

State

Shows whether the model is Live or Inactive.

The state is Live when the model is deployed.

The state is Inactive when the model is not deployed.

Training status

The status of the current model training.

Training status could be:

  • Candidate trained: training completed successfully.

  • Candidate imported: the model was imported from a different instance.

  • Training failed: the model-training process failed. To view the details, click the info button.

  • Training in progress: the model is currently training.

  • Ready to train: the model meets all requirements and is ready to be trained.

Date deployed

The timestamp of the last deployment for this model definition.

Displays the date and time when the model was last deployed.
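The Compatibility color legend and the State column follow simple rules that can be sketched in a few lines. This is a hypothetical illustration, not product code: it assumes releases can be modeled as sequential integers so that "one version ahead" means an offset of +1, and the names `LEGEND`, `compatibility_color`, and `state` are invented for the example. The authoritative rules are in the Model Compatibility Logic article.

```python
from typing import Optional

# Hypothetical: releases are modeled as sequential integers, so "one version ahead" is +1.
LEGEND = {0: "Orange", 1: "Green", 2: "Blue"}


def compatibility_color(model_release: int, current_release: int) -> Optional[str]:
    """Return the legend color, or None if the offset is outside the documented legend."""
    return LEGEND.get(model_release - current_release)


def state(is_deployed: bool) -> str:
    # Live when the model is deployed, Inactive otherwise.
    return "Live" if is_deployed else "Inactive"


print(compatibility_color(3, 2), state(True))  # Green Live
```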

Creating a model definition

This section explains how to create a model definition for ORCA VLM. Learn more about ORCA VLMs in our [v42.3] ORCA (Optical Reasoning and Cognition Agent) VLMs article.

Before you start, ensure that:

  • An ORCA base model is installed. Follow the process described in [v42.3] Installing ORCA VLMs to install and configure the ORCA base model.

  • The layout you select is Semi-structured and contains at least one field. ORCA VLMs cannot extract data from tables.

  • The latest version of the layout is locked.

  • The layout is not already linked to another ORCA VLM model definition.

Layouts and model definitions

For ORCA VLMs, each layout can be linked to only one model definition.
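The prerequisites, including the one-definition-per-layout rule, can be read as a checklist. The sketch below is hypothetical: the `Layout` fields and the `prerequisite_problems` function are illustrative stand-ins for checks you perform in the application, not an actual API. It returns every unmet requirement rather than stopping at the first, so all problems surface at once.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Layout:
    """Hypothetical summary of the layout properties the prerequisites mention."""
    name: str
    layout_type: str                         # e.g. "Semi-structured"
    field_count: int
    latest_version_locked: bool
    linked_definition: Optional[str] = None  # ORCA VLM definition already using this layout


def prerequisite_problems(layout: Layout, base_model_installed: bool) -> List[str]:
    """Collect every unmet prerequisite for creating an ORCA VLM model definition."""
    problems = []
    if not base_model_installed:
        problems.append("ORCA base model is not installed")
    if layout.layout_type != "Semi-structured":
        problems.append("layout must be Semi-structured")
    if layout.field_count < 1:
        problems.append("layout must contain at least one field")
    if not layout.latest_version_locked:
        problems.append("latest layout version must be locked")
    if layout.linked_definition is not None:
        problems.append(f"layout is already linked to {layout.linked_definition}")
    return problems


invoices = Layout("Invoices", "Semi-structured", field_count=12, latest_version_locked=True)
print(prerequisite_problems(invoices, base_model_installed=True))  # []
```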


Flows and live models

Each model definition maintains a model history, which contains all models trained or imported for that definition. A model definition can have multiple trained models in its history, but only one model can be Live at a time. The model definition acts as the source of truth for which model is currently used for document processing.

Flows are configured to use a model definition, not a specific model version. When a document is processed, the system automatically selects the model that is currently Live for that definition. If you deploy a new model version, the flow continues to work without any changes. Learn more about ORCA VLM flows in [v42.3] Installing ORCA VLMs.
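Because a flow points at the definition rather than at a model version, deploying a new model changes what the flow uses without changing the flow itself. The following is a minimal sketch of that indirection under stated assumptions: `ModelDefinition`, `Flow`, `deploy`, and `model_for_document` are hypothetical names, and models are represented as plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelDefinition:
    """Hypothetical definition with a model history and at most one Live model."""
    name: str
    history: List[str] = field(default_factory=list)  # all trained or imported models
    live: Optional[str] = None

    def deploy(self, model: str) -> None:
        if model not in self.history:
            raise ValueError(f"{model} is not in this definition's model history")
        self.live = model  # the previously Live model, if any, becomes Inactive


@dataclass
class Flow:
    """Hypothetical flow that stores a definition, never a specific model version."""
    definition: ModelDefinition

    def model_for_document(self) -> str:
        # Resolved at processing time, so it always reflects the current Live model.
        if self.definition.live is None:
            raise RuntimeError("no Live model for this definition")
        return self.definition.live


definition = ModelDefinition("Invoices", history=["model-1", "model-2"])
flow = Flow(definition)
definition.deploy("model-1")
print(flow.model_for_document())  # model-1
definition.deploy("model-2")
print(flow.model_for_document())  # model-2, with no change to the flow
```

Storing a single `live` slot, rather than a flag on each model, makes the "only one Live model at a time" rule hold by construction.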

Next steps

Because ORCA is delivered as a base model, it provides general-purpose extraction capabilities and is not adapted to your specific document types or business requirements. To optimize extraction performance for your use case, you should:

  • Annotate documents specific to your use case.

  • Train a model on top of the ORCA base model. Doing so allows ORCA VLM to learn patterns specific to your use case.

Base model

A foundational model that provides core general-purpose capabilities and is not directly trained on customer-specific examples. Use-case specialization is achieved through additional training on top of the base model using customer-specific data. Currently, Hyperscience uses the ORCA 1.0 base model.

Learn how to train a specialized model in [v42.3] Training a Specialized Model.