Retraining Existing Models

Using features for Semi-structured documents

This article mentions features used in the processing of Semi-structured documents. Your access to those features depends on your license package and pricing plan.

To learn which features are available to your organization and how to add more, contact your Hyperscience representative.

When you adjust a semi-structured layout, some of the changes require retraining your Identification Models, while others do not.

Retraining ensures that the model can correctly recognize new or updated fields and tables, and helps prevent unexpected behavior in production.

This article explains which types of layout changes require retraining and which ones can be applied without it. Understanding this distinction helps you save time and maintain model accuracy.

When retraining is required

Field Identification Models

Retraining is required if:

  • A new field is added.

  • The Multiline setting for an existing field is toggled. Learn more in Creating Semi-structured Layouts.

  • The Multiple Occurrences setting is toggled.

Retraining required

Adding a new field without retraining will not make it functional.

You must retrain the model with documents that include annotations for the new or updated field. Until the model is retrained, the new field remains unsupported. Training Data Management displays a message when new fields are missing from the training data.

To learn more about the annotation process, see our Training a Semi-structured Model and TDM for Identification Models articles.

Table Identification Models

Retraining is required if:

  • A new column has been added.

  • The Multiline setting for an existing column is toggled.

Retraining required

Adding a new column without retraining is not enough. The model must be retrained with annotated documents that contain the new or updated column. Training Data Management displays a message when new columns are missing from the training data.

Learn more about the annotation process in Training a Semi-structured Model and TDM for Identification Models.

When retraining is not required

Updating any of the following settings in an existing field does not require retraining your Field or Table ID model:

  • Output name

  • Transcription Supervision

  • Identification Supervision

  • Required

  • Not in English

Always verify layout-version compatibility when switching between model versions

The Live version of a model always uses the most recent layout version, regardless of which layout version it was originally trained with.

This pairing can lead to unexpected behavior, especially if changes were made to the layout after training (e.g. new fields, field-setting updates).

Example: If a model is trained on v3 of a layout, and v4 of that layout is created after training, the model will use v4 of the layout when deployed.
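The check described above can be sketched in a few lines of Python. This is only an illustration: the function name and the use of plain integers for layout versions are assumptions, and nothing here calls a Hyperscience API.

```python
from typing import Optional


def check_layout_compatibility(trained_version: int, latest_version: int) -> Optional[str]:
    """Hypothetical helper: warn when a layout has changed since the model was trained.

    Both arguments are assumed to be simple integer layout versions (3 for "v3").
    Returns None when the versions match, otherwise a warning message.
    """
    if latest_version < trained_version:
        raise ValueError("latest layout version cannot be older than the trained version")
    if latest_version == trained_version:
        return None
    return (
        f"Model was trained on layout v{trained_version} but will run against "
        f"v{latest_version}; review the layout changes made since training."
    )


# The example from the text: trained on v3, v4 created afterwards.
print(check_layout_compatibility(3, 4))
```

A warning from a check like this does not by itself mean retraining is needed; it only signals that the layout changes made after training should be reviewed.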

Field ID models

A Field ID model does not need to be retrained if:

  • A field is removed.

  • A field’s data type is changed. To learn more, see Data Types.

Table ID models

A Table ID model does not need to be retrained if:

  • A column is removed.

  • A column’s data type is changed.
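As a rough summary, the rules in this article can be expressed as a small lookup. The sketch below is illustrative only: the change labels (such as "toggle_multiline") and the function name are hypothetical and are not Hyperscience identifiers. Changes that this article does not cover for a given model type raise an error rather than being guessed.

```python
# Hypothetical change labels used only for this illustration.
RETRAIN_REQUIRED = {
    "field": {"add", "toggle_multiline", "toggle_multiple_occurrences"},
    "table": {"add", "toggle_multiline"},
}

# Settings that can be updated without retraining either model type.
SETTING_ONLY = {
    "output_name",
    "transcription_supervision",
    "identification_supervision",
    "required",
    "not_in_english",
}

NO_RETRAIN = {
    "field": {"remove", "change_data_type"} | SETTING_ONLY,
    "table": {"remove", "change_data_type"} | SETTING_ONLY,
}


def needs_retraining(model_type, changes):
    """Return True if any change in `changes` requires retraining the given model type.

    model_type is "field" or "table"; changes is an iterable of the labels above.
    """
    required = RETRAIN_REQUIRED[model_type]
    not_required = NO_RETRAIN[model_type]
    result = False
    for change in changes:
        if change in required:
            result = True
        elif change not in not_required:
            # The article does not state a rule for this combination.
            raise ValueError(f"change {change!r} is not covered for {model_type} models")
    return result


print(needs_retraining("field", ["remove", "output_name"]))      # no retraining needed
print(needs_retraining("table", ["add", "change_data_type"]))    # retraining needed
```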

Additional Considerations

If you retrain using only your existing training data, new fields or columns, and those with an updated Multiline setting, will not be included in the model. Retrain with enough annotated examples that cover the new or modified layout elements. To learn more about the annotation process, see Training a Semi-structured Model.
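One way to reason about annotation coverage before retraining is to count how many annotated documents include each new or modified element. The sketch below assumes you have a list of annotated element names per document (for example, from your own tracking sheet); the data shape, the function name, and the `min_examples` threshold are all hypothetical, since this article does not specify how many examples are enough.

```python
from collections import Counter


def under_annotated(layout_elements, annotated_documents, min_examples):
    """Return {element: count} for elements annotated in fewer than min_examples documents.

    layout_elements: names of the new or modified fields/columns to check.
    annotated_documents: one iterable of annotated element names per document.
    min_examples: hypothetical threshold chosen by you; not a product setting.
    """
    counts = Counter()
    for doc in annotated_documents:
        counts.update(set(doc))  # count each element at most once per document
    return {
        name: counts[name]
        for name in layout_elements
        if counts[name] < min_examples
    }


docs = [
    ["invoice_number", "total"],
    ["invoice_number"],
    ["invoice_number", "total", "po_number"],
]
# "po_number" was added to the layout recently and appears in only one document.
print(under_annotated(["total", "po_number"], docs, min_examples=2))
```

Any element reported by a check like this needs more annotated documents before retraining will cover it.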