Using features for Semi-structured documents
This article mentions features used in the processing of Semi-structured documents. Your access to those features depends on your license package and pricing plan.
To learn which features are available to your organization and how to add more, contact your Hyperscience representative.
When you adjust a semi-structured layout, some of the changes require retraining your Identification Models, while others do not.
Retraining ensures that the model can correctly recognize new or updated fields and tables, and helps prevent unexpected behavior in production.
This article explains which types of layout changes require retraining and which ones can be applied without it. Understanding this distinction helps you save time and maintain model accuracy.
To learn how to train an Identification Model, see Training a Semi-structured Model.
Learn how to monitor and improve your models in our Monitoring Model Performance and Improving Model Performance articles.
When retraining is required
Field Identification Models
Retraining is required if:
A new field is added.
The Multiline setting for an existing field is toggled. Learn more in Creating Semi-structured Layouts.
The Multiple Occurrences setting is toggled.
Retraining required
Adding a new field without retraining will not make it functional.
You must retrain the model with documents that include annotations for the new or updated field. Until the model is retrained, the new field remains unsupported. Training Data Management shows a message when new fields are missing from the training data.
To learn more about the annotation process, see our Training a Semi-structured Model and TDM for Identification Models articles.
Table Identification Models
Retraining is required if:
A new column has been added.
The Multiline setting for an existing column is toggled.
Retraining required
Adding a new column without retraining is not enough. The model must be retrained with annotated documents that contain the new or updated column. Training Data Management shows a message when new columns are missing from the training data.
Learn more about the annotation process in Training a Semi-structured Model and TDM for Identification Models.
When retraining is not required
Updating any of the following settings in an existing field does not require retraining your Field or Table ID model:
Output name
Transcription Supervision
Identification Supervision
Required
Not in English
Always verify layout-version compatibility when switching between model versions
The Live version of a model always uses the most recent layout version, regardless of which layout version it was originally trained with.
This pairing can lead to unexpected behavior, especially if changes were made to the layout after training (e.g. new fields, field-setting updates).
Example: If a model is trained on v3 of a layout and v4 of that layout is created after training, the model will use v4 of the layout when deployed.
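The version mismatch in the example above can be caught with a simple check before a model goes live. The following is a minimal sketch, not a Hyperscience API: the ModelVersion record, the layout_drift helper, and the model name are hypothetical, and it assumes you track the trained-on and latest layout version numbers yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical record of a trained model; not a Hyperscience API object."""
    name: str
    trained_on_layout_version: int


def layout_drift(model: ModelVersion, latest_layout_version: int) -> int:
    """Return how many layout versions were created after the model was trained."""
    return max(0, latest_layout_version - model.trained_on_layout_version)


# Mirrors the example above: trained on v3, latest layout is v4.
model = ModelVersion(name="invoices-field-id", trained_on_layout_version=3)
latest = 4
drift = layout_drift(model, latest)
if drift:
    print(
        f"{model.name} was trained on v{model.trained_on_layout_version} "
        f"but will run against v{latest}; review the layout changes first."
    )
```

A non-zero drift does not by itself mean retraining is required. It only signals that the layout changes made since training should be reviewed against the rules in this article.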
Field ID models
A Field ID model does not need to be retrained if:
A field is removed.
A field’s data type is changed. To learn more, see Data Types.
Table ID models
A Table ID model does not need to be retrained if:
A column is removed.
A column’s data type is changed.
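The rules in this article can be summarized as a lookup. The sketch below is illustrative only: the change-type identifiers and the needs_retraining function are hypothetical names chosen for this example, and the two sets simply restate the lists above.

```python
# Change types restated from the lists in this article.
# The identifiers are illustrative, not product terms.
RETRAIN_REQUIRED = {
    "field_added",
    "field_multiline_toggled",
    "field_multiple_occurrences_toggled",
    "column_added",
    "column_multiline_toggled",
}

NO_RETRAIN = {
    "output_name_changed",
    "transcription_supervision_changed",
    "identification_supervision_changed",
    "required_changed",
    "not_in_english_changed",
    "field_removed",
    "field_data_type_changed",
    "column_removed",
    "column_data_type_changed",
}


def needs_retraining(changes):
    """Return True if any change in the list requires retraining."""
    unknown = set(changes) - RETRAIN_REQUIRED - NO_RETRAIN
    if unknown:
        # Refuse to guess about change types this article does not cover.
        raise ValueError(f"Unclassified change types: {sorted(unknown)}")
    return any(change in RETRAIN_REQUIRED for change in changes)


print(needs_retraining(["output_name_changed", "field_removed"]))
print(needs_retraining(["output_name_changed", "field_added"]))
```

A single change from the first set is enough to require retraining, regardless of how many other changes fall in the second set.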
Additional Considerations
If you retrain using only existing training data, new fields or columns, and those with an updated Multiline setting, will not be included. Retrain with enough annotated examples covering the new or modified layout elements. To learn more about the annotation process, see Training a Semi-structured Model.
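One way to reason about annotation coverage before retraining is to count how many annotated documents include each layout field. This is a rough sketch under stated assumptions: it assumes you can list the annotated field names per document yourself, the function and field names are hypothetical, and min_examples is a placeholder because this article does not specify how many examples count as enough.

```python
from collections import Counter


def fields_lacking_annotations(layout_fields, annotated_docs, min_examples=1):
    """Return layout fields annotated in fewer than min_examples documents.

    layout_fields: field (or column) names in the current layout.
    annotated_docs: one collection of annotated field names per document.
    min_examples: assumed threshold; the article gives no specific number.
    """
    counts = Counter(name for doc in annotated_docs for name in set(doc))
    return [field for field in layout_fields if counts[field] < min_examples]


# Hypothetical layout where "po_number" was added after the data was annotated.
layout = ["invoice_number", "total", "po_number"]
docs = [
    {"invoice_number", "total"},
    {"invoice_number"},
]
print(fields_lacking_annotations(layout, docs))
```

Any field returned by this check would need additional annotated documents before retraining can make it functional.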