V42 is currently available to SaaS customers only and will be available to on-premise customers at a future date. For more information, contact your Hyperscience representative.
42.0.2 (26 Sept 2025)
Versions 42.0.0 and 42.0.1 were not released and are not supported.
Highlights
A leader in intelligent document processing, Hyperscience strives to consistently add value, drive innovation, and improve the user experience with every new version of Hypercell. As such, we’re introducing the following key features in Hyperscience v42.
ORCA
ORCA General Prompting Block — The General Prompting Block allows for more flexibility in the application of ORCA VLMs, because prompts can now be entered directly in flows rather than only in a layout interface. This block allows ORCA to work independently of Semi-structured layouts and creates more possibilities for ORCA prompts to power data-processing tasks or to act as a Model-in-the-Loop for a larger variety of use cases. As a result, you can leverage the ORCA General Prompting Block to power features like Document Chat, or you can call it through an API for data manipulation.
Highlighting predicted field locations during ORCA Supervision — During ORCA Supervision tasks, the system now highlights the predicted locations of the document’s fields. A field's approximate location appears when it is selected in the interface's right-hand panel. By helping keyers find fields in documents, this update reduces the time required to complete VLM QA tasks.
Note that ORCA field locations differ from Field ID locations in that the suggested locations tend to be larger and more dynamic than Field ID locations, with no explicit initial training from the user.
For more information about ORCA VLMs, see ORCA (Optical Reasoning and Cognition Agent) VLMs.
Structured Document Classification and Flexible Extraction
Placing unmatched pages in Structured documents — Previously, if you filled a gap in a machine-classified layout (e.g., a missing page 3), the unmatched page was always added to the end of the document, even if you attempted to append the page in a specific position. In v42, when you drag and drop an unmatched page into an empty page position (e.g., between page 2 and page 4), the system respects the manual override and places it exactly where you set it, without appending it to the end.
We’ve added the following options for placing pages in Structured documents:
Append page — We’ve introduced an Append page button that allows you to place the selected page at the end of the document. Note that if you choose to append it, the machine’s confidence will be reduced, and it will be sent for manual extraction.
Add to Doc — Add the selected page to the specific document you’re currently working on. Note that the classification will be treated as a manual match, and the content on the page will require manual extraction.
As part of this update, the submission's JSON output reflects the order of the pages determined during Document Classification. This output also ensures that only manually matched pages are sent to Flexible Extraction.
To learn more, see Structured Document Classification.
Displaying fields from manually matched pages only — During Flexible Extraction tasks, use the Show fields from manually matched pages only toggle to display only the fields from the manually classified pages.
For more information, see Flexible Extraction.
Major UI updates
To improve the user experience, we've updated the Hypercell user interface. The features described below include updates that change how certain actions are completed in Hypercell. We recommend notifying your team members of these changes or offering them updated training, as needed.
Models
Models are now visible in the main menu of the product — We’ve moved the Models section out of the Library, reducing the amount of time it takes to find the models you’re looking for. “Models” is now a standalone category in the main menu. You can manage your models directly through this section.
Tasks
Tasks Overview is updated with a refreshed design — We’ve updated the layout of the Tasks Overview tab (Tasks > Overview) to remove graphs and side-by-side cards.
Upgrade notes
These updates may impact your upgrade process or affect initial processing times after upgrading. For more information or assistance, contact your Hyperscience representative.
Operating systems
Supported Ubuntu versions — We've removed support for Ubuntu 20. For information on supported operating systems, see Infrastructure Requirements.
Databases
Supported PostgreSQL versions — We've removed support for PostgreSQL 14 and added support for PostgreSQL 17. To learn more about supported databases, see Infrastructure Requirements.
Python
Supported Python versions — Beginning in v42, Hyperscience does not support the use of Python 3.9 in flows, including Code Blocks and external Python packages. For more information about supported Python versions, see Developing Flows.
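As a rough illustration, a check like the following could be run in the environment where custom flow code or packages are developed, to flag an interpreter that v42 no longer supports. It is a minimal sketch: the function name is hypothetical, and the only version it flags is Python 3.9, because that is the only version these notes name. See Developing Flows for the full list of supported versions.

```python
import sys

# Python 3.9 is the only version these release notes name as unsupported
# in v42 flows. The set below is an illustration, not an official list.
UNSUPPORTED_MINOR_VERSIONS = {(3, 9)}


def is_unsupported(version_info=sys.version_info):
    """Return True if the interpreter's major.minor version is flagged above."""
    return tuple(version_info[:2]) in UNSUPPORTED_MINOR_VERSIONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "not supported" if is_unsupported() else "not flagged"
    print(f"Python {major}.{minor}: {status} for v42 flows")
```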
Required disk space
Minimum disk space required for v42.x.x — Due to the growing sizes of packages and dependencies, each application VM running v42 requires a minimum of 300GB of available disk space (200GB in the root volume, 100GB in the /var volume). This space is in addition to the disk space consumed by the operating system (i.e., the space needs to be available after starting a new VM).
To learn more about minimum infrastructure requirements, see Infrastructure Requirements.
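The disk-space minimums above can be checked on a VM before upgrading. The following is a minimal sketch that uses only Python's standard library; the function name and the interpretation of "GB" as 1024^3 bytes are assumptions made for illustration, and it is not a Hyperscience-provided tool.

```python
import shutil

GB = 1024 ** 3  # assumption: "GB" in the notes is treated as GiB here

# Minimums from the v42 notes: 200GB free on the root volume, 100GB on /var.
REQUIRED_FREE_GB = {"/": 200, "/var": 100}


def find_shortfalls(required=REQUIRED_FREE_GB, usage=shutil.disk_usage):
    """Return {mount_point: (free_gb, required_gb)} for volumes below the minimum.

    If /var is not a separate volume, shutil.disk_usage("/var") reports the
    filesystem that contains it, so both checks would read the same volume.
    """
    shortfalls = {}
    for mount_point, required_gb in required.items():
        free_gb = usage(mount_point).free / GB
        if free_gb < required_gb:
            shortfalls[mount_point] = (round(free_gb, 1), required_gb)
    return shortfalls


if __name__ == "__main__":
    problems = find_shortfalls()
    for mount_point, (free_gb, required_gb) in problems.items():
        print(f"{mount_point}: {free_gb} GB free, {required_gb} GB required")
    if not problems:
        print("Disk-space minimums are met.")
```

The `usage` parameter exists only so the check can be exercised with fake values; on a real VM, the default `shutil.disk_usage` is used.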
Additional features and enhancements
User experience
Design updates — The following key components were updated with a refreshed design in this release:
Perform Tasks interface enhancements — On smaller screens (less than 1500px in width), the Supervision and QA task type tables on the Perform Tasks page (Tasks > Perform Tasks) are displayed one below the other.
Numbers are right-aligned for better readability.
Buttons now have a consistent width across both tables.
“File Name” column added to the Training Data table in Training Data Management (TDM) for all models — You can now view document names directly in the TDM interface to help identify and manage training data faster. Additionally, the original file name is searchable from the Training Data table in TDM.
Improved “Training Data Health” card in TDM for Classification — You’ll now see the number of eligible layouts in both the Training Data Health and the Summary cards. The Training Data Health card is expandable, so you can view all individual layouts when multiple are present.
Layouts are eligible for training when they meet the minimum number of pages required for training.
Consistent file-upload dialog boxes across the platform — We’ve updated the file-upload dialog boxes across the platform to improve consistency.
Flows
Updated design of flows and blocks in Flow Studio — To improve the user experience, we’ve made the following enhancements to the Flow Studio user interface:
Information included in flow blocks — We’ve enhanced the design of flow blocks in Flow Studio to include more block-specific information, including:
the names of connections configured in Input Blocks and Output Blocks,
the number of blocks and branches in each Routing Block, and
configuration errors.
You can click on a block to reveal or hide its subflows or notifications.
Visualization of branch merging after Routing Blocks in Flow Studio — When branches merge together in a flow after a Routing Block, the flow’s visualization in Flow Studio includes a Merge card to indicate where the merging occurs. Clicking a Merge card reveals or hides the merged branch.
By allowing you to reveal or hide parts of the flow, these changes reduce the amount of scrolling needed to view a flow’s contents and the effort required to know your relative location in the flow.
Reducing RAM used by block processes — In v42, the system does not start block processes automatically after a flow is deployed. Instead, these processes are started only if tasks have been scheduled for the blocks, reducing the total amount of RAM consumed by block processes.
As part of this update, we've added the HS_RUNNABLE_BLOCKS_DISCOVERY_POLICY and HS_ACTIVE_BLOCK_CUTOFF_SECONDS “.env” file variables, which allow you to specify the conditions under which block processes start and stop.
For more information, see Reducing RAM Used by Block Processes.
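To see whether an instance already sets either of these variables, the “.env” file can be inspected with a short script. The sketch below assumes the file uses simple KEY=VALUE lines; the function names are hypothetical. It deliberately does not suggest values, since the accepted values are described in Reducing RAM Used by Block Processes rather than in these notes.

```python
# The two variable names come from the v42 notes; everything else here
# (function names, the KEY=VALUE parsing rules) is an illustrative assumption.
BLOCK_PROCESS_VARIABLES = (
    "HS_RUNNABLE_BLOCKS_DISCOVERY_POLICY",
    "HS_ACTIVE_BLOCK_CUTOFF_SECONDS",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def report_block_process_settings(env_text):
    """Map each variable to its configured value, or None if it is not set."""
    values = parse_env(env_text)
    return {name: values.get(name) for name in BLOCK_PROCESS_VARIABLES}
```

For example, `report_block_process_settings(open(".env").read())` returns a dictionary with one entry per variable, where `None` means the variable is absent from the file.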
Maximum size of block inputs, block outputs, and workflow-engine payloads — To prevent out-of-memory errors, block inputs, block outputs, and workflow-engine payloads are now limited to 500MB. If this limit is exceeded, the data is offloaded to the file store.
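The general idea of offloading oversized payloads can be sketched as follows. This is an illustration of the pattern only, not the product's implementation: the function name, the returned dictionary shape, the use of a local directory as the "file store", and the reading of "500MB" as 500 × 1024² bytes are all assumptions.

```python
import json
import os
import tempfile
import uuid

MAX_INLINE_BYTES = 500 * 1024 * 1024  # assumption: "500MB" read as 500 MiB


def pass_or_offload(payload, store_dir, limit=MAX_INLINE_BYTES):
    """Return the payload inline, or write it to store_dir and return a reference.

    Keeping only a small reference in memory is what avoids holding very
    large payloads in the process that passes them along.
    """
    data = json.dumps(payload).encode("utf-8")
    if len(data) <= limit:
        return {"inline": payload}
    path = os.path.join(store_dir, f"{uuid.uuid4().hex}.json")
    with open(path, "wb") as handle:
        handle.write(data)
    return {"file_ref": path, "size_bytes": len(data)}


if __name__ == "__main__":
    # A tiny limit is used here so the offload path is easy to observe.
    with tempfile.TemporaryDirectory() as store:
        print(pass_or_offload({"field": "value"}, store, limit=8))
```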
Recursion in nested subflows — We've fixed a recursion-related issue that caused errors when processing submissions through flows containing a large number of nested subflows. As part of this fix, we've created the HYPERFLOW_ENGINE_MAX_SUBFLOWS_DEPTH_LIMIT ".env" variable, which has a default value of 100.
To learn how to change the default limit, see Recursion in Nested Subflows.
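To illustrate what a depth limit on nested subflows means, the sketch below measures the nesting depth of a flow and compares it with the limit. The dictionary structure used for flows (`name` and `subflows` keys), the function names, and the choice to reject only depths strictly greater than the limit are assumptions for illustration; the variable name and its default of 100 come from these notes.

```python
import os

DEFAULT_DEPTH_LIMIT = 100  # default value stated in the v42 notes


def depth_limit(environ=os.environ):
    """Read the limit from the environment, falling back to the documented default."""
    return int(environ.get("HYPERFLOW_ENGINE_MAX_SUBFLOWS_DEPTH_LIMIT", DEFAULT_DEPTH_LIMIT))


def max_subflow_depth(flow):
    """Return the deepest nesting level; a flow with no subflows has depth 0.

    An explicit stack is used instead of recursion, so a very deep flow
    cannot exhaust Python's own call stack while being measured.
    """
    deepest = 0
    stack = [(flow, 0)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        for child in node.get("subflows", []):
            stack.append((child, depth + 1))
    return deepest


def check_flow(flow, environ=os.environ):
    """Return the flow's depth, or raise ValueError if it exceeds the limit."""
    limit = depth_limit(environ)
    depth = max_subflow_depth(flow)
    if depth > limit:
        raise ValueError(f"subflow depth {depth} exceeds limit {limit}")
    return depth
```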
New version of “On-error with included Submission data” subflow — To reduce the time required to retrieve submission data, we’ve created a new version (V2) of our “On-error with included Submission data” subflow. The first version of this flow continues to work in v42.
More details about on-error flows can be found in On-Error Flows.
Layouts
Uploading files of multiple types when creating Structured layouts — In previous versions, the application allowed users to upload files of different types when creating a Structured layout, but it became unresponsive when they attempted to do so. We've fixed this issue in v42: if you try to upload multiple file types when creating a Structured layout, an error is shown, additional uploads are blocked, and the Next button is disabled.
Semi-structured Document Classification
Page limit for training Semi-structured Classification models — The default training limit for Semi-structured Classification models has been increased to 100,000 pages. This new maximum allows you to train models on larger datasets without additional configuration. If your use case requires training on more than 100,000 pages, reach out to your Hyperscience representative for assistance.
Structured Document Classification
Improved layout variation selection in Document Classification — We’ve added a Layout Variation drop-down list in the Document Classification task. You can search for a layout variation by name directly in the drop-down list, or you can scroll through the list to find the layout variation you’re looking for. This change makes assigning layouts and variations more efficient, especially when working with a large number of options.
Note that the drop-down list is visible only when you have variations in a Structured layout group that has already been selected, either by the machine or manually, before the Document Classification task.
Flexible Extraction
Filtering fields by variation — You can use the Show fields from selected variation only toggle to see fields from the variation you’re currently working on. Doing so makes it faster and easier to focus on what matters, especially in complex documents with many variations within the layout group.
To learn more about the options available during Flexible Extraction, see Flexible Extraction.
Full Page Transcription
Faster processing enabled by configuring Full Page Transcription Block — You can now configure Full Page Transcription Blocks to selectively process any combination of text, signatures, and checkboxes. For example, if a use case does not require the transcription of checkboxes, you can disable the transcription of fields of that data type. Unselected models are not run for page elements that are irrelevant to the submission, thus reducing submission-completion times.
ORCA (Optical Reasoning and Cognition Agent) VLMs
ORCA Composite Block — ORCA now supports composite block functionality in flows, allowing for simplified implementation of ORCA VLMs. The block removes duplicate calls and offers improved compatibility between block versions and across application versions. Additionally, the inclusion of this block in v42's Flows SDK reduces the complexity of incorporating ORCA VLMs into custom flows.
Custom Supervision
Updated version of the Custom Supervision Block — Existing flows will continue running as expected, while new flows will now use the updated CUSTOM_SUPERVISION_3 block by default. The new version introduces tighter validations around the Supervision template to improve long-term stability and compatibility, so it may not be fully backwards compatible. No changes to results, throughput, or accuracy are expected.
“Show Custom Supervision Tasks as separate items” setting — The Show Custom Supervision Tasks as separate items setting in System Settings (Administration > System Settings) allows you to control how Custom Supervision tasks appear on the Perform Tasks page (Tasks > Perform Tasks).
When enabled:
Each Custom Supervision Task type is shown separately, but only if there are active tasks of that type.
The same applies if task- or flow-level restrictions prevent the user from accessing that task type.
When disabled (default behavior):
All Custom Supervision tasks are grouped under Custom Supervision on the Perform Tasks page.
Enabling this setting gives teams the flexibility to surface specific task types to keyers for better focus.
For more details on the Show Custom Supervision Tasks as separate items setting and other available settings, see Application Settings Overview.
Leveraging ORCA VLMs in Document Chat — With the introduction of the General Prompting Block mentioned in the Highlights section of these release notes, you can now use ORCA VLMs in Document Chat. This update makes it possible to use Document Chat in environments that cannot be connected to the internet or for use cases where the use of publicly available LLMs is not allowed.
Document Renderer Block
Reducing final PDF file size with colors in generated PDFs — You can now set an Output Image Mode in the Document Renderer Block to control how images appear in the block’s generated PDF. This reduces the final PDF file size, optimizing performance. You have three options to choose from:
Keep original colors
Convert to grayscale
Convert to black & white
Additionally, the existing Image Quality setting now applies only when Output Image Mode is set to Keep original colors or Convert to grayscale.
More information about the Document Renderer Block can be found in Flow Blocks.
Connections
Notifiers for Microsoft Azure Blob Storage and Google Cloud Storage (GCS) — With the addition of the Azure Blob and GCS Notifier Output Blocks, you can send submission data to the Azure blob or GCS bucket of your choosing.
Each Notifier can create a single JSON for all of a submission's processed documents, individual JSON files for each processed document, or individual JSON files for each document matched to a layout and a JSON file for each unmatched page. You can also choose whether to send all of a submission's data or only high-level data.
For more information about these Notifiers, see Azure Blob Notifier and GCS Notifier.
Reporting
User Performance reporting for Full Page Transcription QA — Metrics for Full Page Transcription QA have been added to the following reports:
Keyer Projection Report
KeyerPerformance.csv
Full Page Transcription QA Time Spent (Seconds)
Segments Reviewed in Full Page Transcription QA
Segments Reviewed in Full Page Transcription QA per Hour
Segment Characters Reviewed in Full Page Transcription QA
Segment Characters Reviewed in Full Page Transcription QA per Hour
HourlyReportingSubmissionOverview.csv
Users Performing Full Page Transcription QA
Time Spent in Full Page Transcription QA (Seconds)
Full Page Transcription QA Segments in Starting Work Queue
Full Page Transcription QA Segments Added to Work Queue
Full Page Transcription QA Segments Completed
Supervision Volume — Segments per day
More details on these reports can be found in Keyer Projection Report and Supervision Volume.
“Usage Report” is now “Usage Bundle” — We’ve renamed the Usage Report to “Usage Bundle” to better reflect the export’s contents.
Files included in the Usage Bundle — For customers using the automated usage-transmission option, the majority of the files in the manual download are now included in the nightly automated transmissions to Hyperscience.
For more information about transmitting usage data automatically, see Automatic Transmission of Usage Data.
Versions of flows included in the Usage Bundle — The Usage Bundle now includes data for only the most recent versions of flows, reducing the size of the bundle.
To learn more about the contents of the Usage Bundle, see Usage Bundle.
Authentication
Windows authentication for Microsoft SQL Server (MSSQL) — If your instance uses an MSSQL database, you can now use Windows authentication, Microsoft's recommended authentication method, instead of SQL Server authentication. Windows authentication leverages Linux's Kerberos Ticket Granting Ticket (TGT) and provides a higher level of security than SQL Server authentication. As a result, taking advantage of this feature can reduce potential compliance overhead and facilitate installations and upgrades in Microsoft ecosystems.
Infrastructure
Retrieving data from a replica database — You can now set up a replica database to retrieve reporting and audit-log data in on-premise instances. This configuration can improve system responsiveness and prevent timeouts when attempting to read these kinds of data, particularly in instances that process a high volume of submissions.
For more information on how to set up a replica database, see Retrieving Data From a Replica Database.
Django upgrade — We've updated the version of Django that our application uses from 4.2.23 to 5.2.1. Version 5.2 is the latest long-term support version of Django.
Submission Retrieval Store
Support for Google Cloud Storage (GCS) — You can now use Google Cloud Storage as a submission retrieval store. When connected to GCS, the system receives file URLs from the bucket you specify, which it then uses to download the files and process them as submissions. You can configure individual flows to ingest data from a particular bucket by editing the Submission Bootstrap settings in each flow.
More information on setting up GCS retrieval stores can be found in Flow Blocks.
API
New Flow Runs endpoint — We’ve updated our Flow Runs endpoint to allow you to retrieve either the entirety of a flow run’s data or a summary of that data.