PII Data Deletion


If your submissions contain personally identifiable information (PII), you may want to delete it from submissions for security reasons, or your organization may have regulations that require you to do so. In either case, Hyperscience can automatically delete PII from your submissions at the time you specify. If you enable PII data deletion, all document image data, including original uploaded images and any processed or corrected images, will be deleted.

If PII data deletion is enabled, the automatic deletion of PII from submissions also includes the deletion of any flow runs used to process those submissions.

Settings

You can manage PII data deletion in your instance by configuring the following settings, which are available in the General section of the application settings (Administration > System Settings). 

PII data deletion

The PII data deletion setting is disabled by default. When you enable it, you will also need to set your PII data deletion policy. 

PII data deletion policy

You can choose to delete PII from submissions a certain number of days after either their completion dates (Submission Complete Date) or their submitted dates (Date Submitted). You can also specify the time of day that the system will delete PII.

If you choose Date Submitted as the basis for your PII deletion window, you should make your window long enough to ensure that you are only deleting PII from processed submissions. Otherwise, outstanding manual tasks or submissions that are still processing may be deleted.
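As a rough illustration of how the deletion window and the time of day interact, the sketch below computes the first daily deletion run at which a submission would qualify. It assumes that a submission qualifies once the configured number of days has passed since its basis date (Submission Complete Date or Date Submitted) and that its PII is removed at the next daily run. The function names are hypothetical and are not part of Hyperscience.

```python
from datetime import datetime, time, timedelta


def first_run_on_or_after(moment: datetime, run_time: time) -> datetime:
    """Return the first daily run (at run_time) that is not earlier than moment."""
    candidate = datetime.combine(moment.date(), run_time)
    if candidate < moment:
        candidate += timedelta(days=1)
    return candidate


def pii_deletion_run(basis: datetime, window_days: int, run_time: time) -> datetime:
    """Assumed model: PII qualifies window_days after the basis date and is
    deleted at the next daily run. The real scheduler may differ."""
    eligible_at = basis + timedelta(days=window_days)
    return first_run_on_or_after(eligible_at, run_time)


# A submission submitted on 1 January at 15:00, a 7-day window, a 02:00 daily run.
run = pii_deletion_run(datetime(2024, 1, 1, 15, 0), 7, time(2, 0))
print(run)  # 2024-01-09 02:00:00
```

Under these assumptions, a Date Submitted basis selects the submission at that run whether or not it has finished processing, which is why the window should comfortably exceed your longest expected processing time.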

Note that if you choose Date Submitted as the basis for your PII deletion window, the PII data deletion policy will also delete all training documents that are uploaded through the Classification tab of the Model Details page. If you want to exclude these training documents from the PII deletion policy, you need to:

  1. Go to /admin/common/freeformconfig/.

  2. Click the Edit config link.

  3. Enable the Retain Pii For Nlc Training Data setting.

  4. Click Save Changes.

Any changes you make to your PII deletion policy will be applied retroactively to all qualifying submissions in the system. PII may not be deleted immediately after saving your settings, as PII deletion takes place once each day at the time you specify.

Submission record deletion

If you are deleting PII data, you can also choose to delete submissions that have had their PII removed and have already been used in Transcription Automation Training, if enabled.

To automatically delete these submissions, enable the Submission record deletion setting. When you enable this setting, you will also need to set your submission record deletion policy. 

Submission record deletion policy

You can specify how long after PII data deletion or Transcription Automation Training to delete submissions. The system will add the number of days you enter to either the PII data deletion window or the Transcription Automation Training window, whichever is greater. You have separate Transcription Automation Training windows for Structured and Semi-structured document transcriptions. A Transcription Automation Training window is the value in the Period of records to use settings, which can be found in the Structured Document Transcription and Semi-Structured Document Transcription sections of your flow’s settings.

Examples

Let's say your PII data deletion window is 60 days after the submission completion date and your Transcription Automation Training window for Structured documents is 30 days. If you enter a window of 10 days in your submission record deletion policy, the system will delete submissions 70 days after the submission completion date (60 + 10 days).

In another case, say your PII data deletion window is 30 days after the submission completion date and your Transcription Automation Training window for Semi-structured documents is 50 days. If you enter a window of 10 days in your submission record deletion policy, the system will delete submissions 60 days after the submission completion date (50 + 10 days).
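The arithmetic in these two examples can be expressed as a short sketch. The function and parameter names are hypothetical and only restate the rule described above: the larger of the two windows, plus the number of days entered in the submission record deletion policy.

```python
def submission_deletion_days(pii_window_days: int,
                             training_window_days: int,
                             policy_days: int) -> int:
    """Days after the basis date at which a submission record is deleted.

    Illustrative only: takes the greater of the PII data deletion window and
    the Transcription Automation Training window, then adds the policy days.
    """
    return max(pii_window_days, training_window_days) + policy_days


# First example: PII window 60, Structured training window 30, policy 10.
print(submission_deletion_days(60, 30, 10))  # 70

# Second example: PII window 30, Semi-structured training window 50, policy 10.
print(submission_deletion_days(30, 50, 10))  # 60
```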

You can also choose to delete submissions at a certain time of day and for the duration you specify.

Any changes you make to your submission record deletion settings will be applied retroactively to all qualifying submissions in the system. Submissions may not be deleted immediately after saving your settings, as submission deletion takes place once each day at the time you specify.

Implications for upgrades

If you are planning to upgrade to a new version of Hyperscience, you should consider how your PII deletion policy may impact automation after upgrading.

Normally, if PII is wiped, or deleted, from a document, the document is still used in finetuning training with all of its remaining information. When you upgrade to a new version, the system needs to recalibrate all previous QA records with the new model.

Note that the number of records available will depend on the Period of records to use settings for Transcription Automation Training and, potentially, the number of days specified in the PII data deletion policy.

Specifically, the number of records we can use is as follows:

  • For new trainers — all non-PII-wiped records within the training period (Period of records to use)

  • For existing trainers — all records within the training period

You can set different values for the Period of records to use setting for Structured and Semi-structured text.

If the records were wiped of PII, recalibrating with them isn’t possible, so you cannot use those records after upgrading. If most of the viable QA records were wiped of PII, there may be a significant loss of records used for that training. This loss of data may result in a reduction in automation after upgrading.
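To get a feel for how much data a new trainer might lose, the sketch below counts usable records for an existing trainer and for a new trainer. It is a simplified model, not Hyperscience's implementation: the `QARecord` class and its fields are hypothetical, and it assumes a record's PII is wiped once the record is older than the PII data deletion window. The values used, 30 days for Period of records to use and 7 days for the PII data deletion policy, are the default settings.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class QARecord:
    created: date    # hypothetical: date the record became available for training
    pii_wiped: bool  # True once PII deletion has run on the record's submission


def usable_records(records: List[QARecord], today: date,
                   period_days: int, new_trainer: bool) -> List[QARecord]:
    """Records inside the training period; a new trainer skips PII-wiped ones."""
    cutoff = today - timedelta(days=period_days)
    in_period = [r for r in records if r.created >= cutoff]
    if new_trainer:
        return [r for r in in_period if not r.pii_wiped]
    return in_period


today = date(2024, 3, 1)
ages_in_days = [3, 10, 20, 40]
# Assumption: PII is wiped from any record older than the 7-day deletion window.
records = [QARecord(today - timedelta(days=a), pii_wiped=a > 7) for a in ages_in_days]

existing = usable_records(records, today, period_days=30, new_trainer=False)
new = usable_records(records, today, period_days=30, new_trainer=True)
print(len(existing), len(new))  # 3 1
```

In this toy data set, the existing trainer can use three records while the new trainer can use only one, which shows how a short PII deletion window relative to the training period can shrink the pool of viable records after an upgrade.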

To avoid this loss of automation, attach the updated trainer to your application, and check the result from finetuning with the new trainer. By comparing the trainers’ data, you can check expected automation rates before and after an upgrade, along with the difference in viable QA records between versions. Knowing that information can help you determine if it’s safe to do an upgrade or if you need to process many QA records before or after the upgrade to fill the database with viable records.

You may find it helpful to adjust the Period of records to use and PII data deletion policy settings. You can find Period of records to use in your flow’s Structured Document Transcription and Semi-Structured Document Transcription settings. The PII data deletion policy setting can be found in Administration > System Settings. The default value for the Period of records to use settings is 30 days, and the default period in the PII data deletion policy is 7 days. To make more records available to your new trainer, increase either or both of these values.

Implications for reporting

When you delete PII from submissions and retain the submissions after PII deletion, those submissions will still be included in the calculation of reporting statistics. However, if you enable Submission record deletion, the deleted submissions will not be included in reporting statistics after their deletion. Therefore, you may notice a drop in some reporting metrics after enabling Submission record deletion. Specifically, submission deletion will affect the following reports:

  • Reporting > Accuracy

    • System Transcription Sampled Errors

    • Field Exception Report

  • Reporting > Processing Time

    • Submissions SLA

  • Reporting > User Performance

    • Transcription Sampled Errors