GCS Listener


This feature is available in v41.2 and later.

The GCS (Google Cloud Storage) Listener allows you to ingest files from a specified GCS URI.

Contents of submissions

The connector accepts both single files and prefixes with files as submissions. Whether a file is processed as its own submission or as part of a larger submission depends on where it is located relative to the source URI:

  • If it is directly under the source URI, the system processes it as an individual submission.

  • If it is under a prefix under the source URI, the system considers it part of a larger submission, which consists of all files directly under the prefix.

Only one level of nesting is recognized by the connector when creating submissions with multiple documents. If there are other prefixes contained within the prefix you specify, files under those prefixes are ignored.
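The grouping rules above can be illustrated with a small sketch. It is not the connector's implementation; the function name `group_submissions` and the convention of keying multi-file submissions by `"<prefix>/"` are assumptions made for illustration, and the sketch ignores metadata files and extension filtering.

```python
from collections import defaultdict


def group_submissions(object_names, source_prefix):
    """Group object names into submissions based on nesting depth.

    Returns a dict mapping a submission key to a sorted list of object names.
    (Hypothetical helper, written to mirror the rules described above.)
    """
    # Normalize the prefix so that "inbox" and "inbox/" behave the same.
    prefix = source_prefix.strip("/")
    prefix = prefix + "/" if prefix else ""

    submissions = defaultdict(list)
    for name in object_names:
        if not name.startswith(prefix) or name.endswith("/"):
            continue  # outside the source prefix, or a folder placeholder
        parts = name[len(prefix):].split("/")
        if len(parts) == 1:
            # Directly under the source URI: the file is its own submission.
            submissions[parts[0]].append(name)
        elif len(parts) == 2:
            # One level down: part of the submission named after the prefix.
            submissions[parts[0] + "/"].append(name)
        # Anything nested more deeply is ignored.
    return {key: sorted(names) for key, names in submissions.items()}
```

With a source prefix of `inbox/`, the object `inbox/a.pdf` becomes its own submission, `inbox/batch1/p1.jpg` and `inbox/batch1/p2.jpg` are grouped together, and `inbox/batch1/deep/x.jpg` is skipped.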

Metadata

For each submission, the GCS Listener can accept a JSON file that contains metadata, case data, and an external_id. The name of this file depends on whether the submission consists of a single file or a set of files:

  • filename.ext.json for individual files, where filename is the name of the file and ext is the file's extension

  • prefixname.json for files under a GCS bucket prefix, where prefixname is the name of a prefix under the source URI.

The metadata files must be located directly under the source URI. If you put them under a prefix of the source URI, they will be ignored, and a submission won't be created.

An example metadata file for a GCS Listener submission appears below.

{
   "metadata": {
        "test": "Metadata for file in GCS bucket"
   },
   "cases": [{
        "external_case_id": "900",
        "filenames": ["div_lic_1.jpg", "div_lic_2.jpg"]
   }],
   "external_id": "123"
}
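If the metadata files are generated by an upstream system, the naming rules and the JSON layout shown above could be produced with a sketch like the following. The helper names `metadata_object_name` and `build_metadata` are hypothetical, and the sketch simply mirrors the three top-level keys from the example; whether any of them can be omitted is not stated here.

```python
import json


def metadata_object_name(submission_name):
    """Return the metadata file name for a single file or a prefix.

    "scan.jpg" -> "scan.jpg.json"; "batch1/" -> "batch1.json".
    """
    return submission_name.rstrip("/") + ".json"


def build_metadata(metadata, cases, external_id):
    """Serialize a metadata document with the same keys as the example."""
    payload = {
        "metadata": metadata,
        "cases": cases,
        "external_id": external_id,
    }
    return json.dumps(payload, indent=2)
```

For example, `metadata_object_name("div_lic_1.jpg")` gives `div_lic_1.jpg.json`, which would be uploaded directly under the source URI.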

Archiving processed files

As files are ingested, they are copied to an archive URI and deleted from the source URI. Archived files keep their original names, including any prefixes.

If the deletion of files from the source URI leaves empty prefixes, those prefixes will not be deleted and will remain in the bucket.
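To predict where an ingested file will end up, the mapping from source to archive can be sketched as below. This assumes that the path relative to the source URI is preserved under the archive URI; the exact archive layout is not specified above, and `archive_object_name` is a hypothetical helper.

```python
def archive_object_name(object_name, source_prefix, archive_prefix):
    """Map a source object name to its assumed location in the archive."""
    src = source_prefix.strip("/")
    src = src + "/" if src else ""
    if not object_name.startswith(src):
        raise ValueError(f"{object_name!r} is not under {source_prefix!r}")

    # Keep the file name and any submission prefix unchanged (assumption).
    relative = object_name[len(src):]
    dst = archive_prefix.strip("/")
    return f"{dst}/{relative}" if dst else relative
```

Under this assumption, `inbox/batch1/p1.jpg` with a source prefix of `inbox/` and an archive prefix of `archive` maps to `archive/batch1/p1.jpg`.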

Sample use cases

  • Another system places documents in a GCS bucket on a regular basis. I want to ingest those files one by one.

  • I want to regularly scan and ingest certain types of files under a certain prefix in a GCS bucket.

Block settings table

In addition to the settings outlined below, you can also configure the settings described in Universal Integration Block Settings.

Name

Required?

Description

GCS Source URI

Yes

The location the connector will scan for blob files. Can contain a prefix and trailing slash.
Example formats:

  • gs://<bucket_name>/<prefix>/

  • gs://<bucket_name>/<prefix>

  • <bucket_name>/<prefix>

GCS Archive URI

Yes

The location the connector will move files to after they have been ingested into Hyperscience.

Must be different from GCS Source URI.

Example formats:

  • gs://<bucket_name>/<prefix>/

  • gs://<bucket_name>/<prefix>

  • <bucket_name>/<prefix>

File Extensions

Yes

A list of the extensions that image files will need to have to be eligible for processing.

If there are file extensions that you want to support but do not see in the drop-down list, select other, and enter the extensions in Other File Extensions.

Other File Extensions

No

A comma-separated list of file extensions that do not appear in File Extensions.

This field only appears if other is selected in File Extensions.

Include Submission Level Parameters

No

Indicates whether the system will ingest JSON files along with document files and submission GCS prefixes. These JSON files can contain information such as metadata, case data, and external_id values. The JSON file names should match the names of the related files or GCS prefixes (e.g., XYZ.jpg.json for XYZ.jpg).

GCP Credentials JSON

Yes

Must be a valid service account key in JSON format.

To edit the JSON, click Edit value, modify the JSON, and then click Done.

Poll Interval (In Seconds)

No

The frequency at which the connector will monitor the source URI for submissions.

Defaults to 10.

Warm-Up Interval (In Seconds)

No

The length of time that a file must remain unmodified before it is eligible for processing.

When uploading to a prefix, make sure all the files within it are for the same submission. Otherwise, one prefix with many files within it may be split into two or more submissions, depending on the length of the warm-up interval.

Defaults to 15.
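The way File Extensions and Warm-Up Interval combine to decide whether a file is picked up can be sketched as follows. This is an illustration, not the connector's code: `is_eligible` is a hypothetical helper, timestamps are plain epoch seconds, and case-insensitive extension matching is an assumption.

```python
import posixpath
import time


def is_eligible(name, last_modified, allowed_extensions,
                warm_up_seconds=15, now=None):
    """Return True if a file has an allowed extension and has been
    unmodified for at least the warm-up interval."""
    now = time.time() if now is None else now

    # Extension of the final path component, without the dot, lowercased.
    ext = posixpath.splitext(name)[1].lstrip(".").lower()
    allowed = {e.lstrip(".").lower() for e in allowed_extensions}
    if ext not in allowed:
        return False

    # The file must have remained unmodified for the whole interval.
    return (now - last_modified) >= warm_up_seconds
```

With the default warm-up interval of 15 seconds, a file modified 10 seconds ago is skipped on the current poll and becomes eligible on a later one, which is how files uploaded slowly to one prefix can end up in separate submissions.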

Setting up the GCS Listener

To set up the GCS Listener, enter the settings as described in the Block settings table above.
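Before entering the source and archive URIs, it may help to check them locally. The sketch below normalizes the three example formats from the settings table to a bucket and prefix pair and rejects identical source and archive locations. The function names are hypothetical, and the connector's own validation may differ.

```python
def parse_gcs_uri(uri):
    """Split a GCS URI into (bucket, prefix).

    Accepts "gs://bucket/prefix/", "gs://bucket/prefix", and "bucket/prefix".
    The returned prefix is "" or ends with a single trailing slash.
    """
    path = uri[len("gs://"):] if uri.startswith("gs://") else uri
    bucket, _, prefix = path.partition("/")
    if not bucket:
        raise ValueError(f"no bucket name in {uri!r}")
    prefix = prefix.strip("/")
    return bucket, (prefix + "/" if prefix else "")


def check_uris(source_uri, archive_uri):
    """Raise ValueError if both URIs point to the same location."""
    if parse_gcs_uri(source_uri) == parse_gcs_uri(archive_uri):
        raise ValueError("GCS Archive URI must differ from GCS Source URI")
```

Because the URIs are normalized first, `gs://my-bucket/inbox/` and `my-bucket/inbox` are treated as the same location.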

Before deploying a flow with the GCS Listener enabled, ensure that the credentials you’ve specified in the block settings have the following permissions assigned:

  • Read Blob and Put Blob for both the source and the archive URIs

  • List Bucket and Delete Blob for the source URI

To test whether the permissions have been set properly, click Test Connection at the bottom of the connector settings in Flow Studio. If the required permissions are present, no errors are reported.
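The permissions can also be checked outside Flow Studio. The sketch below uses the `google-cloud-storage` Python client and its `Bucket.test_iam_permissions` method. The mapping from the permission names listed above to GCS IAM permission strings is an assumption, not something this page specifies, and `check_bucket` and `missing_permissions` are hypothetical helpers.

```python
# Assumed IAM equivalents of the permissions listed above:
# Read Blob -> storage.objects.get, Put Blob -> storage.objects.create,
# List Bucket -> storage.objects.list, Delete Blob -> storage.objects.delete.
SOURCE_PERMISSIONS = [
    "storage.objects.get",
    "storage.objects.create",
    "storage.objects.list",
    "storage.objects.delete",
]
ARCHIVE_PERMISSIONS = [
    "storage.objects.get",
    "storage.objects.create",
]


def missing_permissions(required, granted):
    """Return the required permissions that were not granted, sorted."""
    return sorted(set(required) - set(granted))


def check_bucket(key_path, bucket_name, required):
    """Ask GCS which of the required permissions the service account holds."""
    # Imported lazily; requires the third-party google-cloud-storage package.
    from google.cloud import storage

    client = storage.Client.from_service_account_json(key_path)
    granted = client.bucket(bucket_name).test_iam_permissions(required)
    return missing_permissions(required, granted)
```

Calling `check_bucket` with the service account key file, the source bucket name, and `SOURCE_PERMISSIONS` returns an empty list when nothing is missing. The check runs at bucket level, so it does not account for conditions that restrict access to particular prefixes.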