About Kubernetes
Kubernetes is an open-source system that automates the deployment and management of containerized applications. To support our clients who have Kubernetes instances, or clusters, we have made Hyperscience available as a Kubernetes application.
Kubernetes is known for being flexible and modular in its implementation, due in large part to the packaging of its applications. No Kubernetes application is deployed on its own. Rather, each application runs with a set of libraries and dependencies in a structure called a container. Shipping applications in this way allows them to run alongside other applications and in a variety of environments without compatibility concerns.
When an application runs, its code runs in its container, which resides in a pod. Pods generally house containers that work together and share resources, but they can also have a single container. A pod represents the smallest executable—or runnable—unit in a Kubernetes cluster. External traffic can only access pods if an ingress for that pod is created.
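For illustration, a minimal Pod manifest that runs a single container might look like the sketch below. The names and image are placeholders and are not part of the Hyperscience deployment.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod            # placeholder name
spec:
  containers:
    - name: web                # a single container housed by the pod
      image: nginx:1.27        # the container image packaging the application and its dependencies
      ports:
        - containerPort: 80    # port the containerized application listens on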
Through the automatic creation of new pods, Kubernetes supports the scaling of resources as demand increases. Similarly, if demand decreases, Kubernetes can automatically scale those resources back. This management of resources results in a management of costs for your organization.
Pods run on nodes, which can be physical servers or virtual machines. A cluster can have many nodes, and all nodes are controlled by a master node (or master). The master manages the state of the cluster. When you interact with your cluster—for example, when configuring the desired state of an application—the master processes your commands and routes them to the appropriate node, pod, or container.
A simplified view of the elements we’ve discussed appears below.
How the Hyperscience Platform uses Kubernetes
The Hyperscience Platform architecture is based on workflow orchestration. Multiple task-processing units are needed to execute each of the steps, or tasks, inside a workflow and move the workflow execution forward. We call these task-processing units blocks. Hyperscience uses Kubernetes to orchestrate the blocks, effectively making Kubernetes a workload orchestrator. To provide this functionality, Hyperscience built the HyperOperator, which bridges the Hyperscience Platform and Kubernetes container orchestration to deliver seamless workload orchestration in our product. The provided reference architectures give a high-level overview of how this process works.
Hyperscience Platform deployment includes several steps:
1. The customer should provide information about the available infrastructure: container orchestration, file storage, database, and container image registry, and ensure that these components meet the requirements described below.
2. The customer should copy all required container images into an internal container registry accessible from the Kubernetes cluster. A Hyperscience CS team member will provide access to the Hyperscience public container-image repository.
3. Based on the input from the previous steps, the customer should populate the values.yaml file used by the Helm chart and install the chart; an illustrative sketch follows this list.
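As a rough illustration only, a values.yaml file captures the infrastructure details gathered in steps 1 and 2. The key names below are hypothetical placeholders, not the actual keys from the Hyperscience Helm chart; see Helm Chart for the authoritative values.
# Hypothetical values.yaml sketch -- every key name here is a placeholder, not an actual chart key
image:
  registry: registry.example.internal   # internal registry populated with the Hyperscience images in step 2
  tag: "<hyperscience-release>"          # the Hyperscience Platform release being installed
database: {}                             # connection details; see the Database section below
fileStore: {}                            # object-storage details; see the File store section below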
To help customers with steps 1 and 2, we built hsk8s (Hyperscience Kubernetes CLI). The tool simplifies the streaming of container images to internal container-image registries, and it eases the process of collecting a support bundle with diagnostic information when a support ticket is opened. hsk8s should be executed on a workstation with access to the Kubernetes cluster. The workstation should also have Internet access when the customer needs to gather application diagnostics data. More information can be found in Kubernetes Troubleshooting and Tweaks.
To help with the initial deployment steps and future support of the Hyperscience Platform, customers are advised to:
have a workstation with access to both the Internet (https://cloudsmith.io and https://support.hyperscience.ai) and internal systems (the Kubernetes cluster and container registry). This workstation can be used for the deployment and can help with support cases later on.
keep their values.yaml file in a source control system, like Git. This file describes all deployment parameters for Hyperscience, and it needs to be available for future application upgrades and support.
Infrastructure Requirements
Customers need to install Hyperscience on an existing Kubernetes cluster that is already configured to their specifications. Customers should reserve a namespace that is dedicated to the Hyperscience application.
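For example, a dedicated namespace can be reserved with a manifest like the one below; the name hyperscience is only an assumption, so substitute the namespace your organization sets aside.
apiVersion: v1
kind: Namespace
metadata:
  name: hyperscience   # placeholder name for the namespace dedicated to the Hyperscience application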
An SQL database and a file store are required for the Hyperscience application's backend. It is important to note that the Hyperscience application does not install the database and the file store. We recommend using database and file store services external to the Kubernetes cluster.
Supported infrastructure components
Below is a list of supported Kubernetes versions, file stores, databases, and container image registries:
Container orchestration | File storage | Database | Container image registry |
---|---|---|---|
Kubernetes | AWS S3 | AWS RDS PostgreSQL 14.x, 15.x, 16.x | Docker Registry HTTP API V2 compatible (AWS ECR) |
 | Azure blob | | Docker Registry HTTP API V2 compatible (ACR) |
 | Google Cloud Storage | Google Cloud SQL for PostgreSQL 14.x, 15.x, 16.x | Docker Registry HTTP API V2 compatible (Google Artifact Registry) |
Kubernetes version support calendar
Once a version of Kubernetes is no longer supported, Hyperscience may introduce breaking changes in the deployment tooling in later releases. It's important to note that the Kubernetes version support calendar is not related to Hyperscience Platform version support.
Kubernetes version | Hyperscience end of support |
---|---|
1.32 | April 2026 |
1.31 | December 2025 |
1.30 | August 2025 |
1.29 | April 2025 |
1.28 | December 2024 |
PodSecurityPolicy
We have certain security features in our application that require the involvement of a second user. To allow for this second user, the following capabilities are required. Some flavors of Kubernetes, like Rancher and OpenShift, block them by default, and they need to be explicitly allowed in the PodSecurityPolicy.
requiredCapabilities:
  # Capabilities needed so that the application can switch to the second user
  - SETUID
  - SETGID
fsGroup:
  type: RunAsAny
allowPrivilegeEscalation: true
readOnlyRootFilesystem: false
runAsUser:
  # Containers must still run as a non-root user
  type: MustRunAsNonRoot
volumes:
  - configMap
  - emptyDir
  - persistentVolumeClaim
  - secret
Reference architectures
Diagrams of the different deployment components and how they interact with each other appear below. The specific services used will depend on the cloud provider, but the overall concept remains the same.
A separate namespace in the Kubernetes cluster is recommended for the Hyperscience Platform and all associated resources.
AWS reference diagram
GCP reference diagram
Nodes
We require at least 2 separate node groups for optimal processing-load distribution. Each node group may have one or more nodes attached to it. The node sizing will change based on your desired performance and individual workflow characteristics.
We use nodeSelector pod affinity (see Kubernetes's Assigning Pods to Nodes) to isolate the trainer from the rest of the application for performance reasons. In order for this setup to work, you need to assign one node group as the "platform" group and the other one as the "trainer" group. See AWS's Organize Amazon EKS resources with tags and GCP's Create and manage cluster and node pool labels for more information on how to apply the below tags:
hs-component=platform (platform node group only)
hs-component=trainer (trainer node group only)
Considering that such fine-grained control over nodes is necessary, we recommend choosing Kubernetes providers that offer this capability. For GKE, this means selecting a Standard cluster rather than an Autopilot one.
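To sketch how these labels are consumed, a pod that must run on the trainer node group carries a nodeSelector like the one below. The pod and image names are illustrative; the actual pod specs are generated by the Helm chart.
apiVersion: v1
kind: Pod
metadata:
  name: trainer-example                     # illustrative name; real pods are created by the Helm chart
spec:
  nodeSelector:
    hs-component: trainer                   # schedules the pod onto nodes labeled hs-component=trainer
  containers:
    - name: trainer
      image: registry.example.internal/trainer:latest   # placeholder image reference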
Minimum requirements:
Node type | Node vCPU number | Node RAM in GB | EC2 instance type | GCE instance type | Number of nodes |
---|---|---|---|---|---|
platform | 8 | 32 | m5.2xlarge | n4-standard-8 | 2 |
trainer | 16 | 64 | m5.4xlarge | n4-standard-16 | 1 |
Docker repositories
To store Hyperscience container images, four repositories in an internal registry are required, as illustrated in the diagram above. These repositories must be accessible to the cluster's nodes through IAM permissions, enabling them to pull the images and initiate deployments.
Database
The Hyperscience Platform requires the use of a SQL database to store key application data.
After a database instance has been created in your preferred cloud provider, you need to collect the following pieces of connection-related information, which are later used to set the proper Helm Chart values (see Helm Chart for more details):
DB server endpoint (including the port)
DB Username
DB Password
DB Name
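For illustration only, these four values are typically passed to the Helm chart through values.yaml. The key names below are placeholders rather than the actual chart keys; see Helm Chart for the authoritative names.
# Placeholder key names, for illustration only
database:
  host: hyperscience-db.example.internal:5432   # DB server endpoint, including the port
  username: hyperscience_app                    # DB username
  password: "<db-password>"                     # DB password; avoid committing real credentials to source control
  name: hyperscience                            # DB name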
File store
The Hyperscience Platform is designed to work seamlessly with object-storage services from major cloud providers, which is our recommended approach for managing file storage in Kubernetes environments. To learn more, see File Storage Overview. If needed, you can also configure the platform to use Kubernetes persistent volumes, as described in Kubernetes Troubleshooting and Tweaks. In either case, make sure that Hyperscience pods have read and write access to the file storage.
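If you go the persistent-volume route, the claim pods share might look like the sketch below. The name, namespace, and size are illustrative, and the supported configuration is described in Kubernetes Troubleshooting and Tweaks.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hyperscience-files       # illustrative name
  namespace: hyperscience        # the namespace reserved for the application
spec:
  accessModes:
    - ReadWriteMany              # pods need shared read and write access to the file store
  resources:
    requests:
      storage: 100Gi             # illustrative size; adjust to your document volume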
Service access control
If you are running Hyperscience on a cloud provider and plan to use object storage for file storage, you’ll need to set up the correct permissions to ensure our pods can access it. This process requires the creation of IAM roles and policies. For more information, see Helm Chart.
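On AWS, for example, pod access to S3 is commonly granted through IAM Roles for Service Accounts, where the service account used by the pods is annotated with the IAM role to assume. The names and role ARN below are placeholders; see Helm Chart for how the chart expects this to be configured.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hyperscience             # placeholder service-account name
  namespace: hyperscience        # the namespace reserved for the application
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/hyperscience-s3-access   # placeholder role ARN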