With Hyperscience, you can leverage ORCA VLMs in the processing of your submissions. However, VLMs require more computing resources than our other automation capabilities do.
In order to use ORCA VLMs in on-premise deployments of Hyperscience, you need to have at least one application machine in your instance that has both a GPU (graphics processing unit) and a CPU (central processing unit). GPUs have specialized cores that allow the system to perform multiple computations in parallel, reducing the time required to complete the complex operations required to apply visual machine learning to your use case. When you add a correctly sized machine with a GPU to your instance, you can maximize the benefits of ORCA VLMs. To learn more about this feature, see “ORCA (Optical Reasoning and Cognition Agent) VLMs” ( v41 | v42 ).
This article describes how to enable an application machine with both a GPU and a CPU in an on-premise Kubernetes deployment of Hyperscience.
1. Verify that hardware requirements are met and install CUDA drivers.
In order for ORCA to take advantage of the GPU on the on-premise Kubernetes cluster, the Kubernetes nodes must be configured with the correct drivers or AMIs, and the containers must be able to access the GPU.
Bare metal
a. Make sure your GPU hardware meets the requirements.
See Kubernetes Installation Overview for more information.
b. Prepare for driver installation.
i. Verify that your GPU supports CUDA.
CUDA is a parallel computing platform and programming model created by NVIDIA. Machine learning often uses CUDA-based libraries, SDKs, and other tools.
You can find out whether your GPU supports CUDA by running the following command:
lspci | grep -i nvidia
For more information, see NVIDIA’s CUDA GPUs - Compute Capability and NVIDIA CUDA Installation Guide for Linux.
ii. Verify that you have a supported version of Linux.
Follow the instructions in NVIDIA’s NVIDIA CUDA Installation Guide for Linux to check your version of Linux. Then, make sure your Linux version is supported by the latest CUDA Toolkit by reviewing NVIDIA’s NVIDIA CUDA Toolkit Release Notes.
iii. Verify that the system has gcc installed.
The gcc compiler is required for development using the CUDA Toolkit. To make sure it is installed, follow the instructions in NVIDIA’s NVIDIA CUDA Installation Guide for Linux.
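As a quick sanity check, the snippet below prints the compiler version if gcc is present. This is a generic sketch, not a Hyperscience-specific command:

```shell
# Print the gcc version if it is installed; otherwise suggest installing it.
if command -v gcc >/dev/null 2>&1; then
  gcc --version | head -n 1
else
  echo "gcc not found; install it with your distribution's package manager (e.g. apt-get install gcc)"
fi
```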
iv. Verify that the system has the current Kernel headers and development packages installed.
Kernel headers are header files that specify the interface between the Linux kernel and userspace libraries and programs. The CUDA driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 3.17.4-301, the 3.17.4-301 kernel headers and development packages must also be installed.
To verify that these requirements are met, run the following command:
apt-get install linux-headers-$(uname -r)
For more information and commands for various Linux distributions, see NVIDIA’s NVIDIA CUDA Installation Guide for Linux.
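On RHEL-derived distributions, the equivalent check uses dnf and the kernel-devel and kernel-headers packages. This is a sketch based on NVIDIA’s installation guide; verify the exact package names for your distribution:

```shell
# Install kernel headers and development packages matching the running kernel
# on RHEL/Rocky/AlmaLinux-style systems. Package names may vary by distribution.
if command -v dnf >/dev/null 2>&1; then
  dnf install -y kernel-devel-"$(uname -r)" kernel-headers-"$(uname -r)"
else
  echo "dnf not found; on Debian/Ubuntu use: apt-get install linux-headers-$(uname -r)"
fi
```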
c. Install the CUDA drivers.
Follow these steps to make sure that you have the most current and correct CUDA drivers installed.
i. Remove any CUDA drivers installed on the system.
Compatibility between the CUDA Toolkit and the CUDA drivers is crucial. In our Docker-based system, the CUDA Toolkit is installed in the trainer images, and you need to make sure that its version is compatible with the CUDA drivers installed in the host OS. The currently installed CUDA Toolkit version should be compatible with the latest available CUDA driver. For details on toolkit and driver versions, see NVIDIA’s NVIDIA CUDA Toolkit Release Notes.

An example command for removing these drivers appears below:
apt-get clean; apt-get update; apt-get purge -y cuda*; apt-get purge -y nvidia-*; apt-get -y autoremove
You can tailor this command to match your Linux distribution.
ii. Install nvidia-container-toolkit or nvidia-docker2.
The Docker host needs to be prepared before it can expose your GPU hardware to the containers. Although containers share your host’s kernel, they cannot access information on the system packages you have installed. A plain container will lack the device drivers that interface with your GPU.
(Ubuntu 21 and later) Install nvidia-container-toolkit.
You can activate support for NVIDIA GPUs by installing NVIDIA’s Docker Container Toolkit:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt-get update && apt-get install -y nvidia-container-toolkit
(Ubuntu 20) Install nvidia-docker2.
To activate support for NVIDIA GPUs, install nvidia-docker2 by running the following command:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
apt-get update && apt-get install -o Dpkg::Options::="--force-confold" -y nvidia-docker2
iii. Configure the container runtime.
Run the nvidia-ctk command shown below to configure the container runtime.
The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host. The file is updated so that Docker can use the NVIDIA Container Runtime.
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
Inspect your /etc/docker/daemon.json file to confirm that the container runtime configuration has been updated. The system injects GPU device connections when new containers start.
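After the nvidia-ctk command runs, /etc/docker/daemon.json typically contains an entry resembling the one below. The exact contents depend on your existing Docker configuration, so treat this as an illustration rather than the required file:

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```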
iv. Install the latest CUDA drivers.
Running the following command installs the latest CUDA driver versions, which should be compatible with the container toolkit:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
dpkg -i cuda-keyring_1.0-1_all.deb
apt-get update && apt-get -y install cuda-drivers
While we recommend installing the latest version of the CUDA driver, only the minimum version required for your version of Hyperscience is needed. See Kubernetes Installation Overview for more information.
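To confirm that the driver installed correctly (often after a reboot), nvidia-smi should list the GPU and driver version. The guarded sketch below is a generic check, not a Hyperscience-specific command:

```shell
# Verify the NVIDIA driver is loaded. nvidia-smi ships with the driver, so
# its absence usually means the installation did not complete.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version --format=csv
else
  echo "nvidia-smi not found; the driver may not be installed"
fi
```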
v. Deploy the nvidia-device-plugin DaemonSet.
To deploy the DaemonSet, follow the instructions in the “Enabling GPU Support in Kubernetes” section of the NVIDIA device plugin for Kubernetes documentation in GitHub.
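Once the DaemonSet is running, each GPU node should advertise an nvidia.com/gpu resource. A generic way to check, assuming you have kubectl access to the cluster:

```shell
# List allocatable GPUs per node; a non-empty GPU column confirms the device
# plugin registered the hardware with the kubelet. Dots in the resource name
# are escaped with a backslash in the custom-columns expression.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
else
  echo "kubectl not found on this machine"
fi
```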
AWS EKS
You can use the pre-built GPU AMIs made available by AWS with all the necessary drivers installed. For more information, see the “Amazon EKS optimized accelerated Amazon Linux AMIs” section of AWS’s Amazon EKS optimized Amazon Linux AMIs.
For a list of pre-built GPU AMIs based on Kubernetes version, see Amazon EKS AMI’s Releases list in GitHub.
Deploy the nvidia-device-plugin DaemonSet by following the instructions in the “Enabling GPU Support in Kubernetes” section of the NVIDIA device plugin for Kubernetes documentation in GitHub.
Azure AKS
To enable GPU for Azure AKS, follow the steps in Microsoft’s Use GPUs for compute-intensive workloads on Azure Kubernetes Service (AKS). Doing so sets up the GPU nodes and installs the nvidia-device-plugin DaemonSet.
Google GKE
Follow Google Cloud’s Run GPUs in GKE Standard node pools to set up GPU node pools with drivers and the DaemonSet.
2. Initialize the VLM in the application machine.
Before you can use ORCA, you need to initialize it in the application machine with the GPU. Doing so ensures that this machine is used for processing tasks that require ORCA.
Run the following commands on the machine that has the GPU:
./run.sh init # standard installation step, skip if already done
./run.sh # standard installation step, skip if already done
./run.sh ipm VISION_LANGUAGE_MODEL_GPU # ORCA-specific installation step (only on the GPU machine)
Next steps
After you’ve finished preparing your infrastructure to use ORCA VLMs, you are ready to apply ORCA to your use case.
To learn how, see “ORCA (Optical Reasoning and Cognition Agent) VLMs” ( v41 | v42 ).