Enabling Trainers with GPUs in On-Premise Docker Deployments


With Hyperscience, you can classify and extract data from Unstructured documents. However, automating these processes requires more computing resources than automating Structured and Semi-structured documents does.

In order to process Unstructured documents in on-premise deployments of Hyperscience, you need to add a trainer that has both a GPU (graphics processing unit) and a CPU (central processing unit). GPUs have specialized cores that allow the system to perform multiple computations in parallel, reducing the time required to complete the complex operations required to train models for Unstructured documents. When you attach a trainer whose machine has a GPU, you can maximize the benefits of Long-form Extraction. To learn more about this feature, see Long-form Extraction.

You also have the option to use GPUs to train Field Identification or Table Identification models. For more information on support for trainers with GPUs, see Infrastructure Requirements.

This article describes how to enable a trainer with both a GPU and a CPU in an on-premise Docker deployment of Hyperscience. Steps 1-3 must be completed before untarring the Hyperscience bundle on the trainer machine. For more information on setting up the trainer, see Technical Installation / Upgrade Instructions.

1. Make sure your GPU hardware meets the requirements.

See Infrastructure Requirements for more information.

2. Make sure your trainer meets the software compatibility requirements.

There are several software-compatibility considerations to keep in mind when setting up your trainer.

a. Verify that your GPU supports CUDA.

CUDA is a parallel computing platform and programming model created by NVIDIA. Machine learning often uses CUDA-based libraries, SDKs, and other tools.

You can find out whether your GPU supports CUDA by running the following command:

lspci | grep -i nvidia
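If the machine has a CUDA-capable NVIDIA GPU, the command prints one or more matching PCI devices. The exact output depends on your hardware; a hypothetical example:

01:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)

If the command prints nothing, no NVIDIA GPU was detected.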

For more information, see NVIDIA’s CUDA GPUs - Compute Capability and NVIDIA CUDA Installation Guide for Linux.

b. Verify that you have a supported version of Linux.

Follow the instructions in NVIDIA’s NVIDIA CUDA Installation Guide for Linux to check your version of Linux. Then, make sure your Linux version is supported by the latest CUDA Toolkit by reviewing NVIDIA’s NVIDIA CUDA Toolkit Release Notes.  
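For example, the guide suggests checking your machine's architecture and distribution with a command along the following lines:

uname -m && cat /etc/*release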

You should also ensure that you are running a version of Linux that is supported by Hyperscience. For a list of supported Linux distributions and versions, see Infrastructure Requirements.

c. Verify that the system has gcc installed.

The gcc compiler is required for development using the CUDA Toolkit. To make sure it is installed, follow the instructions in NVIDIA’s NVIDIA CUDA Installation Guide for Linux.
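As a quick check, you can print the installed compiler version; if the command is not found, install gcc through your distribution's package manager:

gcc --version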

d. Verify that the system has the current Kernel headers and development packages installed.

Kernel headers are header files that specify the interface between the Linux kernel and userspace libraries and programs. The CUDA driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 3.17.4-301, the 3.17.4-301 kernel headers and development packages must also be installed.

To make sure these packages are installed on Ubuntu and other Debian-based distributions, run the following command, which installs the headers and development packages for the currently running kernel:

apt-get install linux-headers-$(uname -r)

For more information and commands for various Linux distributions, see NVIDIA’s NVIDIA CUDA Installation Guide for Linux.
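On RHEL-based distributions, for example, the equivalent command is typically:

dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)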

3. Install the CUDA drivers.

Follow these steps to make sure that you have the most current and correct CUDA drivers installed.

a. Remove any CUDA drivers installed on the system.

Compatibility between the CUDA Toolkit and the CUDA drivers is crucial. In our Docker-based system, the CUDA Toolkit is installed in the trainer images, and you need to make sure that its version is compatible with the CUDA drivers installed in the host OS. The CUDA Toolkit version currently installed should be compatible with the latest available CUDA driver. For details on toolkit and driver versions, see NVIDIA’s NVIDIA CUDA Toolkit Release Notes.

[Diagram: compatibility between CUDA Toolkit versions and CUDA driver versions]
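Before removing anything, you may find it helpful to see which NVIDIA packages are currently installed. On Debian-based systems, one way to list them is:

dpkg -l | grep -i nvidia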


An example command for removing these drivers appears below:

apt-get clean; apt-get update; apt-get purge -y cuda*; apt-get purge -y nvidia-*; apt-get -y autoremove

You can tailor this command to match your Linux distribution.
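On RHEL-based distributions, for example, a rough equivalent might be:

dnf -y remove 'cuda*' 'nvidia-*' && dnf -y autoremove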

b. Install nvidia-container-toolkit or nvidia-docker2.

The Docker host needs to be prepared before it can expose your GPU hardware to the containers. Although containers share your host’s kernel, they do not include the system packages installed on the host. As a result, a plain container lacks the device drivers needed to interface with your GPU.

(Ubuntu 21 and later) Install nvidia-container-toolkit.

You can activate support for NVIDIA GPUs by installing NVIDIA’s Docker Container Toolkit:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt-get update
apt-get install -y nvidia-container-toolkit
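Recent versions of the toolkit include the nvidia-ctk utility, which can write the runtime entry to the Docker daemon file for you. Assuming your toolkit version ships with it, the typical sequence is:

nvidia-ctk runtime configure --runtime=docker
systemctl restart docker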

Then, verify the contents of your /etc/docker/daemon.json, as described below.

(Ubuntu 20) Install nvidia-docker2.

To activate support for NVIDIA GPUs, install nvidia-docker2 by running the following command:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
  && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
apt-get update && apt-get install -o Dpkg::Options::="--force-confold" -y nvidia-docker2

Then, verify the contents of your /etc/docker/daemon.json, as described below.

(Ubuntu 20 and later) Check the daemon file.

Make sure your Docker daemon file looks like the one shown below:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "10"
  },
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia",
  "node-generic-resources": [
    "NVIDIA-GPU=0"
  ]
}

Inspect your /etc/docker/daemon.json file to confirm that the configured container runtime has been changed. The NVIDIA Toolkit will handle the injection of GPU device connections when new containers start.
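As a quick check, you can also ask Docker which runtimes it has registered. After restarting the Docker daemon, the nvidia runtime should appear in the output of:

docker info | grep -i runtime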

c. Install the latest CUDA drivers.

Running the following commands installs the latest CUDA driver version, which should be compatible with the container toolkit:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
dpkg -i cuda-keyring_1.0-1_all.deb
apt-get update && apt-get -y install cuda-drivers

While we recommend installing the latest version of the CUDA driver, only the minimum version required for your version of Hyperscience is necessary. See Infrastructure Requirements for more information.
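After the installation completes (a reboot may be required for the driver to load), you can confirm that the host can see the GPU by running nvidia-smi. To confirm that containers can see it as well, you can run the same tool in a throwaway CUDA container; the image tag below is only an example and should match a CUDA version supported by your driver:

nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi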

Next steps

After you’ve enabled your trainer, follow the steps in Long-form Extraction to apply it to your Long-form Extraction use case.

Troubleshooting

If the trainer containers are unable to connect to the GPU (e.g., training fails with a GPU is not available error), ensure that Docker is using the cgroupfs driver by adding the following to your /etc/docker/daemon.json file:

"exec-opts": ["native.cgroupdriver=cgroupfs"]