NVIDIA GRID and CUDA Driver Installation for Azure VMs


Before using an Azure GPU VM with the Hyperscience application, you need to install either a GRID driver or a CUDA driver. These installations require different steps and considerations than those outlined in our “Enabling Application Machines with GPUs” and “Enabling Trainer Machines with GPUs” articles.

This article will help you determine whether you need to install a GRID driver or a CUDA driver. It then guides you through the installation of the appropriate driver for your Azure GPU VM.

Azure GPU VM series: GRID drivers vs CUDA drivers

The driver type needed for your Azure GPU VM depends on whether the GPU is presented as a vGPU (virtualized, partitioned) or via passthrough (direct access to the full physical GPU).

| Driver type     | Azure VM series | GPU        | GPU access         | Primary use case      |
|-----------------|-----------------|------------|--------------------|-----------------------|
| GRID/vGPU       | NVadsA10v5      | A10        | vGPU (partitioned) | Graphics + Compute/AI |
| GRID/vGPU       | NVv3            | M60        | vGPU               | VDI / Graphics        |
| GRID/vGPU       | NCasT4v3        | T4         | vGPU               | Graphics + Compute    |
| CUDA (standard) | NCv3            | V100       | Passthrough        | AI / HPC              |
| CUDA (standard) | ND series       | P40 / P100 | Passthrough        | Deep Learning         |
| CUDA (standard) | NDv2            | V100       | Passthrough        | Distributed Training  |
| CUDA (standard) | NCads_H100_v5   | H100       | Passthrough        | AI / HPC              |

General guidelines

  • NV-series (visualization-oriented) → GRID driver (vGPU)

  • NC/ND-series (compute-oriented) → Standard CUDA driver (passthrough)

  • Exception: NCasT4v3 uses GRID despite being "NC" branded, because the T4 is exposed as a vGPU on Azure.
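These rules can be expressed as a small helper for scripted provisioning. The sketch below is illustrative: the function name driver_for is ours, and the size-name patterns cover only the series listed in the table above.

```shell
# Map an Azure VM size name to the driver type it needs (sketch; patterns
# cover only the series listed in the table above).
driver_for() {
  case "$1" in
    Standard_NC*T4_v3)         echo "GRID (vGPU)" ;;        # NCasT4v3 exception
    Standard_NV*)              echo "GRID (vGPU)" ;;        # NV-series
    Standard_NC*|Standard_ND*) echo "CUDA (passthrough)" ;; # NC/ND-series
    *)                         echo "unknown" ;;
  esac
}

driver_for "Standard_NV36ads_A10_v5"   # GRID (vGPU)
driver_for "Standard_NC6s_v3"          # CUDA (passthrough)
driver_for "Standard_NC8as_T4_v3"      # GRID (vGPU)
```

On the VM itself, the size name can be read from the Azure Instance Metadata Service, e.g. `curl -s -H Metadata:true "http://169.254.169.254/metadata/instance/compute/vmSize?api-version=2021-02-01&format=text"`.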

NVIDIA GRID driver installation for Azure NVadsA10v5 VMs (RHEL 9.7)

If you are deploying on an NC or ND series VM (e.g., NCv3 with V100), you should install the standard NVIDIA CUDA driver from the NVIDIA website or via dnf module install nvidia-driver instead of following this GRID guide.

Prerequisites

  • Azure VM: Standard_NV36ads_A10_v5 (36 vCPU, 440 GiB RAM, 1x A10 24 GB)

  • OS: RHEL 9.7 with active subscription (for DNF repositories)

  • SSH access to the VM

  • /opt must have at least 2 GB free space for the build process (see Install the GRID driver for more information)

1. Verify GPU hardware.

Confirm the NVIDIA A10 GPU is visible to the VM:

lspci | grep -i nvidia

Expected output:

0002:00:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)

If lspci is not available, install it:

sudo dnf install -y pciutils

2. Install build prerequisites.

The GRID driver compiles a kernel module at install time. It needs the kernel development headers, GCC, and DKMS.

# Install kernel headers and development packages for the running kernel
sudo dnf install -y \
  kernel-devel-$(uname -r) \
  kernel-headers-$(uname -r) \
  gcc \
  make \
  elfutils-libelf-devel \
  pciutils

# Install EPEL repository (required for DKMS)
sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm

# Install DKMS (Dynamic Kernel Module Support)
sudo dnf install -y dkms

Verify prerequisites

gcc --version          # Should show GCC 11.x
rpm -q kernel-devel-$(uname -r)   # Should show installed
rpm -q dkms            # Should show installed

3. Disable Nouveau.

Nouveau is an open-source NVIDIA driver that conflicts with the proprietary GRID driver. It must be disabled (a.k.a. blacklisted) before installation.

# Create the blacklist configuration
echo -e "blacklist nouveau\nblacklist lbm-nouveau" | sudo tee /etc/modprobe.d/nouveau.conf

# Rebuild the initramfs to ensure Nouveau is not loaded at boot
sudo dracut --force

Verify Nouveau is not loaded

lsmod | grep nouveau
# Should return nothing (no output = desired result)

On fresh Azure RHEL 9.7 VMs, Nouveau is typically not loaded by default. This step is a safety measure.
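For automated setups, the check above can be turned into a fail-fast guard. This is a sketch; it reads /proc/modules directly so it does not depend on lsmod being installed.

```shell
# Abort early if the Nouveau kernel module is still loaded (e.g. the
# initramfs was not rebuilt or the VM was not rebooted after blacklisting).
if grep -qw '^nouveau' /proc/modules 2>/dev/null; then
  echo "Nouveau is still loaded; check /etc/modprobe.d/nouveau.conf and reboot" >&2
  exit 1
fi
echo "Nouveau not loaded; safe to install the GRID driver"
```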

4. Download the NVIDIA GRID driver.

Microsoft hosts the GRID driver specifically for Azure VMs. Download the vGPU 18.6 (R570) driver for NVadsA10v5:

sudo curl -fL -o /tmp/NVIDIA-Linux-x86_64-grid-azure.run \
  'https://download.microsoft.com/download/2a04ca6a-9eec-40d9-9564-9cdea1ab795f/NVIDIA-Linux-x86_64-570.211.01-grid-azure.run'

sudo chmod +x /tmp/NVIDIA-Linux-x86_64-grid-azure.run

The file is approximately 365 MB.

5. Install the GRID driver.

Important

The NVIDIA installer uses /tmp for build artifacts. On RHEL 9.7 Azure VMs, /tmp is often only 2 GB, which is not enough for the kernel module build. Use the --tmpdir flag to point to a directory with more space.
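The space requirement can be checked up front. This is a sketch: the 2 GB figure comes from the prerequisites above, and the df/awk field positions assume POSIX (-P) output.

```shell
# Verify /opt has at least 2 GB free before pointing --tmpdir at it.
need_kb=$((2 * 1024 * 1024))                      # 2 GB expressed in KiB
avail_kb=$(df -Pk /opt | awk 'NR==2 {print $4}')  # 4th column: available KiB
if [ "$avail_kb" -lt "$need_kb" ]; then
  echo "Not enough free space under /opt for the driver build" >&2
else
  echo "/opt has sufficient space (${avail_kb} KiB free)"
fi
```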

# Create a temporary build directory on a filesystem with sufficient space
sudo mkdir -p /opt/tmp

# Run the installer in silent mode with DKMS support
sudo /tmp/NVIDIA-Linux-x86_64-grid-azure.run \
  --silent \
  --dkms \
  --tmpdir /opt/tmp

Note: The installer may print warnings about X libraries and the Vulkan ICD loader; these are safe to ignore on headless GPU VMs (no desktop environment).

What --dkms does: Registers the driver with DKMS so the kernel module is automatically rebuilt when the kernel is updated.

If installation fails

Check the installer log:

sudo cat /var/log/nvidia-installer.log

Common issues:

  • "No space left on device" → Use --tmpdir as shown above

  • "kernel source not found" → Ensure kernel-devel-$(uname -r) is installed

  • "gcc not found" → Install gcc with sudo dnf install -y gcc

6. Configure GRID license settings.

Azure provides automatic GRID licensing. Configure the GRID daemon:

# Copy the template configuration
sudo cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf

# Add required settings for Azure
echo "IgnoreSP=FALSE" | sudo tee -a /etc/nvidia/gridd.conf
echo "EnableUI=FALSE" | sudo tee -a /etc/nvidia/gridd.conf

# Remove FeatureType=0 if present (Azure handles licensing automatically)
sudo sed -i '/^FeatureType=0/d' /etc/nvidia/gridd.conf
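After these edits, the Azure-relevant lines of /etc/nvidia/gridd.conf should read as follows (the template's other defaults are left untouched, and no FeatureType line should remain):

```
IgnoreSP=FALSE
EnableUI=FALSE
```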

7. Verify the installation.

Run nvidia-smi to confirm the driver is working:

nvidia-smi

Expected output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.211.01             Driver Version: 570.211.01     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
|=========================================+========================+======================|
|   0  NVIDIA A10-24Q                 On  |   00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0            N/A  /  N/A  |       0MiB /  24512MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+

Key specifications to verify:

  • Driver version: 570.211.01

  • CUDA version: 12.8

  • GPU name: NVIDIA A10-24Q

  • Memory: 24512 MiB (≈24 GB)

8. Clean up.

Remove the installer and temporary build files:

sudo rm -rf /opt/tmp /tmp/NVIDIA-Linux-x86_64-grid-azure.run

Post-Installation: NVIDIA Container Toolkit (Podman)

After the GRID driver is installed and nvidia-smi works, install the NVIDIA Container Toolkit so Podman containers can access the GPU. This step is required before starting the Hyperscience application.

Install the toolkit

# Add the NVIDIA container toolkit repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
  | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# Install the toolkit
sudo dnf install -y nvidia-container-toolkit

Configure CDI (Container Device Interface)

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

Configure Podman to use the NVIDIA runtime

Edit /usr/share/containers/containers.conf:

sudo sed -i 's/^#\?runtime = .*/runtime = "nvidia"/' /usr/share/containers/containers.conf

Add the NVIDIA runtime to the [engine.runtimes] section:

[engine.runtimes]
nvidia = ["/usr/bin/nvidia-container-runtime"]

Verify GPU access from Podman

sudo podman run --rm --device nvidia.com/gpu=all \
  nvidia/cuda:12.8.0-base-ubi9 nvidia-smi

Standard CUDA driver installation (NC/ND-series VMs)

This section covers installing standard NVIDIA CUDA drivers on Azure VMs that use GPU passthrough (not vGPU). It applies to NC-series (compute) and ND-series (deep learning) VMs where the GPU is presented directly to the VM.

This section does NOT apply to NVadsA10v5, NVv3, or NCasT4v3 VMs. Those use vGPU and require the GRID driver documented above.

When to use this guide

Use the standard CUDA driver when deploying on these Azure VM series:

| VM series       | GPU         | vCPUs (smallest size) | Approx. cost / hour | Notes                                |
|-----------------|-------------|-----------------------|---------------------|--------------------------------------|
| NC6s_v3         | V100 16 GB  | 6                     | ~$3.06              | Cheapest CUDA VM; retiring Sept 2025 |
| NC24ads_A100_v4 | A100 80 GB  | 24                    | ~$3.67              | Current generation                   |
| NC40ads_H100_v5 | H100 80 GB  | 40                    | ~$7.35              | Latest generation                    |
| ND40rs_v2       | V100 32 GB ×8 | 40                  | ~$22.03             | Multi-GPU distributed training       |

For quick testing, the NC6s_v3 (V100) is the cheapest option at ~$3/hr on-demand, or ~$0.56/hr on spot instances. Check availability — NCv3 is retiring and not available in all regions. NC6s_v3 is available unrestricted in southcentralus as of March 2026.

Official Azure documentation

The primary reference is Microsoft’s Azure N-Series GPU Driver Setup for Linux. The article’s “CentOS or Red Hat Enterprise Linux” section covers RHEL 7 and 8, but the same approach works for RHEL 9 with updated repository URLs (see below).

CUDA driver installation on RHEL

1. Update the kernel and install the prerequisites.

sudo dnf update -y
sudo dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r) \
  gcc make dkms acpid libglvnd-glx libglvnd-opengl libglvnd-devel pkgconfig
sudo reboot

2. Enable the required repositories.

# Enable CodeReady Builder (needed for some build dependencies)
sudo subscription-manager repos --enable=codeready-builder-for-rhel-9-x86_64-rpms

# Install EPEL 9
sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm

3. Add the NVIDIA CUDA repository for RHEL.

sudo dnf config-manager --add-repo \
  https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
sudo dnf clean all

The Azure documentation references rhel7 and rhel8 repositories. For RHEL 9, use rhel9 in the URL path, as shown above.

4. Install the NVIDIA driver and CUDA Toolkit.

Before beginning the installation, check which driver stream to use. Not all GPUs support the latest stream.

| GPU             | Supported driver stream | Notes                        |
|-----------------|-------------------------|------------------------------|
| H100, A100, L40 | latest-dkms (595.x)     | Current generation           |
| T4, A10         | latest-dkms (595.x)     | Current generation           |
| V100            | 580-dkms (580.x)        | Legacy; latest will NOT work |
| P100, P40       | 535-dkms (535.x)        | Legacy                       |
| K80             | 470-dkms (470.x)        | End of life                  |

# List available streams
sudo dnf module list nvidia-driver

# For CURRENT GPUs (H100, A100, T4, etc.):
sudo dnf module enable -y nvidia-driver:latest-dkms

# For V100 (LEGACY — latest stream drops V100 support!):
sudo dnf module enable -y nvidia-driver:580-dkms

# Install the driver
sudo dnf install -y nvidia-driver cuda-drivers

# (Optional) Install the full CUDA Toolkit for development
sudo dnf install -y cuda

Note for V100: The latest-dkms stream installs driver 595.x, which ignores the V100. You'll see the following in dmesg:

NVRM: The NVIDIA Tesla V100-PCIE-16GB GPU installed in this system is supported through the NVIDIA 580.xx Legacy drivers.
The 595.58.03 NVIDIA driver will ignore this GPU.

Fix: Use nvidia-driver:580-dkms instead. The installation can take several minutes as it builds DKMS kernel modules.
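The stream choice in the table above can be scripted. This sketch matches a substring of the GPU name (as shown by lspci); the function name stream_for is ours.

```shell
# Pick the nvidia-driver module stream for a given GPU name.
stream_for() {
  case "$1" in
    *H100*|*A100*|*L40*|*T4*|*A10*) echo "latest-dkms" ;;
    *V100*)                         echo "580-dkms" ;;   # legacy branch
    *P100*|*P40*)                   echo "535-dkms" ;;   # legacy branch
    *K80*)                          echo "470-dkms" ;;   # end of life
    *)                              echo "unknown" ;;
  esac
}

stream_for "Tesla V100-PCIE-16GB"    # 580-dkms
stream_for "NVIDIA A100 80GB PCIe"   # latest-dkms
```

On the VM, before the driver is installed, the GPU name is visible via lspci, e.g. `stream_for "$(lspci | grep -i nvidia | head -n1)"`.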

5. Reboot and verify.

sudo reboot

After reboot:

nvidia-smi

Expected output (example for NC6s_v3 with 580-dkms — validated March 2026):

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.20             Driver Version: 580.126.20     CUDA Version: 13.0     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
|=========================================+========================+======================|
|   0  Tesla V100-PCIE-16GB           Off |   00000001:00:00.0 Off |                    0 |
+-----------------------------------------+------------------------+----------------------+

Key difference from GRID: The GPU name shows Tesla V100-PCIE-16GB (bare-metal passthrough) instead of a virtual GPU name like "NVIDIA A10-24Q" (vGPU).
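This naming difference can serve as a quick heuristic in scripts. The suffix check below is a sketch: vGPU profile names end in a profile letter such as Q (e.g. -24Q), while passthrough GPUs report the physical product name.

```shell
# Heuristic: classify the GPU mode from the name nvidia-smi reports.
gpu_mode() {
  case "$1" in
    *-*Q) echo "vGPU" ;;          # profile suffix like -24Q
    *)    echo "passthrough" ;;
  esac
}

gpu_mode "NVIDIA A10-24Q"         # vGPU
gpu_mode "Tesla V100-PCIE-16GB"   # passthrough
```

On the VM: `gpu_mode "$(nvidia-smi --query-gpu=name --format=csv,noheader)"`.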

CUDA driver installation on Ubuntu 22.04 / 24.04 LTS

Ubuntu has a significantly simpler installation path because Canonical packages and signs the NVIDIA proprietary drivers directly.

1. Install the NVIDIA driver.

sudo apt update && sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install
sudo reboot

No additional action is required for the driver installation — ubuntu-drivers auto-detects the GPU and installs the correct driver. Because these packages are signed by Canonical, this path also works with Secure Boot enabled.

2. (Optional) Install the CUDA Toolkit.

# Download the CUDA keyring for your Ubuntu version (example: 24.04)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo apt install -y ./cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-12-5
sudo reboot

Adjust the URL for your Ubuntu version:

  • Ubuntu 22.04 — use ubuntu2204 in the URL

  • Ubuntu 24.04 — use ubuntu2404 in the URL

Refer to NVIDIA’s CUDA Toolkit Downloads page for the latest toolkit version.
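The version-to-URL mapping above can be computed. This sketch derives the repo path from an Ubuntu release number; the function name keyring_url is ours, and the keyring filename is the one used in step 2.

```shell
# Build the CUDA keyring URL for a given Ubuntu release, e.g. 22.04 or 24.04.
keyring_url() {
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu%s/x86_64/cuda-keyring_1.1-1_all.deb\n' \
    "$(printf '%s' "$1" | tr -d .)"
}

keyring_url 24.04   # .../repos/ubuntu2404/...
keyring_url 22.04   # .../repos/ubuntu2204/...
```

On the VM, the release number is available from /etc/os-release: `. /etc/os-release; keyring_url "$VERSION_ID"`.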

3. Verify the installation.

nvidia-smi
nvcc --version   # Only if the CUDA Toolkit was installed