Running VMs with GPU Passthrough

This section demonstrates how to deploy virtual machines (VMs) with GPU passthrough using Cozystack. First, we’ll deploy the GPU Operator to configure the worker node for GPU passthrough Then we will deploy a KubeVirt VM that requests a GPU.

By default, to provision a GPU Passthrough, the GPU Operator will deploy the following components:

VFIO Manager to bind vfio-pci driver to all GPUs on the node.
Sandbox Device Plugin to discover and advertise the passthrough GPUs to kubelet.
Sandbox Validator to validate the other operands.

Prerequisites

A Cozystack cluster with at least one GPU-enabled node.
kubectl installed and cluster access credentials configured.

1. Install the GPU Operator

Follow these steps:

Label the worker node explicitly for GPU passthrough workloads:

kubectl label node <node-name> --overwrite nvidia.com/gpu.workload.config=vm-passthrough

Enable the GPU Operator in your Platform Package by adding it to the enabled packages list:

kubectl patch packages.cozystack.io cozystack.cozystack-platform --type=json \
  -p '[{"op": "add", "path": "/spec/components/platform/values/bundles/enabledPackages/-", "value": "gpu-operator"}]'

This will deploy the components (operands).

Ensure all pods are in a running state and all validations succeed with the sandbox-validator component:

kubectl get pods -n cozy-gpu-operator

Example output (your pod names may vary):

NAME                                            READY   STATUS    RESTARTS   AGE
...
nvidia-sandbox-device-plugin-daemonset-4mxsc    1/1     Running   0          40s
nvidia-sandbox-validator-vxj7t                  1/1     Running   0          40s
nvidia-vfio-manager-thfwf                       1/1     Running   0          78s

To verify the GPU binding, access the node using kubectl node-shell -n cozy-system -x or kubectl debug node and run:

lspci -nnk -d 10de:

The vfio-manager pod will bind all GPUs on the node to the vfio-pci driver. Example output:

3b:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:2236] (rev a1)
       Subsystem: NVIDIA Corporation Device [10de:1482]
       Kernel driver in use: vfio-pci
86:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:2236] (rev a1)
       Subsystem: NVIDIA Corporation Device [10de:1482]
       Kernel driver in use: vfio-pci

The sandbox-device-plugin will discover and advertise these resources to kubelet. In this example, the node shows two A10 GPUs as available resources:

kubectl describe node <node-name>

Example output:

...
Capacity:
  ...
  nvidia.com/GA102GL_A10:         2
  ...
Allocatable:
  ...
  nvidia.com/GA102GL_A10:         2
...

Note: Resource names are constructed by combining the device and device_name columns from the PCI IDs database. For example, the database entry for A10 reads 2236 GA102GL [A10], which results in a resource name nvidia.com/GA102GL_A10.

2. Update the KubeVirt Custom Resource

Next, we will update the KubeVirt Custom Resource, as documented in the KubeVirt user guide, so that the passthrough GPUs are permitted and can be requested by a KubeVirt VM.

Adjust the pciVendorSelector and resourceName values to match your specific GPU model. Setting externalResourceProvider=true indicates that this resource is provided by an external device plugin, in this case the sandbox-device-plugin which is deployed by the Operator.

kubectl edit kubevirt -n cozy-kubevirt

example config:

  ...
  spec:
    configuration:
      permittedHostDevices:
        pciHostDevices:
        - externalResourceProvider: true
          pciVendorSelector: 10DE:2236
          resourceName: nvidia.com/GA102GL_A10
  ...

3. Create a Virtual Machine

We are now ready to create a VM.

Create a sample virtual machine using the following VMI specification that requests the nvidia.com/GA102GL_A10 resource.

vmi-gpu.yaml:

---
apiVersion: apps.cozystack.io/v1alpha1
appVersion: '*'
kind: VirtualMachine
metadata:
  name: gpu
  namespace: tenant-example
spec:
  running: true
  instanceProfile: ubuntu
  instanceType: u1.medium
  systemDisk:
    image: ubuntu
    storage: 5Gi
    storageClass: replicated
  gpus:
  - name: nvidia.com/GA102GL_A10
  cloudInit: |
    #cloud-config
    password: ubuntu
    chpasswd: { expire: False }

kubectl apply -f vmi-gpu.yaml

Example output:

virtualmachines.apps.cozystack.io/gpu created

Verify the VM status:

kubectl get vmi

NAME                       AGE   PHASE     IP             NODENAME        READY
virtual-machine-gpu        73m   Running   10.244.3.191   luc-csxhk-002   True

virtctl console virtual-machine-gpu

Example output:

Successfully connected to vmi-gpu console. The escape sequence is ^]

vmi-gpu login: ubuntu
Password:

ubuntu@virtual-machine-gpu:~$ lspci -nnk -d 10de:
08:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [A10] [10de:26b9] (rev a1)
        Subsystem: NVIDIA Corporation GA102GL [A10] [10de:1851]
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nvidia_drm, nvidia

GPU passthrough assigns an entire physical GPU to a single VM. To share one GPU between multiple VMs, you can use NVIDIA vGPU, which creates virtual GPUs from a single physical GPU using mediated devices (mdev).

Why not MIG? MIG (Multi-Instance GPU) partitions a GPU into isolated instances, but these are logical divisions within a single PCIe device. VFIO cannot pass them to VMs — MIG only works with containers. To use MIG with VMs, you need vGPU on top of MIG partitions (still requires a vGPU license).

Prerequisites

A GPU that supports vGPU (e.g., NVIDIA L40S, A100, A30, A16)
An NVIDIA vGPU Software license (NVIDIA AI Enterprise or vGPU subscription)
Access to the NVIDIA Licensing Portal to download the vGPU Manager driver

The vGPU Manager driver is proprietary software distributed by NVIDIA under a commercial license. Cozystack does not include or redistribute this driver. You must obtain it directly from NVIDIA and build the container image yourself.

1. Build the vGPU Manager Image

The GPU Operator expects a pre-built driver container image — it does not install the driver from a raw .run file at runtime.

Download the vGPU Manager driver from the NVIDIA Licensing Portal (Software Downloads → NVIDIA AI Enterprise → Linux KVM)
Build the driver container image using NVIDIA’s Makefile-based build system:

# Clone the NVIDIA driver container repository
git clone https://gitlab.com/nvidia/container-images/driver.git
cd driver

# Place the downloaded .run file in the appropriate directory
cp NVIDIA-Linux-x86_64-550.90.05-vgpu-kvm.run vgpu/

# Build using the provided Makefile
make OS_TAG=ubuntu22.04 \
  VGPU_DRIVER_VERSION=550.90.05 \
  PRIVATE_REGISTRY=registry.example.com/nvidia

# Push to your private registry
docker push registry.example.com/nvidia/vgpu-manager:550.90.05

The build process compiles kernel modules against the host kernel version. Refer to the NVIDIA GPU Operator vGPU documentation for the complete build procedure and supported OS/kernel combinations.

Uploading the vGPU driver to a publicly available registry is a violation of the NVIDIA vGPU EULA. Always use a private registry.

2. Install the GPU Operator with vGPU Variant

The GPU Operator provides a vgpu variant that enables the vGPU Manager and vGPU Device Manager instead of the VFIO Manager used in passthrough mode.

Label the worker node for vGPU workloads:

kubectl label node <node-name> --overwrite nvidia.com/gpu.workload.config=vm-vgpu

Create the GPU Operator Package with the vgpu variant, providing your vGPU Manager image coordinates:

apiVersion: cozystack.io/v1alpha1
kind: Package
metadata:
  name: cozystack.gpu-operator
spec:
  variant: vgpu
  components:
    gpu-operator:
      values:
        gpu-operator:
          vgpuManager:
            repository: registry.example.com/nvidia
            version: "550.90.05"

If your registry requires authentication, create an imagePullSecret in the cozy-gpu-operator namespace first, then reference it:

gpu-operator:
  vgpuManager:
    repository: registry.example.com/nvidia
    version: "550.90.05"
    imagePullSecrets:
    - name: nvidia-registry-secret

Verify all pods are running:

kubectl get pods -n cozy-gpu-operator

Example output:

NAME                                            READY   STATUS    RESTARTS   AGE
...
nvidia-vgpu-manager-daemonset-xxxxx             1/1     Running   0          60s
nvidia-vgpu-device-manager-xxxxx                1/1     Running   0          45s
nvidia-sandbox-validator-xxxxx                  1/1     Running   0          30s

3. Configure NVIDIA License Server (NLS)

vGPU requires a license to operate. Create a Secret with the NLS client configuration:

apiVersion: v1
kind: Secret
metadata:
  name: licensing-config
  namespace: cozy-gpu-operator
stringData:
  gridd.conf: |
    ServerAddress=nls.example.com
    ServerPort=443
    FeatureType=1  # 1 for vGPU (vPC/vWS), 2 for Virtual Compute Server (vCS)
    # ServerPort depends on your NLS deployment (commonly 443 for DLS or 7070 for legacy NLS)

Then reference the Secret in the Package values:

gpu-operator:
  vgpuManager:
    repository: registry.example.com/nvidia
    version: "550.90.05"
  driver:
    licensingConfig:
      secretName: licensing-config

4. Update the KubeVirt Custom Resource

Configure KubeVirt to permit mediated devices. The mediatedDeviceTypes field specifies which vGPU profiles to use, and permittedHostDevices makes them available to VMs:

kubectl edit kubevirt -n cozy-kubevirt

spec:
  configuration:
    mediatedDevicesConfiguration:
      mediatedDeviceTypes:
      - nvidia-592    # Example: NVIDIA L40S-24Q
    permittedHostDevices:
      mediatedDevices:
      - mdevNameSelector: NVIDIA L40S-24Q
        resourceName: nvidia.com/NVIDIA_L40S-24Q

To find the correct type ID and profile name for your GPU, consult the NVIDIA vGPU User Guide.

5. Create a Virtual Machine with vGPU

apiVersion: apps.cozystack.io/v1alpha1
appVersion: '*'
kind: VirtualMachine
metadata:
  name: gpu-vgpu
  namespace: tenant-example
spec:
  running: true
  instanceProfile: ubuntu
  instanceType: u1.medium
  systemDisk:
    image: ubuntu
    storage: 5Gi
    storageClass: replicated
  gpus:
  - name: nvidia.com/NVIDIA_L40S-24Q
  cloudInit: |
    #cloud-config
    password: ubuntu
    chpasswd: { expire: False }

kubectl apply -f vmi-vgpu.yaml

Once the VM is running, log in and verify the vGPU is available:

virtctl console virtual-machine-gpu-vgpu

ubuntu@virtual-machine-gpu-vgpu:~$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.05              Driver Version: 550.90.05    CUDA Version: 12.4       |
|                                                                                         |
| GPU  Name              ...   MIG M.                                                     |
|  0   NVIDIA L40S-24Q   ...   N/A                                                        |
+-----------------------------------------------------------------------------------------+

vGPU Profiles

Each GPU model supports specific vGPU profiles that determine how the GPU is partitioned. Common profiles for NVIDIA L40S:

Profile	Frame Buffer	Max Instances	Use Case
NVIDIA L40S-1Q	1 GB	48	Light 3D / VDI
NVIDIA L40S-2Q	2 GB	24	Medium 3D / VDI
NVIDIA L40S-4Q	4 GB	12	Heavy 3D / VDI
NVIDIA L40S-6Q	6 GB	8	Professional 3D
NVIDIA L40S-8Q	8 GB	6	AI/ML inference
NVIDIA L40S-12Q	12 GB	4	AI/ML training
NVIDIA L40S-24Q	24 GB	2	Large AI workloads
NVIDIA L40S-48Q	48 GB	1	Full GPU equivalent

Open-Source vGPU (Experimental)

NVIDIA is developing open-source vGPU support for the Linux kernel. Once merged, this could enable GPU sharing without a commercial license.

Status: RFC stage, not merged into mainline kernel
Supports Ada Lovelace and newer (L4, L40, etc.)
References: Phoronix announcement, kernel patches

Last modified 2026-04-02: fix(gpu): use Secret for licensing config, fix console hostname (492f318)

CozySummit Virtual 2026 · May 26 · Register Now

Running VMs with GPU Passthrough

Prerequisites

1. Install the GPU Operator

2. Update the KubeVirt Custom Resource

3. Create a Virtual Machine

GPU Sharing for Virtual Machines (vGPU)

Prerequisites

1. Build the vGPU Manager Image

2. Install the GPU Operator with vGPU Variant

3. Configure NVIDIA License Server (NLS)

4. Update the KubeVirt Custom Resource

5. Create a Virtual Machine with vGPU

vGPU Profiles

Open-Source vGPU (Experimental)