
GPU Sharing with HAMi

Enable fractional GPU sharing in tenant Kubernetes clusters using HAMi.

HAMi (Heterogeneous AI Computing Virtualization Middleware) is a CNCF Sandbox project that enables fractional GPU sharing in Kubernetes. Instead of dedicating an entire GPU to a single workload, HAMi lets containers request specific amounts of GPU memory and compute cores.

How it works

HAMi sits between the Kubernetes scheduler and the NVIDIA GPU driver:

  • A Scheduler Extender adds GPU-aware scheduling decisions (filtering and binding) so pods land on nodes with enough GPU capacity.
  • A Device Plugin registers virtual GPU resources (nvidia.com/gpu, nvidia.com/gpumem, nvidia.com/gpucores) with kubelet.
  • A MutatingWebhook automatically routes GPU pods to the HAMi scheduler.
  • HAMi-core (libvgpu.so) is injected into workload containers via LD_PRELOAD to enforce memory and compute isolation at the CUDA API level.
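
Assuming HAMi's components run in the kube-system namespace of the tenant cluster (the namespace and object names below are assumptions and may differ between releases), the pieces above can be inspected with a quick sketch like:

```shell
# Sketch: inspect HAMi's components in the tenant cluster.
# Namespace and naming are assumptions; adjust for your release.

# Scheduler extender and device plugin pods
kubectl get pods -n kube-system | grep -i hami

# Mutating webhook that routes GPU pods to the HAMi scheduler
kubectl get mutatingwebhookconfigurations | grep -i hami
```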

When HAMi is enabled, GPU Operator’s built-in device plugin is automatically disabled to avoid resource registration conflicts.

Prerequisites

  • A tenant Kubernetes cluster with GPU-enabled worker nodes (node groups with GPUs configured).
  • GPU Operator addon enabled on the tenant cluster.

Enable HAMi

Enable both GPU Operator and HAMi in your tenant Kubernetes cluster configuration:

apiVersion: apps.cozystack.io/v1alpha1
kind: Kubernetes
metadata:
  name: my-cluster
  namespace: tenant-example
spec:
  nodeGroups:
    gpu-workers:
      minReplicas: 1
      maxReplicas: 3
      instanceType: u1.xlarge
      gpus:
        - name: nvidia.com/GA102GL_A10
  addons:
    gpuOperator:
      enabled: true
    hami:
      enabled: true

Apply this configuration:

kubectl apply -f my-cluster.yaml
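
Once the cluster reconciles, you can check that the virtual GPU resources are registered with kubelet on a GPU node. This is a sketch; the node name is a placeholder you must replace:

```shell
# Sketch: confirm HAMi's virtual GPU resources are allocatable on a GPU node.
# Replace <gpu-node-name> with one of your gpu-workers nodes.
kubectl get node <gpu-node-name> -o jsonpath='{.status.allocatable}' | tr ',' '\n' | grep nvidia.com

# Look for entries such as nvidia.com/gpu, nvidia.com/gpumem, and nvidia.com/gpucores.
```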

Request fractional GPU resources

Once HAMi is running, workloads can request fractional GPU resources:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  containers:
    - name: cuda-app
      image: nvcr.io/nvidia/cuda:11.8.0-base-ubuntu20.04
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 3000
          nvidia.com/gpucores: 30


Resource                       Description
nvidia.com/gpu                 Number of virtual GPUs requested
nvidia.com/gpumem              GPU memory limit in MiB
nvidia.com/gpucores            Percentage of GPU compute cores (1–100)
nvidia.com/gpumem-percentage   GPU memory limit as a percentage (1–100)

Use nvidia.com/gpumem-percentage instead of nvidia.com/gpumem when you want a portable limit that works across different GPU models without knowing exact memory sizes.
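
As an illustration, a percentage-based variant of the pod above that requests half of whichever GPU it lands on (the pod name and values here are illustrative, not prescribed):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload-fractional   # illustrative name
spec:
  containers:
    - name: cuda-app
      image: nvcr.io/nvidia/cuda:11.8.0-base-ubuntu20.04
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem-percentage: 50   # half of the GPU's memory, regardless of model
          nvidia.com/gpucores: 50            # half of the compute cores
```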

If gpumem and gpucores are omitted, the container gets access to the full GPU’s memory and compute capacity. Note that HAMi’s virtualization layer is still active — this is not the same as bare-metal GPU passthrough.

Custom configuration

HAMi’s behavior can be tuned through valuesOverride in the addon configuration:

addons:
  hami:
    enabled: true
    valuesOverride:
      hami:
        devicePlugin:
          deviceSplitCount: 10
          deviceMemoryScaling: 1
        scheduler:
          defaultSchedulerPolicy:
            nodeSchedulerPolicy: binpack
            gpuSchedulerPolicy: spread

All parameters below are relative to the valuesOverride.hami key shown in the example above.

Parameter                                              Description                                          Default
devicePlugin.deviceSplitCount                          Maximum virtual GPUs per physical GPU                10
devicePlugin.deviceMemoryScaling                       Memory overcommit factor (>1.0 enables overcommit)   1
scheduler.defaultSchedulerPolicy.nodeSchedulerPolicy   Node packing strategy: binpack or spread             binpack
scheduler.defaultSchedulerPolicy.gpuSchedulerPolicy    GPU packing strategy: binpack or spread              spread
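
As a worked example of the overcommit factor: with deviceMemoryScaling set to 1.2, a GPU with 24 GiB of physical memory is advertised as roughly 24 × 1.2 ≈ 28.8 GiB of allocatable gpumem. The sketch below shows the shape of such a configuration (the value 1.2 is illustrative; overcommit assumes workloads do not all touch their full memory limit at once):

```yaml
addons:
  hami:
    enabled: true
    valuesOverride:
      hami:
        devicePlugin:
          deviceMemoryScaling: 1.2   # e.g. 24 GiB physical -> ~28.8 GiB allocatable gpumem
```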

Known limitations

glibc compatibility

HAMi-core relies on a private glibc symbol (_dl_sym) that was removed in glibc 2.34. This affects workload container images only — HAMi’s own components and the host OS are not affected.

Base image      glibc   Isolation
Ubuntu 20.04    2.31    Full (memory + compute)
Ubuntu 22.04    2.35    Memory isolation only (compute isolation fails silently)
Ubuntu 24.04    2.39    No isolation (HAMi-core fails to load silently)
Alpine (musl)   N/A     Incompatible

The distinction between Ubuntu 22.04 and 24.04 behavior is based on upstream testing — see HAMi-core #174 for details.

Most current CUDA 12.x and PyTorch 2.x images use Ubuntu 22.04+, so compute isolation will not work with them. Use images based on Ubuntu 20.04 or older for full isolation until the upstream fix lands.
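
To check which glibc a candidate base image ships before relying on isolation, one approach is to query it at run time. This sketch assumes Docker is available and uses example image tags:

```shell
# Sketch: print the glibc version of a candidate base image.
# Images with glibc older than 2.34 keep full HAMi-core isolation.
docker run --rm ubuntu:20.04 ldd --version | head -n1   # glibc 2.31: full isolation
docker run --rm ubuntu:22.04 ldd --version | head -n1   # glibc 2.35: memory isolation only
```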

Alpine / musl libc

HAMi-core is incompatible with musl libc. Only glibc-based container images (Debian, Ubuntu, RHEL) are supported.