Enforcing Docker & Nvidia Container Toolkit Persistence
My previous post covered enabling the Nvidia Container Toolkit on an Ubuntu VM (Ubuntu 24.04.2 LTS). After running it for a few weeks I noticed the following:
- The nvidia driver toolkit did not persist across a reboot unless nvidia-smi was run manually
- The nvidia container toolkit seemed to disconnect the GPU from containers after 24-48 hours
Both of the above caused GPU usage from the containers to be disabled, e.g. when attempting to transcode using Jellyfin.
Add Kernel parameter via Grub
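The following kernel parameter tells systemd to use the legacy cgroup v1 hierarchy instead of the unified cgroup v2 hierarchy. In my case this made the Nvidia GPU available as part of start-up.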
- Check that you have a default Grub file
cat /etc/default/grub
- Open the file for editing, using something like
sudo vi /etc/default/grub
- Add systemd.unified_cgroup_hierarchy=0 to the GRUB_CMDLINE_LINUX_DEFAULT line, so that it looks like this
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash systemd.unified_cgroup_hierarchy=0"
- After saving the file, update Grub and then reboot so the parameter takes effect
sudo update-grub
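After the reboot it can be worth confirming that the parameter was actually picked up. The following is a minimal sketch, not part of the original setup: the helper name has_kernel_param is made up for illustration, and it assumes a Linux host where /proc/cmdline is readable.

```shell
#!/bin/sh
# has_kernel_param PARAM [CMDLINE_FILE]
# Hypothetical helper: succeeds if PARAM appears as a whole token in the
# kernel command line (read from /proc/cmdline unless a file is given).
has_kernel_param() {
    param="$1"
    file="${2:-/proc/cmdline}"
    [ -r "$file" ] || return 1
    # Put each parameter on its own line, then look for an exact match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

if has_kernel_param "systemd.unified_cgroup_hierarchy=0"; then
    echo "Parameter is active on this boot"
else
    echo "Parameter not found in /proc/cmdline"
fi

# Filesystem type of /sys/fs/cgroup:
# cgroup2fs means unified cgroup v2, tmpfs means legacy or hybrid cgroup v1.
stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "Could not inspect /sys/fs/cgroup"
```

If the parameter is missing after a reboot, the usual suspects are a typo in /etc/default/grub or forgetting to run sudo update-grub.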
Adding Nvidia /dev:/dev to Docker Compose
I also updated the Docker Compose file for all GPU-related containers. The example below shows the runtime, devices, and resources entries added for Jellyfin.
Doing this ensures that all devices used by the Docker containers are referenced explicitly from the host's Nvidia runtime.
runtime: nvidia
devices:
  - '/dev/nvidia-caps:/dev/nvidia-caps'
  - '/dev/nvidia0:/dev/nvidia0'
  - '/dev/nvidiactl:/dev/nvidiactl'
  - '/dev/nvidia-modeset:/dev/nvidia-modeset'
  - '/dev/nvidia-uvm:/dev/nvidia-uvm'
  - '/dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools'
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities:
            - gpu
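Docker refuses to start a container if a path listed under devices does not exist on the host, so it may help to check the device nodes before bringing the stack up. This is a rough sketch under that assumption: the helper name missing_devices is hypothetical, and the device list is simply copied from the Compose snippet above.

```shell
#!/bin/sh
# Device nodes taken from the Compose snippet above.
NVIDIA_DEVICES="/dev/nvidia-caps /dev/nvidia0 /dev/nvidiactl /dev/nvidia-modeset /dev/nvidia-uvm /dev/nvidia-uvm-tools"

# Hypothetical helper: print every path passed as an argument
# that does not exist on this host.
missing_devices() {
    for dev in "$@"; do
        [ -e "$dev" ] || echo "$dev"
    done
}

# NVIDIA_DEVICES is left unquoted on purpose so it splits into separate paths.
missing=$(missing_devices $NVIDIA_DEVICES)
if [ -n "$missing" ]; then
    echo "Missing on host:"
    echo "$missing"
else
    echo "All Nvidia device nodes are present"
fi
```

Once the container is running, something like docker exec jellyfin nvidia-smi should list the GPU from inside it. The container name jellyfin is an assumption here; use whatever name your Compose file gives the service.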
Credits / Links
CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected · Issue #666 · HaveAGitGat/Tdarr
NOTICE: Containers losing access to GPUs with error: “Failed to initialize NVML: Unknown Error” · Issue #48 · NVIDIA/nvidia-container-toolkit
NVIDIA GPU | Jellyfin
