NVIDIA Fabric Manager
This guide walks through the procedure to enable NVIDIA Fabric Manager.
NVIDIA GPUs with NVLink support (for example, the A100) need the nvidia-fabricmanager system extension enabled in addition to the NVIDIA drivers. For more information on Fabric Manager, refer to https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html
The published versions of the NVIDIA fabricmanager system extension are available here
The nvidia-fabricmanager extension version has to match the NVIDIA driver version in use.
Enabling the NVIDIA fabricmanager system extension
Create the boot assets or a custom installer, and perform a machine upgrade that includes the following system extensions:
ghcr.io/siderolabs/nvidia-open-gpu-kernel-modules:535.129.03-v1.7.6
ghcr.io/siderolabs/nvidia-container-toolkit:535.129.03-v1.14.5
ghcr.io/siderolabs/nvidia-fabricmanager:535.129.03
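Because the nvidia-fabricmanager version has to match the driver version, it can help to check the image tags before building boot assets. The following is a minimal sketch, not part of Talos tooling: the helper names `driver_version` and `check_versions_match` are hypothetical, and it assumes that each tag begins with the NVIDIA driver version, optionally followed by a `-<component version>` suffix, as in the three images listed above.

```python
import re

# Image references copied from the list above.
EXTENSIONS = [
    "ghcr.io/siderolabs/nvidia-open-gpu-kernel-modules:535.129.03-v1.7.6",
    "ghcr.io/siderolabs/nvidia-container-toolkit:535.129.03-v1.14.5",
    "ghcr.io/siderolabs/nvidia-fabricmanager:535.129.03",
]

# Assumption: the tag starts with the driver version (digits separated by
# dots), optionally followed by "-<something>" such as "-v1.7.6".
_TAG = re.compile(r":(\d+(?:\.\d+)+)(?:-.*)?$")


def driver_version(image: str) -> str:
    """Return the driver version embedded in an image tag (hypothetical helper)."""
    match = _TAG.search(image)
    if match is None:
        raise ValueError(f"no driver version found in tag: {image}")
    return match.group(1)


def check_versions_match(images: list[str]) -> str:
    """Return the common driver version, or raise if the images disagree."""
    versions = {driver_version(image) for image in images}
    if len(versions) != 1:
        raise ValueError(f"driver versions differ: {sorted(versions)}")
    return versions.pop()


if __name__ == "__main__":
    print(check_versions_match(EXTENSIONS))
```

The check only compares tags. It does not confirm which driver version is actually loaded on the machine.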
Patch the machine configuration to load the required modules:
machine:
  kernel:
    modules:
      - name: nvidia
      - name: nvidia_uvm
      - name: nvidia_drm
      - name: nvidia_modeset
  sysctls:
    net.core.bpf_jit_harden: 1
Last modified January 18, 2024: docs: fork docs for v1.7 (fe24139f3)