
These steps are adapted from Jim's Garage's guide and expanded with additional context. It's recommended to perform them over SSH or from the Proxmox web console. Keep in mind that once a GPU is isolated from the host, the host can no longer use it for display output.

1. Enable IOMMU in the BIOS (Intel VT-d & VT-x or AMD-V)

The exact process and option names vary by motherboard and manufacturer, so search for instructions specific to your board.
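Once virtualization is enabled in the BIOS, you can get a rough confirmation from Linux before continuing. The sketch below uses a hypothetical helper, check_virt_flags (not a Proxmox command), that looks for the vmx (Intel VT-x) or svm (AMD-V) CPU flag in /proc/cpuinfo. It does not check VT-d/AMD-Vi (the IOMMU itself), which is verified in step 2. Depending on the CPU and kernel, the flag may still be listed when the feature is disabled in firmware, so treat the result as a hint rather than proof.

```shell
# Hypothetical helper (sketch): report whether the CPU advertises hardware
# virtualization. "vmx" = Intel VT-x, "svm" = AMD-V.
# This does NOT check VT-d / AMD-Vi (the IOMMU); that is verified in step 2.
# The cpuinfo path is a parameter only so the function can be tested.
check_virt_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    if grep -qw vmx "$cpuinfo"; then
        echo "Intel VT-x (vmx) available"
    elif grep -qw svm "$cpuinfo"; then
        echo "AMD-V (svm) available"
    else
        echo "No vmx/svm flag found: check that virtualization is enabled in the BIOS"
        return 1
    fi
}
```

Paste the function into a root shell on the Proxmox host and run check_virt_flags with no arguments.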

2. Enable IOMMU in GRUB

  1. ssh into the Proxmox host
  2. Edit the GRUB config

    nano /etc/default/grub
    
  3. Update the GRUB_CMDLINE_LINUX_DEFAULT variable in the GRUB config with the kernel parameters for your CPU vendor:

    Intel:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
    

    AMD:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
    

  4. Save changes and exit the editor

  5. Run update-grub to apply the change
  6. Reboot the Proxmox host
  7. ssh back into the Proxmox host and verify IOMMU is enabled by running the following command:

    dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
    

    On Intel systems, you should see a line like this in the output:

    [    0.115654] DMAR: IOMMU enabled
    

    If you see a line confirming IOMMU is enabled, you’re good to proceed to section 3.

    If nothing prints, try running dmesg by itself and manually searching for "DMAR" or "IOMMU" in the output. If you still don’t see anything, or if you see error messages instead, check out the Troubleshoot section below for a possible fix.

    Error logs fill up space

    Kernel errors that repeat constantly are also written to the system logs, which can grow large over time and may fill up storage on your Proxmox server if not managed. It's best to make sure any errors are resolved or silenced.
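Instead of editing the file in nano, the kernel parameters can be added from the shell. This is a sketch under stated assumptions, not part of the original guide: add_grub_params is a hypothetical helper that assumes GRUB_CMDLINE_LINUX_DEFAULT sits on one uncommented line wrapped in double quotes, and that the parameters contain no | or & characters (sed would misinterpret them). Parameters that are already present are skipped, so running it twice is safe.

```shell
# Hypothetical helper (sketch): append kernel parameters to
# GRUB_CMDLINE_LINUX_DEFAULT, skipping any that are already present.
# Assumes a single line of the form GRUB_CMDLINE_LINUX_DEFAULT="..." and
# parameters without '|' or '&'. Uses GNU sed -i (as on Proxmox/Debian).
add_grub_params() {
    file="$1"; shift
    # Current value between the double quotes, e.g. quiet
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/\1/p' "$file")
    new="$current"
    for param in "$@"; do
        case " $new " in
            *" $param "*) ;;                 # already present: skip
            *) new="${new:+$new }$param" ;;  # append, separated by one space
        esac
    done
    # Rewrite the whole line with the updated value
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\".*\"\$|GRUB_CMDLINE_LINUX_DEFAULT=\"$new\"|" "$file"
}
```

For example, on an Intel host: back up the file with cp /etc/default/grub /etc/default/grub.bak, run add_grub_params /etc/default/grub intel_iommu=on iommu=pt, then run update-grub and reboot as described above.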

3. Enable VFIO modules

  1. ssh into the Proxmox host
  2. Edit the modules file

    nano /etc/modules
    
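  3. Add the following modules (on kernel 6.2 and newer, including Proxmox VE 8, vfio_virqfd is built into vfio and can be left out):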

    vfio
    vfio_iommu_type1
    vfio_pci
    vfio_virqfd
    
  4. Save changes and exit the editor

  5. Run update-initramfs -u -k all
  6. Reboot the Proxmox host
  7. ssh into the Proxmox host
  8. Verify the VFIO modules are loaded with the following command, and check that the VFIO driver version line is present in the output.

    dmesg | grep -i vfio
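If you prefer not to edit /etc/modules by hand, the module names can be appended from the shell. This is a sketch, not part of Jim's Garage's guide: add_modules is a hypothetical helper that appends each name only when that exact line is missing, so re-running it does not create duplicates. It assumes the file ends with a newline, as /etc/modules normally does.

```shell
# Hypothetical helper (sketch): append each module name to a modules file
# (normally /etc/modules) only if that exact line is not already present.
# Assumes the file ends with a trailing newline.
add_modules() {
    file="$1"; shift
    for mod in "$@"; do
        grep -qxF "$mod" "$file" 2>/dev/null || printf '%s\n' "$mod" >> "$file"
    done
}
```

For example, run add_modules /etc/modules followed by the module names listed in sub-step 3, then continue with update-initramfs -u -k all as in sub-step 5.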
    

4. Isolate GPU from Host

Updating graphics cards

You'll need to repeat the following steps any time you add, remove, or replace a GPU.

  1. ssh into Proxmox host
  2. Get the IDs of the GPU device(s) and their associated HDMI audio device(s)

    To list all devices:

    lspci -nn
    

    To list only NVIDIA devices:

    lspci -nn | grep -i nvidia
    

    To list only AMD devices (this can also match AMD chipset devices, so check each line):

    lspci -nn | grep -i amd
    

  3. Take note of the device IDs for both the VGA compatible controller and the audio device. Each ID is the pair of four-digit hexadecimal values in square brackets, separated by a colon (e.g. [10de:2482]): the vendor ID followed by the device ID. Each GPU usually has two IDs, one for video and one for audio.

    Here’s an example of the output for an NVIDIA GPU:

    01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070 Ti] [10de:2482] (rev a1)
    01:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)
    
  4. Edit vfio.conf to bind the devices to the vfio-pci driver (nano creates the file if it doesn't exist yet):

    nano /etc/modprobe.d/vfio.conf
    
  5. Add an options line to vfio.conf, replacing the device IDs in the examples below with the ones you noted in step 3. If you are passing through multiple GPUs (or other PCI devices), list all of their device IDs together in a single options line, separated by commas:

    Example with one GPU:

    options vfio-pci ids=10de:2482,10de:228b disable_vga=1
    

    Example with two GPUs:

    options vfio-pci ids=10de:2482,10de:228b,10de:2204,10de:1aef disable_vga=1
    
  6. Save changes and exit the editor
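Copying IDs by hand is error-prone, especially with several GPUs. The sketch below is a convenience under stated assumptions rather than part of the original guide: vfio_options_line (a hypothetical name) reads lspci -nn lines on stdin, pulls out the last [xxxx:xxxx] ID on each line, and prints a comma-separated options line in the format shown above. It emits an ID for every line it is given, so filter the lspci output first and review the result before adding it to vfio.conf.

```shell
# Hypothetical helper (sketch): turn `lspci -nn` lines on stdin into a
# vfio-pci options line. Assumes the [vendor:device] ID is the last
# bracketed xxxx:xxxx pair on each line, as in lspci -nn output.
# Every input line that contains an ID is included, so filter first.
vfio_options_line() {
    ids=$(sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' | paste -s -d, -)
    if [ -z "$ids" ]; then
        echo "no [xxxx:xxxx] device IDs found in input" >&2
        return 1
    fi
    printf 'options vfio-pci ids=%s disable_vga=1\n' "$ids"
}
```

For example: lspci -nn | grep -i nvidia | vfio_options_line. Fed the two example output lines from step 3, it prints options vfio-pci ids=10de:2482,10de:228b disable_vga=1.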

5. Blacklist GPU drivers

  1. ssh into the Proxmox host
  2. Run the following commands to blacklist the host GPU drivers so they don't claim the card before vfio-pci does:

    echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf 
    echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf 
    echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf 
    echo "blacklist nvidiafb" >> /etc/modprobe.d/blacklist.conf
    echo "blacklist nvidia_drm" >> /etc/modprobe.d/blacklist.conf
    
  3. Run update-initramfs -u -k all, then reboot the Proxmox host
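The echo ... >> commands above append a new line every time they run, so repeating this section (for example, after swapping GPUs) leaves duplicate entries in blacklist.conf. A minimal idempotent sketch, assuming a POSIX shell: blacklist_gpu_drivers is a hypothetical helper that writes the same five blacklist lines, but only those that are missing.

```shell
# Hypothetical helper (sketch): add the same blacklist entries as the echo
# commands above, skipping any line that already exists in the file.
# The target file is a parameter so it can be tried on a scratch copy first.
blacklist_gpu_drivers() {
    conf="${1:-/etc/modprobe.d/blacklist.conf}"
    for drv in radeon nouveau nvidia nvidiafb nvidia_drm; do
        line="blacklist $drv"
        grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
    done
}
```

Running blacklist_gpu_drivers with no arguments targets /etc/modprobe.d/blacklist.conf, and running it again leaves the file unchanged.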

Next Steps

Troubleshoot

"AER Correctable error message received" when running dmesg

This message is usually related to PCIe Active State Power Management (ASPM). Disabling ASPM is a workaround rather than a root-cause fix (it may also slightly increase idle power usage), but it often stops these messages. To do this, add pcie_aspm=off to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, the same file you edited in step 2.

For example, on Intel systems, it might look like this:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_aspm=off"

After making this change, run update-grub and reboot the Proxmox host.
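After the reboot, you can confirm the parameter actually reached the running kernel by checking /proc/cmdline, which lists the parameters the kernel was booted with. has_kernel_param is a hypothetical helper for illustration; its optional second argument exists only so it can be tested against a file other than /proc/cmdline.

```shell
# Hypothetical helper (sketch): succeed if the running kernel was booted
# with the given parameter, by matching whole space-separated words in
# /proc/cmdline. The second argument overrides the file (for testing).
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    case " $(cat "$cmdline_file") " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: has_kernel_param pcie_aspm=off && echo "pcie_aspm=off is active". The same check works for intel_iommu=on or iommu=pt from step 2.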

