Intel Graphics Device (IGD) assignment with vfio-pci
====================================================

Using vfio-pci, an Intel Graphics Device (IGD) can be passed through to a
guest, either serving as the primary and exclusive graphics adapter or used in
combination with an emulated primary graphics device, depending on the
configuration and guest driver support. However, IGD devices are not "clean"
PCI devices: they use extra memory regions other than BARs. Special handling
is required to make them work properly, including:

* OpRegion for accessing the Video BIOS Table (VBT), which contains display
  output information.
* Data Stolen Memory (DSM) region used as VRAM at an early stage (BIOS/UEFI).

Certain guest software also depends on the following conditions to work
(* = required by):

| Condition                                   | Linux | Windows | VBIOS | EFI GOP |
|---------------------------------------------|-------|---------|-------|---------|
| #1 IGD has a valid OpRegion containing VBT  | * ^1  |    *    |   *   |    *    |
| #2 VID/DID of LPC bridge at 00:1f.0 matches |       |         |   *   |    *    |
| #3 IGD is assigned to BDF 00:02.0           |       |         |   *   |    *    |
| #4 IGD has VGA controller device class      |       |         |   *   |    *    |
| #5 Host's VGA ranges are mapped to IGD      |       |         |   *   |         |
| #6 Guest has valid VBIOS or UEFI Option ROM |       |         |   *   |    *    |

^1 Though the i915 driver is able to mock an OpRegion, it is still recommended
   to use the VBT copied from the host OpRegion to prevent incorrect
   configuration.

For #1, the "x-igd-opregion=on" option exposes a copy of the host IGD OpRegion
to the guest via fw_cfg, which guest firmware can use to set up the guest
OpRegion.

For #2, the "x-igd-lpc=on" option copies the IDs of the host LPC bridge and
host bridge to the guest. Currently this is only supported on i440fx machines:
q35 machines already have an ICH9 LPC bridge, and overwriting its IDs may lead
to unexpected behavior.

For #3, "addr=2.0" assigns the IGD to 00:02.0.

For #4, the primary display must be set to IGD in the host BIOS.

For #5, "x-vga=on" enables guest access to the standard VGA IO/MMIO ranges.

For #6, a ROM provided via either the ROM BAR or the romfile= option is
needed. Intel document [1] shows how to dump the VBIOS to a file. For the UEFI
Option ROM, see the "Guest firmware" section.

QEMU also provides a "Legacy" mode that implicitly enables full functionality
on IGD. It is automatically enabled when all of the following conditions are
met:
* IGD generation is 6 to 9 (Sandy Bridge to Comet Lake)
* Machine type is i440fx
* IGD is assigned to guest BDF 00:02.0
* ROM BAR or romfile is present

In "Legacy" mode, QEMU automatically sets up the OpRegion, LPC bridge IDs and
VGA range access, which is equivalent to:
  x-igd-opregion=on,x-igd-lpc=on,x-vga=on

By default, "Legacy" mode does not fail on error; QEMU continues instead.
Setting "x-igd-legacy-mode=on" forces legacy mode: QEMU then also checks that
the conditions above are met, and fails immediately if any error occurs.
Setting "x-igd-legacy-mode=off" disables legacy mode.

In legacy mode, as the guest VGA ranges are assigned to the IGD device, all
other graphics devices should be removed. This can be done using "-nographic",
"-vga none" or "-nodefaults", along with adding the device using vfio-pci.

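As a rough illustration of the conditions above, the sketch below prints
(rather than runs) a QEMU command line intended to satisfy the automatic
legacy mode requirements: an i440fx machine ("pc"), no other graphics device,
and the IGD at guest address 00:02.0. The helper name, accelerator, CPU model
and memory size are placeholders chosen for this example, and the sketch
assumes the device exposes a usable ROM BAR (otherwise a romfile= option would
have to be added).

```shell
# Hypothetical helper: prints a command line instead of launching QEMU,
# so it can be inspected first. Memory size and CPU model are placeholders.
print_igd_legacy_cmdline() {
    host_bdf="${1:-00:02.0}"   # host PCI address of the IGD
    printf '%s ' \
        qemu-system-x86_64 \
        -machine pc,accel=kvm \
        -cpu host -m 4G \
        -nodefaults -vga none \
        -device "vfio-pci,host=${host_bdf},addr=2.0"
    printf '\n'
}

print_igd_legacy_cmdline
```

Disk, network and firmware options are deliberately left out; they are
independent of IGD assignment.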
For either mode, depending on the host kernel, the i915 driver in the host
may generate faults and errors upon re-binding to an IGD device after it
has been assigned to a VM. It's therefore generally recommended to prevent
such driver binding unless the host driver is known to work well for this.
There are numerous ways to do this: i915 can be blacklisted on the host;
the driver_override option can be used to ensure that only vfio-pci can bind
to the device on the host[2]; virsh nodedev-detach can be used to bind the
device to vfio drivers, with managed='no' set in the VM xml to prevent
re-binding to i915; etc. Note also that IGD is typically the primary
graphics in the host, and special options may be required beyond simply
blacklisting i915 or using pci-stub/vfio-pci to take ownership of IGD as a
PCI class device. Lower level drivers exist that may still claim the device.
It may therefore be necessary to use kernel boot options video=vesafb:off or
video=efifb:off (depending on host BIOS/UEFI), or these can be combined into
a catch-all, video=vesafb:off,efifb:off. Error messages such as:

    Failed to mmap 0000:00:02.0 BAR <>. Performance may be slow

are a good indicator that such a problem exists. The host files /proc/iomem
and /proc/ioports are often useful for identifying drivers that consume ranges
of the device and cause such conflicts.

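The driver_override approach from [2] can be sketched as a small shell
function. This is only an outline under stated assumptions: it must run as
root, the vfio-pci module must already be loaded, and the second argument (an
alternative sysfs root) is a hypothetical convenience added here so the
function can be dry-run against a scratch directory.

```shell
# Sketch: route a PCI device to vfio-pci via driver_override.
# $1: full PCI address, e.g. 0000:00:02.0
# $2: sysfs root (hypothetical knob for dry runs, defaults to /sys)
bind_to_vfio_pci() {
    dev="$1"
    sysfs="${2:-/sys}"
    devdir="$sysfs/bus/pci/devices/$dev"

    # From now on only vfio-pci is allowed to bind to this device.
    echo "vfio-pci" > "$devdir/driver_override" || return 1

    # Release the device from its current driver, if it has one.
    if [ -e "$devdir/driver/unbind" ]; then
        echo "$dev" > "$devdir/driver/unbind" || return 1
    fi

    # Ask the PCI core to probe again; the override selects vfio-pci.
    echo "$dev" > "$sysfs/bus/pci/drivers_probe"
}
```

A typical invocation would be "modprobe vfio-pci" followed by
"bind_to_vfio_pci 0000:00:02.0". The override does not persist across reboots.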
Additionally, IGD devices are known to generate small numbers of DMAR faults
when initially assigned. It is believed that this is simply the IGD attempting
to access the reserved GTT space after reset, which it no longer has access to
when accessed from userspace. So long as the DMAR faults are small in number
and, most importantly, not ongoing, these are not an indication of an error.

Additionally++, analog VGA output (as opposed to digital outputs like HDMI,
DVI, or DisplayPort) may be unsupported in some use cases. In the author's
experience, even DP to VGA adapters can be troublesome while adapters between
digital formats work well.


Options
=======
* x-igd-opregion=[*on*|off]
    Copy the host IGD OpRegion and expose it to the guest with fw_cfg

* x-igd-lpc=[on|*off*]
    Create a dummy LPC bridge at 00:1f.0 with host VID/DID (i440fx only)

* x-igd-legacy-mode=[on|off|*auto*]
    Enable/Disable legacy mode

* x-igd-gms=[hex, default 0]
    Override the DSM region size in the GGC register; 0 means use the host
    value. Use this only when the DSM size cannot be changed through the
    'DVMT Pre-Allocated' option in host BIOS.


Examples
========
* Adding IGD with automatic legacy mode support
  -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0

* Adding IGD with OpRegion and LPC ID hack, but without VGA ranges
  (For UEFI guests)
  -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0,x-igd-legacy-mode=off,x-igd-lpc=on,romfile=efi_oprom.rom


Guest firmware
==============
Guest firmware is responsible for setting up the OpRegion and Base of Data
Stolen Memory (BDSM) in guest address space. IGD passthrough support imposes
two fw_cfg requirements on the VM firmware:

1) "etc/igd-opregion"

   This fw_cfg file exposes the OpRegion for the IGD device. A reserved
   region should be created below 4GB (recommended 4KB alignment), sized
   sufficient for the fw_cfg file size, and the content of this file copied
   to it. The dword based address of this reserved memory region must also
   be written to the ASLS register at offset 0xFC on the IGD device. It is
   recommended that firmware should make use of this fw_cfg entry for any
   PCI class VGA device with Intel vendor ID. Behavior with multiple such
   devices within a VM is undefined.

2) "etc/igd-bdsm-size"

   This fw_cfg file contains an 8-byte, little endian integer indicating
   the size of the reserved memory region required for IGD stolen memory.
   Firmware must allocate a reserved memory region of this size below 4GB,
   aligned to 1MB. Additionally, the base address of this reserved region
   must be written to the dword BDSM register in PCI config space of the
   IGD device at offset 0x5C (or 0xC0 for Gen 11+ devices using 64-bit
   BDSM). As this support is related to running the IGD ROM, which has
   other dependencies on the device appearing at guest address 00:02.0,
   it's expected that this fw_cfg file is only relevant to a single PCI
   class VGA device with Intel vendor ID, appearing at PCI bus address
   00:02.0.

   Starting from Meteor Lake, IGD devices access stolen memory via MMIO
   BAR2 (LMEMBAR), and the BDSM register has been removed from config
   space. Guest firmware does not need to allocate data stolen memory in
   guest address space or write its address to the BDSM register. The
   value of this fw_cfg file is 0 in this case.

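To make the "8-byte, little endian integer" and "1MB alignment" wording
concrete, here is a minimal sketch that decodes such a value from a file and
checks a candidate base address. Both function names are made up for this
example. Real guest firmware reads the file through its own fw_cfg interface;
the shell version is only meant for inspecting a dumped copy of the file.

```shell
# Decode an 8-byte little-endian integer from a file, byte by byte,
# so the result does not depend on the endianness of the machine running it.
# (Function names here are illustrative, not part of QEMU or any firmware.)
read_le64() {
    value=0
    shift_bits=0
    for byte in $(od -An -v -tu1 -N8 "$1"); do
        value=$(( value | (byte << shift_bits) ))
        shift_bits=$(( shift_bits + 8 ))
    done
    echo "$value"
}

# Succeeds if the given address is a multiple of 1MB (low 20 bits are zero).
is_1mb_aligned() {
    [ $(( $1 & 0xFFFFF )) -eq 0 ]
}
```

For instance, a file holding the bytes 00 00 00 04 00 00 00 00 decodes to
67108864, i.e. a 64MB stolen memory region, while a file of eight zero bytes
corresponds to the Meteor Lake case described above.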
Upstream SeaBIOS has OpRegion and BDSM (pre-Gen11 devices only) support.
However, the support has not been accepted by upstream EDK2/OVMF. A
recommended solution is to create a virtual OpRom with the following DXE
drivers:

* IgdAssignmentDxe: Set up OpRegion and BDSM according to fw_cfg (must)
* IntelGopDriver: Closed-source Intel GOP driver
* PlatformGopPolicy: Protocol required by IntelGopDriver

IntelGopDriver and PlatformGopPolicy are only required when enabling GOP on
IGD.

The original IgdAssignmentDxe can be found at [3]. An Intel-maintained version
with PlatformGopPolicy for industrial computing is at [4]. There is also an
unofficially maintained version with newer Gen11+ device support at [5].
You need to build them with EDK2.

Intel has never released the IntelGopDriver to the public. If you are an Intel
Premier Support customer, you may contact Intel support to get one, as [4]
says; otherwise you can try extracting it from your host firmware using
"UEFI BIOS Updater"[6].

Once you have all the required DXE drivers, an Option ROM can be generated
with the EfiRom utility in EDK2, using
  EfiRom -f 0x8086 -i <Device ID of your IGD> -o output.rom \
    -e IgdAssignmentDxe.efi PlatformGOPPolicy.efi IntelGopDriver.efi


Known issues
============
When using OVMF as guest firmware, you may encounter the following warning:
warning: vfio_container_dma_map(0x55fab36ce610, 0x380010000000, 0x108000, 0x7fd336000000) = -22 (Invalid argument)

Solution:
Set the host physical address bits to the IOMMU address width using
  -cpu host,host-phys-bits-limit=<IOMMU address width>
Or in libvirt XML with
  <cpu>
    <maxphysaddr mode='passthrough' limit='<IOMMU address width>'/>
  </cpu>
The IOMMU address width can be determined with
  echo $(( ((0x$(cat /sys/devices/virtual/iommu/dmar0/intel-iommu/cap) & 0x3F0000) >> 16) + 1 ))
Refer to https://edk2.groups.io/g/devel/topic/patch_v1/102359124 for more
details.


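The one-liner above extracts bits 21:16 of the VT-d capability register (the
MGAW field) and adds 1. The sketch below wraps the same arithmetic in a
function so it can be tried on a literal value; the function name and the
sample register value are assumptions made for illustration, not values taken
from any particular host.

```shell
# Extract MGAW (bits 21:16) from a VT-d capability register value and add 1.
# $1: the register value as hex digits without the 0x prefix, the same form
#     the one-liner above expects from the sysfs "cap" file.
iommu_addr_width() {
    echo $(( ((0x$1 & 0x3F0000) >> 16) + 1 ))
}

iommu_addr_width d2008c40660462   # sample value, assumed; prints 39
```

With this sample value the field holds 0x26 (38), so the width is 39 bits and
the matching option would be "-cpu host,host-phys-bits-limit=39".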
Memory View
===========
IGD has its own address space. To use system RAM as VRAM, a single-level page
table named Global Graphics Translation Table (GTT) is used for the address
translation. Each page table entry points to a 4KB page. The illustration
below shows the translation flow on IGD with 64-bit GTT PTEs.

(PTE_SIZE == 8)                 +-------------+---+
                                |   Address   | V | V: Valid Bit
                                +-------------+---+
                                | ...         |   |
IGD:0x01ae9010            0xd740| 0x70ffc000  | 1 |  Mem:0x42ba3e010^
----------------------->  0xd748| 0x42ba3e000 | 1 +------------------>
(addr >> 12) * PTE_SIZE   0xd750| 0x42ba3f000 | 1 |
                                | ...         |   |
                                +-------------+---+
^ The address may be remapped by IOMMU

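The numbers in the illustration can be reproduced with plain shell arithmetic.
This is only a worked example of the formula shown in the diagram; the
function names are invented here, and the second function takes the page
address field of the PTE (without the valid bit) as its second argument.

```shell
PTE_SIZE=8   # 64-bit GTT PTEs, as in the illustration

# Byte offset within the GTT of the PTE that covers a given IGD address.
gtt_pte_offset() {
    printf '0x%x\n' $(( ($1 >> 12) * PTE_SIZE ))
}

# Combine the page address from a PTE with the low 12 bits (the offset
# within the 4KB page) of the IGD address.
gtt_translate() {
    printf '0x%x\n' $(( ($2 & ~0xFFF) | ($1 & 0xFFF) ))
}

gtt_pte_offset 0x01ae9010               # prints 0xd748
gtt_translate 0x01ae9010 0x42ba3e000    # prints 0x42ba3e010
```

The resulting address is what the IGD emits; as the diagram notes, it may
still be remapped by the IOMMU.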
The memory region storing the GTT is called GTT Stolen Memory (GSM). It is
located right below the Data Stolen Memory (DSM). Accessing this region
directly is not allowed; any access will immediately freeze the whole system.
The only way to access it is through the second half of MMIO BAR0.

The Data Stolen Memory is reserved by firmware, and acts as the VRAM in pre-OS
environments. In QEMU, guest firmware (SeaBIOS/OVMF) is responsible for
reserving a contiguous region and programming its base address into the BDSM
register, then letting the VBIOS/GOP driver initialize this region. The
illustration below shows how DSM is mapped.

       IGD Addr Space                 Host Addr Space         Guest Addr Space
       +-------------+                +-------------+         +-------------+
       |             |                |             |         |             |
       |             |                |             |         |             |
       |             |                +-------------+         +-------------+
       |             |                | Data Stolen |         | Data Stolen |
       |             |                |   (Guest)   |         |   (Guest)   |
       |             |  +------------>+-------------+<------->+-------------+<--Guest BDSM
       |             |  | Passthrough |             |   EPT   |             |   Emulated by QEMU
DSMSIZE+-------------+  | with IOMMU  |             | Mapping |             |   Programmed by guest FW
       |             |  |             |             |         |             |
       |             |  |             |             |         |             |
      0+-------------+--+             |             |         |             |
                        |             +-------------+         |             |
                        |             | Data Stolen |         +-------------+
                        |             |   (Host)    |
                        +------------>+-------------+<--Host BDSM
                            Non-      |             |   "real" one in HW
                         Passthrough  |             |   Programmed by host FW
                                      +-------------+

Footnotes
=========
[1] https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/1-0/dump-video-bios.html
[2] # echo "vfio-pci" > /sys/bus/pci/devices/0000:00:02.0/driver_override
[3] https://web.archive.org/web/20240827012422/https://bugzilla.tianocore.org/show_bug.cgi?id=935
    Tianocore bugzilla has been down since Jan 2025 :(
[4] https://eci.intel.com/docs/3.3/components/kvm-hypervisor.html, Patch 0001-0004
[5] https://github.com/tomitamoeko/VfioIgdPkg
[6] https://winraid.level1techs.com/t/tool-guide-news-uefi-bios-updater-ubu/30357