Routines of common.c :
vfio_devices_all_dirty_tracking_started
vfio_devices_all_device_dirty_tracking
vfio_devices_query_dirty_bitmap
vfio_get_dirty_bitmap
are all related to dirty page tracking directly at the container level
or at the container device level. Naming is a bit confusing. We will
propose new names in the following changes.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-27-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Pass-through devices of a VM are not necessarily in the same group and
all groups/address_spaces need to be scanned when the machine is
reset. Commit f16f39c3fc ("Implement PCI hot reset") introduced a VM
reset handler for this purpose. Move it under device.c
Also reintroduce the comment which explained the context and was lost
along the way.
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-26-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Gather all CPR related declarations into "vfio-cpr.h" to reduce exposure
of VFIO internals in "hw/vfio/vfio-common.h". These were introduced in
commit d9fa4223b3 ("vfio: register container for cpr").
Order file list in meson.build while at it.
Cc: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-22-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Move all VFIODevice related routines of "helpers.c" into a new "device.c"
file.
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-21-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
VFIOAddressSpace is a common object used by VFIOContainerBase which is
declared in "hw/vfio/vfio-container-base.h". Move the VFIOAddressSpace
related services into "container-base.c".
While at it, rename :
vfio_get_address_space -> vfio_address_space_get
vfio_put_address_space -> vfio_address_space_put
to better reflect the namespace these routines belong to.
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-15-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Gather all VFIORegion related declarations and definitions into their
own files to reduce exposure of VFIO internals in "hw/vfio/vfio-common.h".
They were introduced for 'vfio-platform' support in commits
db0da029a1 ("vfio: Generalize region support") and a664477db8
("hw/vfio/pci: Introduce VFIORegion").
To be noted that the 'vfio-platform' devices have been deprecated and
will be removed in QEMU 10.2. Until then, make the declarations
available externally for 'sysbus-fdt.c'.
Cc: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-12-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Gather all VFIOIOMMUFD related declarations introduced by commits
5ee3dc7af7 ("vfio/iommufd: Implement the iommufd backend") and
5b1e96e654 ("vfio/iommufd: Introduce auto domain creation") into
"vfio-iommufd.h". This to reduce exposure of VFIO internals in
"hw/vfio/vfio-common.h".
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20250318095415.670319-10-clg@redhat.com
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-11-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
These routines are migration related. Move their declaration and
implementation under the migration files.
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: John Levon <john.levon@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-8-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The migration core subsystem makes use of the VFIO migration API to
collect statistics on the number of bytes transferred. These services
are declared in "hw/vfio/vfio-common.h" which also contains VFIO
internal declarations. Move the migration declarations into a new
header file "hw/vfio/vfio-migration.h" to reduce the exposure of VFIO
internals.
While at it, use a 'vfio_migration_' prefix for these services.
To be noted, vfio_migration_add_bytes_transferred() is a VFIO
migration internal service which we will be moved in the subsequent
patches.
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-4-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
An L2 KVM guest fails to boot inside a pSeries LPAR when booted with a
memory more than 128 GB and PCI device passthrough. The L2 guest also
crashes when it is booted with a memory greater than 128 GB and a PCI
device is hotplugged later.
The issue arises from a conditional check for `levels > 1` in
`spapr_tce_create_table()` within L1 KVM. This check is meant to prevent
multi-level TCEs, which are not supported by the PowerVM hypervisor. As
a result, when QEMU makes a `VFIO_IOMMU_SPAPR_TCE_CREATE` ioctl call
with `levels > 1`, it triggers the conditional check and returns
`EINVAL`, causing the guest to crash with the following errors:
2025-03-04T06:36:36.133117Z qemu-system-ppc64: Failed to create a window, ret = -1 (Invalid argument)
2025-03-04T06:36:36.133176Z qemu-system-ppc64: Failed to create SPAPR window: Invalid argument
qemu: hardware error: vfio: DMA mapping failed, unable to continue
Fix this by checking the supported DDW "levels" returned by the
VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctl before attempting the TCE create
ioctl in KVM.
The patch has been tested on KVM guests with memory configurations of up
to 390GB, and 450GB on PowerVM and bare-metal environments respectively.
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250408124042.2695955-3-amachhiw@linux.ibm.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Introduce an Error ** parameter to vfio_spapr_create_window() to enable
structured error reporting. This allows the function to propagate
detailed errors back to callers.
Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250408124042.2695955-2-amachhiw@linux.ibm.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
VFIO uses migration_file_set_error() in a couple of places where an
'Error **' parameter is not provided. In MemoryListener handlers :
vfio_listener_region_add
vfio_listener_log_global_stop
vfio_listener_log_sync
and in callback routines for IOMMU notifiers :
vfio_iommu_map_notify
vfio_iommu_map_dirty_notify
Hopefully, one day, we will be able to extend these callbacks with an
'Error **' parameter and avoid setting the global migration error.
Until then, it seems sensible to clearly identify the use cases, which
are limited, and open code vfio_migration_set_error(). One other
benefit is an improved error reporting when migration is running.
While at it, slightly modify error reporting to only report errors
when migration is not active and not always as is currently done.
Cc: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Link: https://lore.kernel.org/qemu-devel/20250324123315.637827-1-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
A few functions now end with a label. The next commit will clean them
up.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250407082643.2310002-3-armbru@redhat.com>
[Straightforward conflict with commit 988ad4cceb (hw/loongarch/virt:
Fix cpuslot::cpu set at last in virt_cpu_plug()) resolved]
Coccinelle's indentation of virt_create_plic() results in a long line.
Avoid that by mimicking the old indentation manually.
Don't touch tests/tcg/mips/user/. I'm not sure these files are ours
to make style cleanups on. They might be imported third-party code,
which we should leave as is to not complicate future updates.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250407082643.2310002-2-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Convert the existing includes with sed.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Convert the existing includes with sed.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Convert the existing includes with
sed -i ,exec/memory.h,system/memory.h,g
Move the include within cpu-all.h into a !CONFIG_USER_ONLY block.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The intent behind the x-device-dirty-page-tracking option is twofold:
1) development/testing in the presence of VFs with VF dirty page tracking
2) deliberately choosing platform dirty tracker over the VF one.
Item 2) scenario is useful when VF dirty tracker is not as fast as
IOMMU, or there's some limitations around it (e.g. number of them is
limited; aggregated address space under tracking is limited),
efficiency/scalability (e.g. 1 pagetable in IOMMU dirty tracker to scan
vs N VFs) or just troubleshooting. Given item 2 it is not restricted to
debugging, hence drop the debug parenthesis from the option description.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250311174807.79825-1-joao.m.martins@oracle.com
[ clg: Fixed subject spelling ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The ATI BAR4 quirk is targeting an ioport BAR. Older devices may
have a BAR4 which is not an ioport, causing a segfault here. Test
the BAR type to skip these devices.
Similar to
"8f419c5b: vfio/pci-quirks: Exclude non-ioport BAR from NVIDIA quirk"
Untested, as I don't have the card to test.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2856
Signed-off-by: Vasilis Liaskovitis <vliaskovitis@suse.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250310235833.41026-1-vliaskovitis@suse.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
display.c doesn't rely on target specific definitions,
move it to system_ss[] to build it once.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-8-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-9-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Removing unused "exec/ram_addr.h" header allow to compile
iommufd.c once for all targets.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-6-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-8-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
These files depend on the VFIO symbol in their Kconfig
definition. They don't rely on target specific definitions,
move them to system_ss[] to build them once.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-5-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-7-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Prefer runtime helpers to get target page size.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250305153929.43687-3-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-5-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
<linux/kvm.h> is already included by "system/kvm.h" in the next line.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250307180337.14811-3-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-3-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Both qemu_minrampagesize() and qemu_maxrampagesize() are
related to host memory backends, having the following call
stack:
qemu_minrampagesize()
-> find_min_backend_pagesize()
-> object_dynamic_cast(obj, TYPE_MEMORY_BACKEND)
qemu_maxrampagesize()
-> find_max_backend_pagesize()
-> object_dynamic_cast(obj, TYPE_MEMORY_BACKEND)
Having TYPE_MEMORY_BACKEND defined in "system/hostmem.h":
include/system/hostmem.h:23:#define TYPE_MEMORY_BACKEND "memory-backend"
Move their prototype declaration to "system/hostmem.h".
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-7-philmd@linaro.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-2-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Wire data commonly use BE byte order (including in the existing migration
protocol), use it also for for VFIO device state packets.
This will allow VFIO multifd device state transfer between hosts with
different endianness.
Although currently there is no such use case, it's good to have it now
for completeness.
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/qemu-devel/dcfc04cc1a50655650dbac8398e2742ada84ee39.1741611079.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The KVMGT/GVT-g vGPU also exposes OpRegion. But unlike IGD passthrough,
it only needs the OpRegion quirk. A previous change moved x-igd-opregion
handling to config quirk breaks KVMGT functionality as it brings extra
checks and applied other quirks. Here we check if the device is mdev
(KVMGT) or not (passthrough), and then applies corresponding quirks.
As before, users must manually specify x-igd-opregion=on to enable it
on KVMGT devices. In the future, we may check the VID/DID and enable
OpRegion automatically.
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-11-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The LPC bridge/Host bridge IDs quirk is also not dependent on legacy
mode. Recent Windows driver no longer depends on these IDs, as well as
Linux i915 driver, while UEFI GOP seems still needs them. Make it an
option to allow users enabling and disabling it as needed.
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-10-tomitamoeko@gmail.com
[ clg: - Fixed spelling in vfio_probe_igd_config_quirk() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Both enable OpRegion option (x-igd-opregion) and legacy mode require
setting up OpRegion copy for IGD devices. As the config quirk no longer
depends on legacy mode, we can now handle x-igd-opregion option there
instead of in vfio_realize.
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-9-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
So far, IGD-specific quirks all require enabling legacy mode, which is
toggled by assigning IGD to 00:02.0. However, some quirks, like the BDSM
and GGC register quirks, should be applied to all supported IGD devices.
A new config option, x-igd-legacy-mode=[on|off|auto], is introduced to
control the legacy mode only quirks. The default value is "auto", which
keeps current behavior that enables legacy mode implicitly and continues
on error when all following conditions are met.
* Machine type is i440fx
* IGD device is at guest BDF 00:02.0
If any one of the conditions above is not met, the default behavior is
equivalent to "off", QEMU will fail immediately if any error occurs.
Users can also use "on" to force enabling legacy mode. It checks if all
the conditions above are met and set up legacy mode. QEMU will also fail
immediately on error in this case.
Additionally, the hotplug check in legacy mode is removed as hotplugging
IGD device is never supported, and it will be checked when enabling the
OpRegion quirk.
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-8-tomitamoeko@gmail.com
[ clg: - Changed warn_report() by info_report() in
vfio_probe_igd_config_quirk() as suggested by Alex W.
- Fixed spelling in vfio_probe_igd_config_quirk () ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>