Because we handle host IOMMU device creation in each container backend,
we know which type name to use, so hiod_typename property is useless
now, just remove it.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250423072824.3647952-6-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
realize() is now moved after attachment, do the same for hiod creation.
Introduce a new function vfio_device_hiod_create_and_realize() to do
them all in one go.
Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250423072824.3647952-5-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
"hw/vfio/vfio-common.h" has been emptied of most of its declarations
by the previous changes and the only declarations left are related to
VFIODevice. Rename it to "hw/vfio/vfio-device.h" and make the
necessary adjustments.
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-36-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
File "common.c" has been emptied of most of its definitions by the
previous changes and the only definitions left are related to the VFIO
MemoryListener handlers. Rename it to "listener.c" and introduce its
associated "vfio-listener.h" header file for the declarations.
Cleanup a little the includes while at it.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-33-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Rename to vfio_container_query_dirty_bitmap() to be consistent with
the VFIO container routine naming scheme.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-32-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Use the prefix 'vfio_container_devices_' to reflect the routine simply
loops over the container's device list.
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-31-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Also rename vfio_devices_all_device_dirty_tracking_started() while at
it and use the prefix 'vfio_container_devices_' for routines simply
looping over the container's device list.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-30-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
vfio_container_query_dirty_bitmap() is only used in "container-base.c".
Also, rename to vfio_container_iommu_query_dirty_bitmap() to reflect it
is using the VFIO IOMMU backend device ->query_dirty_bitmap() handler.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-29-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
vfio_devices_query_dirty_bitmap() is only used in "container-base.c".
Also, rename to vfio_container_devices_query_dirty_bitmap() to reflect
with the prefix 'vfio_container_devices_' that it simply loops over
the container's device list.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-28-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Routines of common.c :
vfio_devices_all_dirty_tracking_started
vfio_devices_all_device_dirty_tracking
vfio_devices_query_dirty_bitmap
vfio_get_dirty_bitmap
are all related to dirty page tracking directly at the container level
or at the container device level. Naming is a bit confusing. We will
propose new names in the following changes.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-27-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Gather all CPR related declarations into "vfio-cpr.h" to reduce exposure
of VFIO internals in "hw/vfio/vfio-common.h". These were introduced in
commit d9fa4223b3 ("vfio: register container for cpr").
Order file list in meson.build while at it.
Cc: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-22-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
VFIOAddressSpace is a common object used by VFIOContainerBase which is
declared in "hw/vfio/vfio-container-base.h". Move the VFIOAddressSpace
related services into "container-base.c".
While at it, rename :
vfio_get_address_space -> vfio_address_space_get
vfio_put_address_space -> vfio_address_space_put
to better reflect the namespace these routines belong to.
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-15-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Gather all VFIORegion related declarations and definitions into their
own files to reduce exposure of VFIO internals in "hw/vfio/vfio-common.h".
They were introduced for 'vfio-platform' support in commits
db0da029a1 ("vfio: Generalize region support") and a664477db8
("hw/vfio/pci: Introduce VFIORegion").
To be noted that the 'vfio-platform' devices have been deprecated and
will be removed in QEMU 10.2. Until then, make the declarations
available externally for 'sysbus-fdt.c'.
Cc: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-12-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Gather all VFIOIOMMUFD related declarations introduced by commits
5ee3dc7af7 ("vfio/iommufd: Implement the iommufd backend") and
5b1e96e654 ("vfio/iommufd: Introduce auto domain creation") into
"vfio-iommufd.h". This to reduce exposure of VFIO internals in
"hw/vfio/vfio-common.h".
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20250318095415.670319-10-clg@redhat.com
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-11-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
These routines are migration related. Move their declaration and
implementation under the migration files.
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: John Levon <john.levon@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-8-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The migration core subsystem makes use of the VFIO migration API to
collect statistics on the number of bytes transferred. These services
are declared in "hw/vfio/vfio-common.h" which also contains VFIO
internal declarations. Move the migration declarations into a new
header file "hw/vfio/vfio-migration.h" to reduce the exposure of VFIO
internals.
While at it, use a 'vfio_migration_' prefix for these services.
To be noted, vfio_migration_add_bytes_transferred() is a VFIO
migration internal service which we will be moved in the subsequent
patches.
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-4-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Convert the existing includes with
sed -i ,exec/memory.h,system/memory.h,g
Move the include within cpu-all.h into a !CONFIG_USER_ONLY block.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This property allows configuring whether to transfer the particular device
state via multifd channels when live migrating that device.
It defaults to AUTO, which means that VFIO device state transfer via
multifd channels is attempted in configurations that otherwise support it.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/d6dbb326e3d53c7104d62c96c9e3dd64e1c7b940.1741124640.git.maciej.szmigiero@oracle.com
[ clg: Added documentation ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Implement the multifd device state transfer via additional per-device
thread inside save_live_complete_precopy_thread handler.
Switch between doing the data transfer in the new handler and doing it
in the old save_state handler depending if VFIO multifd transfer is enabled
or not.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/4d727e2e0435e0022d50004e474077632830e08d.1741124640.git.maciej.szmigiero@oracle.com
[ clg: - Reordered savevm_vfio_handlers
- Updated save_live_complete_precopy* documentation ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
This helper will be useful in the listener handlers to extract the
VFIO device from a memory region using memory_region_owner(). At the
moment, we only care for PCI passthrough devices. If the need arises,
we will add more.
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250206131438.1505542-5-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
During DMA unmap with vIOMMU, vfio_devices_all_running_and_mig_active()
is used to check whether a dirty page log sync of the unmapped pages is
required. Such log sync is needed during migration pre-copy phase, and
the current logic detects it by checking if migration is active and if
the VFIO devices are running.
However, recently there has been an effort to simplify the migration
status API and reduce it to a single migration_is_running() function.
To accommodate this, refactor vfio_devices_all_running_and_mig_active()
logic so it won't use migration_is_active(). Do it by simply checking if
dirty tracking has been started using internal VFIO flags.
This should be equivalent to the previous logic as during migration
dirty tracking is active and when the guest is stopped there shouldn't
be DMA unmaps coming from it.
As a side effect, now that migration status is no longer used, DMA unmap
log syncs are untied from migration. This will make calc-dirty-rate more
accurate as now it will also include VFIO dirty pages that were DMA
unmapped.
Also rename the function to properly reflect its new logic and extract
common code from vfio_devices_all_dirty_tracking().
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20241218134022.21264-4-avihaih@nvidia.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
During dirty page log sync, vfio_devices_all_dirty_tracking() is used to
check if dirty tracking has been started in order to avoid errors. The
current logic checks if migration is in ACTIVE or DEVICE states to
ensure dirty tracking has been started.
However, recently there has been an effort to simplify the migration
status API and reduce it to a single migration_is_running() function.
To accommodate this, refactor vfio_devices_all_dirty_tracking() logic so
it won't use migration_is_active() and migration_is_device(). Instead,
use internal VFIO dirty tracking flags.
As a side effect, now that migration status is no longer used to detect
dirty tracking status, VFIO log syncs are untied from migration. This
will make calc-dirty-rate more accurate as now it will also include VFIO
dirty pages.
While at it, as VFIODevice->dirty_tracking is now used to detect dirty
tracking status, add a comment that states how it's protected.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20241218134022.21264-3-avihaih@nvidia.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Add a flag to VFIOContainerBase that indicates whether dirty tracking
has been started for the container or not.
This will be used in the following patches to allow dirty page syncs
only if dirty tracking has been started.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20241218134022.21264-2-avihaih@nvidia.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Headers in include/sysemu/ are not only related to system
*emulation*, they are also used by virtualization. Rename
as system/ which is clearer.
Files renamed manually then mechanical change using sed tool.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20241203172445.28576-1-philmd@linaro.org>
This way both the start and end points of migrating a particular VFIO
device are known.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole
tracking of VF pre-copy phase of dirty page tracking, though it means
that it will only be used at the start of the switchover phase.
Add an option that disables the VF dirty page tracking, and fall
back into container-based dirty page tracking. This also allows to
use IOMMU dirty tracking even on VFs with their own dirty
tracker scheme.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
In preparation to using the dirty tracking UAPI, probe whether the IOMMU
supports dirty tracking. This is done via the data stored in
hiod::caps::hw_caps initialized from GET_HW_INFO.
Qemu doesn't know if VF dirty tracking is supported when allocating
hardware pagetable in iommufd_cdev_autodomains_get(). This is because
VFIODevice migration state hasn't been initialized *yet* hence it can't pick
between VF dirty tracking vs IOMMU dirty tracking. So, if IOMMU supports
dirty tracking it always creates HWPTs with IOMMU_HWPT_ALLOC_DIRTY_TRACKING
even if later on VFIOMigration decides to use VF dirty tracking instead.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[ clg: - Fixed vbasedev->iommu_dirty_tracking assignment in
iommufd_cdev_autodomains_get()
- Added warning for heterogeneous dirty page tracking support
in iommufd_cdev_autodomains_get() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Move the HostIOMMUDevice::realize() to be invoked during the attach of the device
before we allocate IOMMUFD hardware pagetable objects (HWPT). This allows the use
of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if the IOMMU
behind the device supports dirty tracking.
Note: The HostIOMMUDevice data from legacy backend is static and doesn't
need any information from the (type1-iommu) backend to be initialized.
In contrast however, the IOMMUFD HostIOMMUDevice data requires the
iommufd FD to be connected and having a devid to be able to successfully
GET_HW_INFO. This means vfio_device_hiod_realize() is called in
different places within the backend .attach_device() implementation.
Suggested-by: Cédric Le Goater <clg@redhat.cm>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
[ clg: Fixed error handling in iommufd_cdev_attach() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
There's generally two modes of operation for IOMMUFD:
1) The simple user API which intends to perform relatively simple things
with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches
to VFIO and mainly performs IOAS_MAP and UNMAP.
2) The native IOMMUFD API where you have fine grained control of the
IOMMU domain and model it accordingly. This is where most new feature
are being steered to.
For dirty tracking 2) is required, as it needs to ensure that
the stage-2/parent IOMMU domain will only attach devices
that support dirty tracking (so far it is all homogeneous in x86, likely
not the case for smmuv3). Such invariant on dirty tracking provides a
useful guarantee to VMMs that will refuse incompatible device
attachments for IOMMU domains.
Dirty tracking insurance is enforced via HWPT_ALLOC, which is
responsible for creating an IOMMU domain. This is contrast to the
'simple API' where the IOMMU domain is created by IOMMUFD automatically
when it attaches to VFIO (usually referred as autodomains) but it has
the needed handling for mdevs.
To support dirty tracking with the advanced IOMMUFD API, it needs
similar logic, where IOMMU domains are created and devices attached to
compatible domains. Essentially mimicking kernel
iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain
it falls back to IOAS attach.
The auto domain logic allows different IOMMU domains to be created when
DMA dirty tracking is not desired (and VF can provide it), and others where
it is. Here it is not used in this way given how VFIODevice migration
state is initialized after the device attachment. But such mixed mode of
IOMMU dirty tracking + device dirty tracking is an improvement that can
be added on. Keep the 'all of nothing' of type1 approach that we have
been using so far between container vs device dirty tracking.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
[ clg: Added ERRP_GUARD() in iommufd_cdev_autodomains_get() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>