Finish CPR by change the owning process of the iommufd device in
post load.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-19-git-send-email-steven.sistare@oracle.com
[ clg: Fixed missing "qemu/error-report.h" include ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reconstruct userland device state after CPR. During vfio_realize, skip all
ioctls that configure the device, as it was already configured in old QEMU.
Skip bind, and use the devid from CPR state.
Skip allocation of, and attachment to, ioas_id. Recover ioas_id from CPR
state, and use it to find a matching container, if any, before creating a
new one.
This reconstruction is not complete. hwpt_id is handled in a subsequent
patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-17-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Save the iommu and vfio device fd in CPR state when it is created.
After CPR, the fd number is found in CPR state and reused.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-16-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
VFIO iommufd devices will need access to ioas_id, devid, and hwpt_id in
new QEMU at realize time, so add them to CPR state. Define CprVFIODevice
as the object which holds the state and is serialized to the vmstate file.
Define accessors to copy state between VFIODevice and CprVFIODevice.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-15-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Define a list of vfio devices in CPR state, in a subsection so that
older QEMU can be live updated to this version. However, new QEMU
will not be live updateable to old QEMU. This is acceptable because
CPR is not yet commonly used, and updates to older versions are unusual.
The contents of each device object will be defined by the vfio subsystem
in a subsequent patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-14-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Register a vfio iommufd container and device for CPR, replacing the generic
CPR register call with a more specific iommufd register call. Add a
blocker if the kernel does not support IOMMU_IOAS_CHANGE_PROCESS.
This is mostly boiler plate. The fields to to saved and restored are added
in subsequent patches.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-13-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Define vfio_device_free_name to free the name created by
vfio_device_get_name. A subsequent patch will do more there.
No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
cpr-transfer will use the device name as a key to find the value
of the device descriptor in new QEMU. However, if the descriptor
number is specified by a command-line fd parameter, then
vfio_device_get_name creates a name that includes the fd number.
This causes a chicken-and-egg problem: new QEMU must know the fd
number to construct a name to find the fd number.
To fix, create an invariant name based on the id command-line parameter,
if id is defined. The user will need to provide such an id to use CPR.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-10-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Use IOMMU_IOAS_MAP_FILE when the mapped region is backed by a file.
Such a mapping can be preserved without modification during CPR,
because it depends on the file's address space, which does not change,
rather than on the process's address space, which does change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
cpr-transfer breaks vfio network connectivity to and from the guest, and
the host system log shows:
irq bypass consumer (token 00000000a03c32e5) registration fails: -16
which is EBUSY. This occurs because KVM descriptors are still open in
the old QEMU process. Close them.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Preserve vfio INTx state across cpr-transfer. Preserve VFIOINTx fields as
follows:
pin : Recover this from the vfio config in kernel space
interrupt : Preserve its eventfd descriptor across exec.
unmask : Ditto
route.irq : This could perhaps be recovered in vfio_pci_post_load by
calling pci_device_route_intx_to_irq(pin), whose implementation reads
config space for a bridge device such as ich9. However, there is no
guarantee that the bridge vmstate is read before vfio vmstate. Rather
than fiddling with MigrationPriority for vmstate handlers, explicitly
save route.irq in vfio vmstate.
pending : save in vfio vmstate.
mmap_timeout, mmap_timer : Re-initialize
bool kvm_accel : Re-initialize
In vfio_realize, defer calling vfio_intx_enable until the vmstate
is available, in vfio_pci_post_load. Modify vfio_intx_enable and
vfio_intx_kvm_enable to skip vfio initialization, but still perform
kvm initialization.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Save the MSI message area as part of vfio-pci vmstate, and preserve the
interrupt and notifier eventfd's. migrate_incoming loads the MSI data,
then the vfio-pci post_load handler finds the eventfds in CPR state,
rebuilds vector data structures, and attaches the interrupts to the new
KVM instance.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-2-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
As the full cpr implementation is yet to be merged upstream, do not register
the vfio-user container with cpr. Full vfio-user support for cpr can be
merged later as a follow-up series.
Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-ID: <20250702120043.267634-1-mark.caveayland@nutanix.com>
[ clg: Removed now useless "hw/vfio/vfio-cpr.h" include ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
When there are multiple VFIO containers, vioc->dma_map is restored
multiple times, this made only first container work and remaining
containers using vioc->dma_map restored by first container.
Fix it by save and restore vioc->dma_map locally. saved_dma_map in
VFIOContainerCPR becomes useless and is removed.
Fixes: 7e9f214113 ("vfio/container: restore DMA vaddr")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20250627063332.5173-3-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
CPR overrides then restores dma_map in both outgoing and incoming QEMU, for
different reasons. But it only sets saved_dma_map in the target.
Fix it by always setting saved_dma_map.
Fixes: eba1f657cb ("vfio/container: recover from unmap-all-vaddr failure")
Suggested-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20250627063332.5173-2-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
- mark s390x runner system tests as allow_fail
- build semihosting once
- add register write support to plugins
- add virtual memory write support to plugins
- add harder memory read/write support to plugins
- add patcher plugin and tests
- re-stock virtio-gpu MAINTAINERS
- fix context init for Venus fences
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAmhk+PkACgkQ+9DbCVqe
KkRMbAgAgs7sufzfJF+408BpYfciArU/XL0ZbfZnvRdayJaw6hIe9q4ySe+w5xVe
jzSnc/eI3ESl7+Id0eUlC4p7faxgjj/+FoAgCcFSx2Dzt6VQ1WRCMFxc5Z+h/P5i
vm5Zf2vr38RHyIC/A8BRJWmmpF0NtXwYDaVyV92r+qYClBi6RyJ6+Ooio0MEVdzl
62/2bGIsf+BzXQOiHHIyki86GS/9svBNMDbCt+4X1cTtoSArsTj9qJ0CSBNM3BFS
/EJ2e6Hbc9p/8lNPal48/P5L/jN3LJDiQvG+PmxH91gYjsAb6kD4GvKvgT/L6Iai
lD1YrL7vzgiPJWveESe3i7PWMolnCw==
=q7Vs
-----END PGP SIGNATURE-----
Merge tag 'pull-10.1-maintainer-june-2025-020725-1' of https://gitlab.com/stsquad/qemu into staging
Maintainer updates for June (gitlab, semihosting, plugins, virtio-gpu)
- mark s390x runner system tests as allow_fail
- build semihosting once
- add register write support to plugins
- add virtual memory write support to plugins
- add harder memory read/write support to plugins
- add patcher plugin and tests
- re-stock virtio-gpu MAINTAINERS
- fix context init for Venus fences
* tag 'pull-10.1-maintainer-june-2025-020725-1' of https://gitlab.com/stsquad/qemu:
virtio-gpu: support context init multiple timeline
MAINTAINERS: add Akihiko and Dmitry as reviewers
MAINTAINERS: add myself to virtio-gpu for Odd Fixes
plugins: Update plugin version and add notes
plugins: Add patcher plugin and test
tests/tcg: Remove copy-pasted notes and from i386 and add x86_64 system tests to tests
plugins: Add memory hardware address read/write API
plugins: Add memory virtual address write API
plugins: Add enforcement of QEMU_PLUGIN_CB flags in register R/W callbacks
plugins: Add register write API
gdbstub: Expose gdb_write_register function to consumers of gdbstub
semihosting/uaccess: Compile once
semihosting/uaccess: Remove uses of target_ulong type
tests/functional: Add PCI hotplug test for aarch64
gitlab: mark s390x-system to allow failures
Conflicts:
tests/functional/meson.build
Context conflict with commit 7bc86ccbb5 ("tests/functional: test
device passthrough on aarch64"), keep both changes to
tests_aarch64_system_thorough[].
After the docs directory restructuring, several comments
refer to paths that no longer exist.
Replace these references to the current file locations
so readers can find the correct files.
Related commits
---------------
189c099f75 (Jul 2021)
docs: collect the disparate device emulation docs into one section
Rename docs/system/{ => devices}/nvme.rst
5f4c96b779 (Feb 2023)
docs/system/loongarch: update loongson3.rst and rename it to virt.rst
Rename docs/system/loongarch/{loongson3.rst => virt.rst}
fe0007f3c1 (Sep 2023)
exec: Rename cpu.c -> cpu-target.c
Rename cpus-common.c => cpu-common.c
42fa9665e5 (Apr 2025)
exec: Restrict 'cpu_ldst.h' to accel/tcg/
Rename include/{exec/cpu_ldst.h => accel/tcg/cpu-ldst.h}
Signed-off-by: Sean Wei <me@sean.taipei>
Message-ID: <20250616.qemu.relocated.06@sean.taipei>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Several source comments still refer to docs with the old .txt
extension that were previously converted to reStructuredText.
Update these references to use the correct .rst extensions to
maintain accurate in-tree documentation pointers.
No functional changes.
Related commits:
50f8174c5c (Jul 2021): docs/specs/acpi_nvdimm: Convert to rST
f054eb1c92 (Jul 2021): docs/specs/acpi_pci_hotplug: Convert to rST
912fb3678b (Sep 2023): docs/specs/vmgenid: Convert to rST
bb1cff6ee0 (Sep 2023): docs/specs/ivshmem-spec: Convert to rST
55ff468f78 (Jan 2022): docs: Rename ppc-spapr-hotplug.txt to .rst
Signed-off-by: Sean Wei <me@sean.taipei>
Message-ID: <20250616.qemu.relocated.05@sean.taipei>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Venus and later native contexts have their own fence context along with
multiple timelines within. Fences wtih VIRTIO_GPU_FLAG_INFO_RING_IDX in
the flags must be dispatched to be created on the target context. Fence
signaling also has to be handled on the specific timeline within that
target context.
Before this change, venus fencing is completely broken if the host
driver doesn't support implicit fencing with external memory objects.
Frames can go backwards along with random artifacts on screen if the
host driver doesn't attach an implicit fence to the render target. The
symptom could be hidden by certain guest wsi backend that waits on a
venus native VkFence object for the actual payload with limited present
modes or under special configs. e.g. x11 mailbox or xwayland.
After this change, everything related to venus fencing starts making
sense. Confirmed this via guest and host side perfetto tracing.
Cc: qemu-stable@nongnu.org
Fixes: 94d0ea1c19 ("virtio-gpu: Support Venus context")
Signed-off-by: Yiwei Zhang <zzyiwei@gmail.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Message-Id: <20250518152651.334115-1-zzyiwei@gmail.com>
[AJB: remove version history from commit message]
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Message-ID: <20250627112512.1880708-16-alex.bennee@linaro.org>
Define RAMLIMIT_BYTES using the TiB definition and display
the error parsed with size_to_str():
$ qemu-system-aarch64-unsigned -M sbsa-ref -m 9T
qemu-system-aarch64-unsigned: sbsa-ref: cannot model more than 8 TiB of RAM
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250623121845.7214-22-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
QDev uses _post_init() during instance creation, before being
realized. Since here both vCPUs and GIC are REALIZED, rename
as virt_post_cpus_gic_realized() for clarity.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250623121845.7214-21-philmd@linaro.org
[PMM: also fixed up comment]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Currently only the TCG and qtest accelerators can handle an EL2
guest. Instead of making the condition check be "fail if KVM or HVF"
(an exclude-list), make it a be "allow if TCG or qtest" (an
accept-list).
This is better for if/when we add new accelerators, as it makes the
default be that we forbid an EL2 guest. This is the most likely to
be correct and also "fails safe"; if the new accelerator really can
support EL2 guests then the implementor will see that they need to
add it to the accept-list.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20250623121845.7214-20-philmd@linaro.org
[PMM: rewrote commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Currently only the TCG and qtest accelerators can handle an EL3
guest. Instead of making the condition check be "fail if KVM or HVF"
(an exclude-list), make it a be "allow if TCG or qtest" (an
accept-list).
This is better for if/when we add new accelerators, as it makes the
default be that we forbid an EL3 guest. This is the most likely to
be correct and also "fails safe"; if the new accelerator really can
support EL3 guests then the implementor will see that they need to
add it to the accept-list.
Reported-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250623121845.7214-19-philmd@linaro.org
[PMM: rewrote commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Currently, the ITS Group nodes in the IORT table and the GIC ITS Struct
in the MADT table are always generated, even if GIC ITS is not available
on the machine.
This commit fixes it by not generating the ITS Group nodes, not mapping
any other node to them, and not advertising the GIC ITS in the MADT
table, when GIC ITS is not available on the machine.
Since the fix changes the MADT and IORT tables, add the blobs for the
"its=off" test to the allow list and update them in the next commit.
This commit also renames the smmu_idmaps and its_idmaps variables in
build_iort() to rc_smmu_idmaps and rc_its_idmaps, respectively, to make
it clearer which nodes are involved in the mappings associated with
these variables.
Reported-by: Udo Steinberg <udo@hypervisor.org>
Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
Message-id: 20250628195722.977078-9-gustavo.romero@linaro.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2886
Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
Co-authored-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[PMM: wrapped an overlong comment]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Factor out a new function, create_its_idmaps(), from the current
build_iort code. Add proper comments to it clarifying how the ID ranges
that go directly to the ITS Group node are computed based on the ones
that are directed to the SMMU node.
Suggested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
Message-id: 20250628195722.977078-6-gustavo.romero@linaro.org
[PMM: drop hardcoded tabs]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When building the Root Complex table, the comment about the code that
maps the RC node to SMMU node is misleading because it reads
"RC -> SMMUv3 -> ITS", but the code is only mapping the RCs IDs to the
SMMUv3 node. The step of mapping from the SMMUv3 IDs to the ITS Group
node is actually defined in another table (in the SMMUv3 node). So
change the comment to read "RC -> SMMUv3" instead.
Signed-off-by Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20250628195722.977078-5-gustavo.romero@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
No need to strstr() check the class name when we can use
kvm_irqchip_in_kernel() to check if the ITS from the host can be used.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Gustavo Romero <gustavo.romero@linaro.org>
Message-id: 20250628195722.977078-4-gustavo.romero@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Because 'tcg_its' in the machine instance is set based on the machine
class’s negated variable 'no_tcg_its', 'tcg_its' is the opposite of
'no_tcg_its' and hence the code in question can be simplified as:
tcg_its = !no_tcg_its.
Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20250628195722.977078-3-gustavo.romero@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Since commit cc5e719e2c ("kvm: require KVM_CAP_SIGNAL_MSI"), the single
implementation of its_class_name() no longer returns NULL (it now always
returns a valid char pointer). Hence, update the prototype docstring and
remove the tautological checks that use the its_class_name() returned
value.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20250628195722.977078-2-gustavo.romero@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-id: 20250617153931.1330449-12-cohuck@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-id: 20250617153931.1330449-11-cohuck@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-id: 20250617153931.1330449-10-cohuck@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-id: 20250617153931.1330449-9-cohuck@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add new message to send multiple writes to server in a single message.
Prevents the outgoing queue from overflowing when a long latency
operation is followed by a series of posted writes.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-18-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Support an asynchronous send of a vfio-user socket message (no wait for
a reply) when the write is posted. This is only safe when no regions are
mappable by the VM. Add an option to explicitly disable this as well.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-17-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
By default, the vfio-user subsystem will wait 5 seconds for a message
reply from the server. Add an option to allow this to be configurable.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-16-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Unlike most other messages, this is a server->client message, for when a
server wants to do "DMA"; this is slow, so normally the server has
memory directly mapped instead.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-15-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
When the vfio-user container gets mapping updates, share them with the
vfio-user by sending a message; this can include the region fd, allowing
the server to directly mmap() the region as needed.
For performance, we only wait for the message responses when we're doing
with a series of updates via the listener_commit() callback.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-14-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Hook this call up to the legacy reset handler for vfio-user-pci.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-13-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The user container will shortly need access to the underlying vfio-user
proxy; set this up.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-12-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
For vfio-user, the server holds the pending IRQ state; set up an I/O
region for the MSI-X PBA so we can ask the server for this state on a
PBA read.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-11-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
IRQ setup uses the same semantics as the traditional vfio path, but we
need to share the corresponding file descriptors with the server as
necessary.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-10-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Re-use PCI setup functions from hw/vfio/pci.c to realize the vfio-user
PCI device.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-9-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-8-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Add support for getting region info for vfio-user. As vfio-user has one
fd per region, enable ->use_region_fds.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-7-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>