VFIO iommufd devices will need access to ioas_id, devid, and hwpt_id in
new QEMU at realize time, so add them to CPR state. Define CprVFIODevice
as the object which holds the state and is serialized to the vmstate file.
Define accessors to copy state between VFIODevice and CprVFIODevice.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-15-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Define a list of vfio devices in CPR state, in a subsection so that
older QEMU can be live updated to this version. However, new QEMU
will not be live updateable to old QEMU. This is acceptable because
CPR is not yet commonly used, and updates to older versions are unusual.
The contents of each device object will be defined by the vfio subsystem
in a subsequent patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-14-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Register a vfio iommufd container and device for CPR, replacing the generic
CPR register call with a more specific iommufd register call. Add a
blocker if the kernel does not support IOMMU_IOAS_CHANGE_PROCESS.
This is mostly boiler plate. The fields to to saved and restored are added
in subsequent patches.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-13-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Define vfio_device_free_name to free the name created by
vfio_device_get_name. A subsequent patch will do more there.
No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Use IOMMU_IOAS_MAP_FILE when the mapped region is backed by a file.
Such a mapping can be preserved without modification during CPR,
because it depends on the file's address space, which does not change,
rather than on the process's address space, which does change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Define qemu_ram_get_fd_offset, so CPR can map a memory region using
IOMMU_IOAS_MAP_FILE in a subsequent patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-8-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Define iommufd_backend_map_file_dma to implement IOMMU_IOAS_MAP_FILE.
This will be called as a substitute for iommufd_backend_map_dma, so
the error conditions for BARs are copied as-is from that function.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-6-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Add the helper function cpr_get_fd_param, to use when preserving
a file descriptor that is opened externally and passed to QEMU.
cpr_get_fd_param returns a descriptor number either from a QEMU
command-line parameter, from a getfd command, or from CPR state.
When a descriptor is passed to new QEMU via SCM_RIGHTS, its number
changes. Hence, during CPR, the command-line parameter is ignored
in new QEMU, and over-ridden by the value found in CPR state.
Similarly, if the descriptor was originally specified by a getfd
command in old QEMU, the fd number is not known outside of QEMU,
and it changes when sent to new QEMU via SCM_RIGHTS. Hence the
user cannot send getfd to new QEMU, but when the user sends a
hotplug command that references the fd, cpr_get_fd_param finds
its value in CPR state.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-5-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
cpr-transfer breaks vfio network connectivity to and from the guest, and
the host system log shows:
irq bypass consumer (token 00000000a03c32e5) registration fails: -16
which is EBUSY. This occurs because KVM descriptors are still open in
the old QEMU process. Close them.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Save the MSI message area as part of vfio-pci vmstate, and preserve the
interrupt and notifier eventfd's. migrate_incoming loads the MSI data,
then the vfio-pci post_load handler finds the eventfds in CPR state,
rebuilds vector data structures, and attaches the interrupts to the new
KVM instance.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-2-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
When there are multiple VFIO containers, vioc->dma_map is restored
multiple times, this made only first container work and remaining
containers using vioc->dma_map restored by first container.
Fix it by save and restore vioc->dma_map locally. saved_dma_map in
VFIOContainerCPR becomes useless and is removed.
Fixes: 7e9f214113 ("vfio/container: restore DMA vaddr")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20250627063332.5173-3-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
- mark s390x runner system tests as allow_fail
- build semihosting once
- add register write support to plugins
- add virtual memory write support to plugins
- add harder memory read/write support to plugins
- add patcher plugin and tests
- re-stock virtio-gpu MAINTAINERS
- fix context init for Venus fences
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAmhk+PkACgkQ+9DbCVqe
KkRMbAgAgs7sufzfJF+408BpYfciArU/XL0ZbfZnvRdayJaw6hIe9q4ySe+w5xVe
jzSnc/eI3ESl7+Id0eUlC4p7faxgjj/+FoAgCcFSx2Dzt6VQ1WRCMFxc5Z+h/P5i
vm5Zf2vr38RHyIC/A8BRJWmmpF0NtXwYDaVyV92r+qYClBi6RyJ6+Ooio0MEVdzl
62/2bGIsf+BzXQOiHHIyki86GS/9svBNMDbCt+4X1cTtoSArsTj9qJ0CSBNM3BFS
/EJ2e6Hbc9p/8lNPal48/P5L/jN3LJDiQvG+PmxH91gYjsAb6kD4GvKvgT/L6Iai
lD1YrL7vzgiPJWveESe3i7PWMolnCw==
=q7Vs
-----END PGP SIGNATURE-----
Merge tag 'pull-10.1-maintainer-june-2025-020725-1' of https://gitlab.com/stsquad/qemu into staging
Maintainer updates for June (gitlab, semihosting, plugins, virtio-gpu)
- mark s390x runner system tests as allow_fail
- build semihosting once
- add register write support to plugins
- add virtual memory write support to plugins
- add harder memory read/write support to plugins
- add patcher plugin and tests
- re-stock virtio-gpu MAINTAINERS
- fix context init for Venus fences
* tag 'pull-10.1-maintainer-june-2025-020725-1' of https://gitlab.com/stsquad/qemu:
virtio-gpu: support context init multiple timeline
MAINTAINERS: add Akihiko and Dmitry as reviewers
MAINTAINERS: add myself to virtio-gpu for Odd Fixes
plugins: Update plugin version and add notes
plugins: Add patcher plugin and test
tests/tcg: Remove copy-pasted notes and from i386 and add x86_64 system tests to tests
plugins: Add memory hardware address read/write API
plugins: Add memory virtual address write API
plugins: Add enforcement of QEMU_PLUGIN_CB flags in register R/W callbacks
plugins: Add register write API
gdbstub: Expose gdb_write_register function to consumers of gdbstub
semihosting/uaccess: Compile once
semihosting/uaccess: Remove uses of target_ulong type
tests/functional: Add PCI hotplug test for aarch64
gitlab: mark s390x-system to allow failures
Conflicts:
tests/functional/meson.build
Context conflict with commit 7bc86ccbb5 ("tests/functional: test
device passthrough on aarch64"), keep both changes to
tests_aarch64_system_thorough[].
Several source comments still refer to docs with the old .txt
extension that were previously converted to reStructuredText.
Update these references to use the correct .rst extensions to
maintain accurate in-tree documentation pointers.
No functional changes.
Related commits:
50f8174c5c (Jul 2021): docs/specs/acpi_nvdimm: Convert to rST
f054eb1c92 (Jul 2021): docs/specs/acpi_pci_hotplug: Convert to rST
912fb3678b (Sep 2023): docs/specs/vmgenid: Convert to rST
bb1cff6ee0 (Sep 2023): docs/specs/ivshmem-spec: Convert to rST
55ff468f78 (Jan 2022): docs: Rename ppc-spapr-hotplug.txt to .rst
Signed-off-by: Sean Wei <me@sean.taipei>
Message-ID: <20250616.qemu.relocated.05@sean.taipei>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
This patch updates the plugin version to gate new APIs and adds notes
describing what has been added.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Rowan Hart <rowanbhart@gmail.com>
Message-ID: <20250624175351.440780-9-rowanbhart@gmail.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250627112512.1880708-13-alex.bennee@linaro.org>
This patch adds functions to the plugins API to allow plugins to read
and write memory via hardware addresses. The functions use the current
address space of the current CPU in order to avoid exposing address
space information to users. A later patch may want to add a function to
permit a specified address space, for example to facilitate
architecture-specific plugins that want to operate on them, for example
reading ARM secure memory.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Rowan Hart <rowanbhart@gmail.com>
Message-ID: <20250624175351.440780-6-rowanbhart@gmail.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250627112512.1880708-10-alex.bennee@linaro.org>
This patch adds functions to the plugins API to allow reading and
writing memory via virtual addresses. These functions only permit doing
so on the current CPU, because there is no way to ensure consistency if
plugins are allowed to read or write to other CPUs that aren't currently
in the context of the plugin.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Rowan Hart <rowanbhart@gmail.com>
Message-ID: <20250624175351.440780-5-rowanbhart@gmail.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250627112512.1880708-9-alex.bennee@linaro.org>
This patch adds functionality to enforce the requested QEMU_PLUGIN_CB_
flags level passed when registering a callback function using the
plugins API. Each time a callback is about to be invoked, a thread-local
variable will be updated with the level that callback requested. Then,
called API functions (in particular, the register read and write API)
will call qemu_plugin_get_cb_flags() to check the level is at least the
level they require.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Rowan Hart <rowanbhart@gmail.com>
Message-ID: <20250624175351.440780-4-rowanbhart@gmail.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250627112512.1880708-8-alex.bennee@linaro.org>
This patch adds a function to the plugins API to allow plugins to write
register contents. It also moves the qemu_plugin_read_register function
so all the register-related functions are grouped together in the file.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Rowan Hart <rowanbhart@gmail.com>
Message-ID: <20250624175351.440780-3-rowanbhart@gmail.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250627112512.1880708-7-alex.bennee@linaro.org>
This patch exposes the gdb_write_register function from
gdbstub/gdbstub.c via the exec/gdbstub.h header file to support use in
plugins to write register contents.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Julian Ganz <neither@nut.email>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Rowan Hart <rowanbhart@gmail.com>
Message-ID: <20250624175351.440780-2-rowanbhart@gmail.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250627112512.1880708-6-alex.bennee@linaro.org>
Since commit cc5e719e2c ("kvm: require KVM_CAP_SIGNAL_MSI"), the single
implementation of its_class_name() no longer returns NULL (it now always
returns a valid char pointer). Hence, update the prototype docstring and
remove the tautological checks that use the its_class_name() returned
value.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20250628195722.977078-2-gustavo.romero@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Introduce the vfio-user "proxy": this is the client code responsible for
sending and receiving vfio-user messages across the control socket.
The new files hw/vfio-user/proxy.[ch] contain some basic plumbing for
managing the proxy; initialize the proxy during realization of the
VFIOUserPCIDevice instance.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-3-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Introduce basic plumbing for vfio-user with CONFIG_VFIO_USER.
We introduce VFIOUserContainer in hw/vfio-user/container.c, which is a
container type for the "IOMMU" type "vfio-iommu-user", and share some
common container code from hw/vfio/container.c.
Add hw/vfio-user/pci.c for instantiating VFIOUserPCIDevice objects,
sharing some common code from hw/vfio/pci.c.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-2-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
This keeps the existence of ->region_fds private to hw/vfio/device.c.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250616101337.3190027-1-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The LGPLv2.1 boiler-plate in rcu.h and rcu_queue.h still
contained the obsolete "51 Franklin Street" postal address.
Replace it with the canonical GNU licenses URL recommended by the FSF:
https://www.gnu.org/licenses/
Signed-off-by: Sean Wei <me@sean.taipei>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250613.qemu.patch.06@sean.taipei>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Some of the GPLv2 boiler-plate still contained the
obsolete "51 Franklin Street" postal address.
Replace it with the canonical GNU licenses URL recommended by the FSF:
https://www.gnu.org/licenses/
Signed-off-by: Sean Wei <me@sean.taipei>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-ID: <20250613.qemu.patch.05@sean.taipei>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Some of the GPLv2 boiler-plate still contained the
obsolete "51 Franklin Street" postal address.
Replace it with the canonical GNU licenses URL recommended by the FSF:
https://www.gnu.org/licenses/
Signed-off-by: Sean Wei <me@sean.taipei>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250613.qemu.patch.04@sean.taipei>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Implement the Service-Call Logical Processor (SCLP) event
type Control-Program Identification (CPI) in QEMU. This
event is used to send CPI identifiers from the guest to the
host. The CPI identifiers are: system type, system name,
system level and sysplex name.
System type: operating system of the guest (e.g. "LINUX ").
System name: user configurable name of the guest (e.g. "TESTVM ").
System level: distribution and kernel version, if the system type is Linux
(e.g. 74872343805430528).
Sysplex name: name of the cluster which the guest belongs to (if any)
(e.g. "PLEX").
The SCLP event CPI is supported only from "s390-ccw-virtio-10.1" machine
and higher.
Signed-off-by: Shalini Chellathurai Saroja <shalini@linux.ibm.com>
Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250616140107.990538-2-shalini@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
- Yanfei's optimization to skip log_clear during completion
- Fabiano's cleanup to remove leftover migration-helpers.c file
- Juraj's vnc fix on display pause after migration
- Jaehoon's cpr test fix on possible race of server establishment
- Chenyi's initial support on vfio enablement for guest-memfd
-----BEGIN PGP SIGNATURE-----
iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCaFmzWhIccGV0ZXJ4QHJl
ZGhhdC5jb20ACgkQO1/MzfOr1wbWYQD/dz08tyaL2J4EHESfBsW4Z1rEggVOM0cB
hlXnvzf/Pb4A/0X3Hn18bOxfPAZOr8NggS5AKgzCCYVeQEWQA2Jj8hwC
=kcTN
-----END PGP SIGNATURE-----
Merge tag 'migration-staging-pull-request' of https://gitlab.com/peterx/qemu into staging
Migration / Memory pull
- Yanfei's optimization to skip log_clear during completion
- Fabiano's cleanup to remove leftover migration-helpers.c file
- Juraj's vnc fix on display pause after migration
- Jaehoon's cpr test fix on possible race of server establishment
- Chenyi's initial support on vfio enablement for guest-memfd
# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCaFmzWhIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wbWYQD/dz08tyaL2J4EHESfBsW4Z1rEggVOM0cB
# hlXnvzf/Pb4A/0X3Hn18bOxfPAZOr8NggS5AKgzCCYVeQEWQA2Jj8hwC
# =kcTN
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 23 Jun 2025 16:04:42 EDT
# gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg: issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [full]
# gpg: aka "Peter Xu <peterx@redhat.com>" [full]
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706
* tag 'migration-staging-pull-request' of https://gitlab.com/peterx/qemu:
physmem: Support coordinated discarding of RAM with guest_memfd
ram-block-attributes: Introduce RamBlockAttributes to manage RAMBlock with guest_memfd
memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard()
memory: Change memory_region_set_ram_discard_manager() to return the result
memory: Export a helper to get intersection of a MemoryRegionSection with a given range
migration: Don't sync volatile memory after migration completes
tests/migration: Setup pre-listened cpr.sock to remove race-condition.
migration: Support fd-based socket address in cpr_transfer_input
ui/vnc: Update display update interval when VM state changes to RUNNING
tests/qtest: Remove migration-helpers.c
migration/ram: avoid to do log clear in the last round
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
A new field, attributes, was introduced in RAMBlock to link to a
RamBlockAttributes object, which centralizes all guest_memfd related
information (such as fd and status bitmap) within a RAMBlock.
Create and initialize the RamBlockAttributes object upon ram_block_add().
Meanwhile, register the object in the target RAMBlock's MemoryRegion.
After that, guest_memfd-backed RAMBlock is associated with the
RamDiscardManager interface, and the users can execute RamDiscardManager
specific handling. For example, VFIO will register the
RamDiscardListener and get notifications when the state_change() helper
invokes.
As coordinate discarding of RAM with guest_memfd is now supported, only
block uncoordinated discard.
Tested-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Link: https://lore.kernel.org/r/20250612082747.51539-6-chenyi.qiang@intel.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
discard") highlighted that subsystems like VFIO may disable RAM block
discard. However, guest_memfd relies on discard operations for page
conversion between private and shared memory, potentially leading to
the stale IOMMU mapping issue when assigning hardware devices to
confidential VMs via shared memory. To address this and allow shared
device assignement, it is crucial to ensure the VFIO system refreshes
its IOMMU mappings.
RamDiscardManager is an existing interface (used by virtio-mem) to
adjust VFIO mappings in relation to VM page assignment. Effectively page
conversion is similar to hot-removing a page in one mode and adding it
back in the other. Therefore, similar actions are required for page
conversion events. Introduce the RamDiscardManager to guest_memfd to
facilitate this process.
Since guest_memfd is not an object, it cannot directly implement the
RamDiscardManager interface. Implementing it in HostMemoryBackend is
not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
have a memory backend while others do not. Notably, virtual BIOS
RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
backend.
To manage RAMBlocks with guest_memfd, define a new object named
RamBlockAttributes to implement the RamDiscardManager interface. This
object can store the guest_memfd information such as the bitmap for
shared memory and the registered listeners for event notifications. A
new state_change() helper function is provided to notify listeners, such
as VFIO, allowing VFIO to do dynamically DMA map and unmap for the shared
memory according to conversion events. Note that in the current context
of RamDiscardManager for guest_memfd, the shared state is analogous to
being populated, while the private state can be considered discarded for
simplicity. In the future, it would be more complicated if considering
more states like private/shared/discarded at the same time.
In current implementation, memory state tracking is performed at the
host page size granularity, as the minimum conversion size can be one
page per request. Additionally, VFIO expected the DMA mapping for a
specific IOVA to be mapped and unmapped with the same granularity.
Confidential VMs may perform partial conversions, such as conversions on
small regions within a larger one. To prevent such invalid cases and
until support for DMA mapping cut operations is available, all
operations are performed with 4K granularity.
In addition, memory conversion failures cause QEMU to quit rather than
resuming the guest or retrying the operation at present. It would be
future work to add more error handling or rollback mechanisms once
conversion failures are allowed. For example, in-place conversion of
guest_memfd could retry the unmap operation during the conversion from
shared to private. For now, keep the complex error handling out of the
picture as it is not required.
Tested-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Link: https://lore.kernel.org/r/20250612082747.51539-5-chenyi.qiang@intel.com
[peterx: squash fixup from Chenyi to fix builds]
Signed-off-by: Peter Xu <peterx@redhat.com>
Update ReplayRamDiscard() function to return the result and unify the
ReplayRamPopulate() and ReplayRamDiscard() to ReplayRamDiscardState() at
the same time due to their identical definitions. This unification
simplifies related structures, such as VirtIOMEMReplayData, which makes
it cleaner.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Link: https://lore.kernel.org/r/20250612082747.51539-4-chenyi.qiang@intel.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Modify memory_region_set_ram_discard_manager() to return -EBUSY if a
RamDiscardManager is already set in the MemoryRegion. The caller must
handle this failure, such as having virtio-mem undo its actions and fail
the realize() process. Opportunistically move the call earlier to avoid
complex error handling.
This change is beneficial when introducing a new RamDiscardManager
instance besides virtio-mem. After
ram_block_coordinated_discard_require(true) unlocks all
RamDiscardManager instances, only one instance is allowed to be set for
one MemoryRegion at present.
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Tested-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Link: https://lore.kernel.org/r/20250612082747.51539-3-chenyi.qiang@intel.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Rename the helper to memory_region_section_intersect_range() to make it
more generic. Meanwhile, define the @end as Int128 and replace the
related operations with Int128_* format since the helper is exported as
a wider API.
Suggested-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Link: https://lore.kernel.org/r/20250612082747.51539-2-chenyi.qiang@intel.com
Signed-off-by: Peter Xu <peterx@redhat.com>
-----BEGIN PGP SIGNATURE-----
iLMEAAEIAB0WIQTKRzxE1qCcGJoZP81FK5aFKyaCFgUCaFPPIgAKCRBFK5aFKyaC
Fq67BACc5EYLK/T0Q2DxDmI84ZmK83GBAsQoB/81CLCDwA4ihotLAjqTB1liKFvY
oFexqza3GBxpiyyw+zAUyrx1X/arnSrCAlgPOF2SIZgj1WnHm/jZb+8mdpgFGZc5
6TkB4Dr5rmSa8OJLYIiC7gKYu9K3zEO/Dprgx9nj3D4tb8xRDQ==
=601X
-----END PGP SIGNATURE-----
Merge tag 'pull-loongarch-20250619' of https://github.com/gaosong715/qemu into staging
pull-loongarch-20250619
# -----BEGIN PGP SIGNATURE-----
#
# iLMEAAEIAB0WIQTKRzxE1qCcGJoZP81FK5aFKyaCFgUCaFPPIgAKCRBFK5aFKyaC
# Fq67BACc5EYLK/T0Q2DxDmI84ZmK83GBAsQoB/81CLCDwA4ihotLAjqTB1liKFvY
# oFexqza3GBxpiyyw+zAUyrx1X/arnSrCAlgPOF2SIZgj1WnHm/jZb+8mdpgFGZc5
# 6TkB4Dr5rmSa8OJLYIiC7gKYu9K3zEO/Dprgx9nj3D4tb8xRDQ==
# =601X
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 19 Jun 2025 04:49:38 EDT
# gpg: using RSA key CA473C44D6A09C189A193FCD452B96852B268216
# gpg: Good signature from "Song Gao <gaosong@loongson.cn>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: CA47 3C44 D6A0 9C18 9A19 3FCD 452B 9685 2B26 8216
* tag 'pull-loongarch-20250619' of https://github.com/gaosong715/qemu:
target/loongarch: fix vldi/xvldi raise wrong error
hw/loongarch/virt: Add kernel irqchip support
hw/loongarch/virt: Disable emulation with IOCSR misc register
target/loongarch: Report error with split kernel_irqchip option
hw/loongarch/virt: Add reset support for kernel irqchip
hw/intc/loongarch_pch: Inject irq line interrupt to kernel
hw/intc/loongarch_pch: Add kernel irqchip save and restore function
hw/intc/loongarch_pch: Add kernel irqchip realize function
hw/intc/loongarch_pch_msi: Inject MSI interrupt to kernel
hw/intc/loongarch_ipi: Add kernel irqchip save and restore function
hw/intc/loongson_ipi: Add load and save interface with ipi_common class
hw/intc/loongarch_ipi: Add kernel irqchip realize function
hw/intc/loongarch_extioi: Add kernel irqchip save and restore function
hw/intc/loongarch_extioi: Add kernel irqchip realize function
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add save and store funtction if kvm_irqchip_in_kernel() return true,
it is to get and set PCH PCI irqchip state from KVM kernel.
Reviewed-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20250606063033.2557365-9-maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>