qemu/include
Chenyi Qiang 5d6483edaa ram-block-attributes: Introduce RamBlockAttributes to manage RAMBlock with guest_memfd
Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
discard") highlighted that subsystems like VFIO may disable RAM block
discard. However, guest_memfd relies on discard operations for page
conversion between private and shared memory, potentially leading to
the stale IOMMU mapping issue when assigning hardware devices to
confidential VMs via shared memory. To address this and allow shared
device assignement, it is crucial to ensure the VFIO system refreshes
its IOMMU mappings.

RamDiscardManager is an existing interface (used by virtio-mem) to
adjust VFIO mappings in relation to VM page assignment. Effectively page
conversion is similar to hot-removing a page in one mode and adding it
back in the other. Therefore, similar actions are required for page
conversion events. Introduce the RamDiscardManager to guest_memfd to
facilitate this process.

Since guest_memfd is not an object, it cannot directly implement the
RamDiscardManager interface. Implementing it in HostMemoryBackend is
not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
have a memory backend while others do not. Notably, virtual BIOS
RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
backend.

To manage RAMBlocks with guest_memfd, define a new object named
RamBlockAttributes to implement the RamDiscardManager interface. This
object can store the guest_memfd information such as the bitmap for
shared memory and the registered listeners for event notifications. A
new state_change() helper function is provided to notify listeners, such
as VFIO, allowing VFIO to do dynamically DMA map and unmap for the shared
memory according to conversion events. Note that in the current context
of RamDiscardManager for guest_memfd, the shared state is analogous to
being populated, while the private state can be considered discarded for
simplicity. In the future, it would be more complicated if considering
more states like private/shared/discarded at the same time.

In current implementation, memory state tracking is performed at the
host page size granularity, as the minimum conversion size can be one
page per request. Additionally, VFIO expected the DMA mapping for a
specific IOVA to be mapped and unmapped with the same granularity.
Confidential VMs may perform partial conversions, such as conversions on
small regions within a larger one. To prevent such invalid cases and
until support for DMA mapping cut operations is available, all
operations are performed with 4K granularity.

In addition, memory conversion failures cause QEMU to quit rather than
resuming the guest or retrying the operation at present. It would be
future work to add more error handling or rollback mechanisms once
conversion failures are allowed. For example, in-place conversion of
guest_memfd could retry the unmap operation during the conversion from
shared to private. For now, keep the complex error handling out of the
picture as it is not required.

Tested-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Link: https://lore.kernel.org/r/20250612082747.51539-5-chenyi.qiang@intel.com
[peterx: squash fixup from Chenyi to fix builds]
Signed-off-by: Peter Xu <peterx@redhat.com>
2025-06-23 16:03:59 -04:00
..
accel target: Use cpu_pointer_wrap_uint32 for 32-bit targets 2025-05-28 08:08:48 +01:00
authz Prefer 'on' | 'off' over 'yes' | 'no' for bool options 2021-01-29 17:07:53 +00:00
block block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED 2025-06-04 18:16:34 +02:00
chardev chardev/char-hub: implement backend chardev aggregator 2025-02-03 13:57:08 +04:00
crypto crypto: Remove qcrypto_tls_session_get_handshake_status 2025-02-14 15:19:03 -03:00
disas disas: Fix build against Capstone v6 (again) 2024-11-05 10:09:59 +00:00
exec include/exec: fix assert in size_memop 2025-06-07 16:40:44 +01:00
fpu fpu: Move m68k_denormal fmt flag into floatx80_behaviour 2025-02-25 15:32:57 +00:00
gdbstub include/gdbstub: fix include guard in commands.h 2025-06-07 16:40:44 +01:00
hw hw/intc/arm_gic: introduce a first-cpu-index property 2025-06-16 11:26:25 +01:00
io io: Add helper for setting socket send buffer size 2025-05-29 16:37:15 -05:00
libdecnumber Replace config-time define HOST_WORDS_BIGENDIAN 2022-04-06 10:50:37 +02:00
migration migration: lower handler priority 2025-06-11 14:01:58 +02:00
monitor monitor: Remove obsolete stubs 2024-06-30 19:51:44 +03:00
net net: checksum: Convert data to void * 2024-11-25 13:59:50 +08:00
qapi util/error: make func optional 2025-06-05 20:24:51 +02:00
qemu include/qemu/compiler: add QEMU_UNINITIALIZED attribute macro 2025-06-12 13:39:07 -04:00
qobject qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
qom qom: reverse order of instance_post_init calls 2025-05-20 08:18:53 +02:00
scsi hw/ufs: Support for UFS logical unit 2023-09-07 14:01:29 -04:00
semihosting semihosting: Move user-only implementation out-of-line 2025-04-23 14:08:32 -07:00
standard-headers update Linux headers to v6.16-rc3 2025-06-20 13:25:59 +02:00
system ram-block-attributes: Introduce RamBlockAttributes to manage RAMBlock with guest_memfd 2025-06-23 16:03:59 -04:00
tcg tcg: Drop TCGContext.page_{mask,bits} 2025-05-28 08:08:47 +01:00
ui ui/gtk-egl: Render guest content with padding in fixed-scale mode 2025-05-24 17:04:09 +02:00
user accel/tcg: Remove TARGET_PAGE_DATA_SIZE 2025-05-05 09:24:10 -07:00
elf.h util: spelling fixes 2023-08-31 19:47:43 +02:00
glib-compat.h include/glib-compat.h: Poison g_list_sort and g_slist_sort 2025-05-06 16:02:04 +02:00
qemu-io.h Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
qemu-main.h ui & main loop: Redesign of system-specific main thread event handling 2024-12-31 21:21:34 +01:00