qemu/include/sysemu
Peter Xu ae4e46cd20 KVM: Dynamic sized kvm memslots array
Zhiyi reported an infinite loop issue in a VFIO use case.  The cause of that
is a separate discussion; however, while profiling it I found that dirty
sync had regressed and become slow.

Each KVMMemoryListener maintains an array of kvm memslots.  Currently it is
statically allocated at the maximum supported by the kernel.  However, after
Linux commit 4fc096a99e ("KVM: Raise the maximum number of user memslots"),
the reported maximum is large enough that always allocating the full array
up front may no longer be wise.

What's worse, the QEMU kvm code still walks all the allocated memslot
entries for any form of lookup.  That can drastically slow down all memslot
operations, because each such loop can run over 32K times on the new
kernels.
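
To make the cost concrete, here is a minimal sketch of a linear memslot
lookup.  It is not the QEMU code: ScanSlot, scan_lookup and scan_steps are
hypothetical names, and SCAN_MAX_SLOTS is only an illustrative stand-in for
the "over 32K" figure above.  The point is that the scan length follows the
allocated array size, not the number of slots actually in use.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a kernel-reported maximum (assumption). */
#define SCAN_MAX_SLOTS 32768

/* Hypothetical, simplified slot: size == 0 means "unused". */
typedef struct {
    unsigned long start;
    unsigned long size;
} ScanSlot;

/* Counts loop iterations so the cost of a lookup can be observed. */
static unsigned long scan_steps;

/*
 * Linear lookup over every allocated entry.  A miss always costs
 * nr iterations, no matter how few entries are really in use.
 */
static ScanSlot *scan_lookup(ScanSlot *slots, size_t nr,
                             unsigned long start, unsigned long size)
{
    size_t i;

    assert(slots != NULL || nr == 0);

    for (i = 0; i < nr; i++) {
        scan_steps++;
        if (size && slots[i].start == start && slots[i].size == size) {
            return &slots[i];
        }
    }
    return NULL;
}
```

With an array sized to SCAN_MAX_SLOTS, a failed lookup takes SCAN_MAX_SLOTS
iterations; with a 16-entry array it takes 16.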

Fix this issue by allocating the memslots dynamically.

The initial size is set to 16 because that should cover basic VM usage, so
the hope is that the majority of VM use cases will not need to grow the
array at all (e.g. a VM started with ./qemu-system-x86_64 and default
options consumes 9 memslots), while not being so large as to waste memory.
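
The idea can be sketched as follows.  This is only an illustration under
stated assumptions, not the patch itself: DemoSlot, DemoListener and the
demo_* helpers are hypothetical names, and doubling the array when it is
full is an assumed growth policy (the text above only fixes the initial
size of 16).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_SLOTS_DEFAULT 16   /* initial size named in the text above */

/* Hypothetical, simplified stand-ins for the real structures. */
typedef struct {
    int slot;               /* index of this entry in the array */
    unsigned long size;     /* 0 means the slot is unused */
} DemoSlot;

typedef struct {
    DemoSlot *slots;
    unsigned int nr_allocated;  /* entries currently allocated */
    unsigned int nr_max;        /* upper limit, e.g. as reported by the kernel */
} DemoListener;

/*
 * Grow the array to nr_new entries (clamped to nr_max) and zero the new
 * tail.  Returns 0 on success, -1 if no growth is possible.
 */
static int demo_slots_grow(DemoListener *l, unsigned int nr_new)
{
    DemoSlot *p;
    unsigned int i;

    if (nr_new > l->nr_max) {
        nr_new = l->nr_max;
    }
    if (nr_new <= l->nr_allocated) {
        return -1;      /* already at the limit */
    }
    p = realloc(l->slots, nr_new * sizeof(*p));
    if (!p) {
        return -1;
    }
    memset(&p[l->nr_allocated], 0, (nr_new - l->nr_allocated) * sizeof(*p));
    for (i = l->nr_allocated; i < nr_new; i++) {
        p[i].slot = (int)i;
    }
    l->slots = p;
    l->nr_allocated = nr_new;
    return 0;
}

/* Start small instead of allocating nr_max entries up front. */
static int demo_slots_init(DemoListener *l, unsigned int nr_max)
{
    assert(l != NULL);
    l->slots = NULL;
    l->nr_allocated = 0;
    l->nr_max = nr_max;
    return demo_slots_grow(l, DEMO_SLOTS_DEFAULT);
}

/* Return an unused slot, growing the array only when none is left. */
static DemoSlot *demo_get_free_slot(DemoListener *l)
{
    unsigned int i;

    for (;;) {
        for (i = 0; i < l->nr_allocated; i++) {
            if (l->slots[i].size == 0) {
                return &l->slots[i];
            }
        }
        if (demo_slots_grow(l, l->nr_allocated * 2) < 0) {
            return NULL;    /* limit reached or out of memory */
        }
    }
}
```

In this sketch a VM that needs only about 9 slots never leaves the initial
16-entry array, so every scan stays short; the array grows only for
workloads that actually use more slots.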

There may be even better ways to address this, but so far this is the
simplest, and it should already be better than the behavior before the max
supported memslots grew.  For example, in the issue above, where VFIO was
attached on a 32GB system, only ~10 memslots are used.  So it should be
good enough for now.

In the above VFIO context, measurements show that the precopy dirty sync
shrank from ~86ms to ~3ms with this patch applied.  The improvement should
also apply to any KVM-enabled VM, even without VFIO.

NOTE: there is no Fixes tag for this patch because no real commit regressed
this in QEMU.  The behavior has existed for a long time, but it only started
to be a problem when the kernel began reporting a very large nr_slots_max
value.  That is pretty common now (the kernel change was merged in 2021),
so we added Cc: stable because we want this change to be backported to
stable branches.

Cc: qemu-stable <qemu-stable@nongnu.org>
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Tested-by: Zhiyi Guo <zhguo@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240917163835.194664-2-peterx@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5504a81261)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: context fixup in accel/kvm/kvm-all.c and accel/kvm/trace-events;
 also remove now-unused local variable `KVMState *s` in kvm-all.c:kvm_log_sync_global() )
2024-11-08 13:02:21 +03:00
accel-ops.h gdbstub: move guest debug support check to ops 2022-10-06 11:53:41 +01:00
arch_init.h hw/loongarch: Add support loongson3 virt machine type. 2022-06-06 18:09:03 +00:00
balloon.h qapi: Restrict balloon-related commands to machine code 2020-09-29 15:41:35 +02:00
block-backend-common.h block-backend-common.h: split function pointers in BlockDevOps 2022-03-04 18:18:26 +01:00
block-backend-global-state.h block: return errors from bdrv_register_buf() 2022-10-26 14:56:42 -04:00
block-backend-io.h block: Reorganize some declarations in block-backend-io.h 2022-07-12 12:14:56 +02:00
block-backend.h include/sysemu/block-backend: split header into I/O and global state (GS) API 2022-03-04 18:18:25 +01:00
block-ram-registrar.h block: add BlockRAMRegistrar 2022-10-26 14:56:42 -04:00
blockdev.h include/sysemu/blockdev.h: global state API 2022-03-04 18:18:25 +01:00
cpu-throttle.h cpu-throttle: new module, extracted from cpus.c 2020-07-10 18:04:49 -04:00
cpu-timers.h replay: notify vCPU when BH is scheduled 2022-06-06 09:26:53 +02:00
cpus.h gdbstub: move breakpoint logic to accel ops 2022-10-06 11:53:41 +01:00
cryptodev-vhost-user.h cryptodev: Fix Lesser GPL version number 2020-10-27 16:48:49 +01:00
cryptodev-vhost.h cryptodev: Fix Lesser GPL version number 2020-10-27 16:48:49 +01:00
cryptodev.h cryptodev: Add a lkcf-backend for cryptodev 2022-11-02 06:56:32 -04:00
device_tree.h device-tree: add re-randomization helper function 2022-10-27 11:34:31 +01:00
dirtylimit.h softmmu/dirtylimit: Implement virtual CPU throttle 2022-07-20 12:15:08 +01:00
dirtyrate.h migration/dirtyrate: Refactor dirty page rate calculation 2022-07-20 12:15:08 +01:00
dma.h hw/dma: Let dma_buf_read() / dma_buf_write() propagate MemTxResult 2022-01-18 12:56:29 +01:00
dump-arch.h dump: Add architecture section and section string table support 2022-10-24 22:30:10 +04:00
dump.h s390x: pv: Add dump support 2022-10-26 12:54:59 +04:00
event-loop-base.h util/event-loop-base: Introduce options to set the thread pool size 2022-05-09 10:43:23 +01:00
hax.h accel/hax: Introduce CONFIG_HAX_IS_POSSIBLE 2022-03-06 13:15:42 +01:00
hostmem.h hostmem: Allow for specifying a ThreadContext for preallocation 2022-10-27 11:01:03 +02:00
hvf.h include/sysemu: Poison all accelerator CONFIG switches in common code 2021-05-14 12:31:44 +02:00
hvf_int.h arm/hvf: Add a WFI handler 2021-09-21 16:28:26 +01:00
hw_accel.h accel: Introduce AccelOpsClass::cpus_are_resettable() 2022-03-06 13:15:42 +01:00
iothread.h Introduce event-loop-base abstract class 2022-05-09 10:43:23 +01:00
kvm.h kvm: Introduce kvm_arch_get_default_type hook 2023-09-11 10:53:50 +03:00
kvm_int.h KVM: Dynamic sized kvm memslots array 2024-11-08 13:02:21 +03:00
memory_mapping.h sysemu/memory_mapping: Become target-agnostic 2022-03-06 13:15:42 +01:00
numa.h numa: drop support for '-numa node' (without memory specified) 2020-09-30 19:09:20 +02:00
nvmm.h Only check CONFIG_NVMM when NEED_CPU_H is defined 2021-09-13 13:56:26 +02:00
os-posix.h block: move fcntl_setfl() 2022-05-03 15:17:53 +04:00
os-win32.h util/qemu-sockets: Enable unix socket support on Windows 2022-09-02 15:54:46 +04:00
qtest.h cpu-timers, icount: new modules 2020-10-05 16:41:22 +02:00
replay.h chardev: src buffer const for write functions 2022-09-29 14:38:05 +04:00
reset.h reset: allow registering handlers that aren't called by snapshot loading 2022-10-27 11:34:31 +01:00
rng-random.h Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
rng.h qom: Remove module_obj_name parameter from OBJECT_DECLARE* macros 2020-09-18 14:12:32 -04:00
rtc.h rtc: Move RTC function prototypes to their own header 2022-01-28 14:29:46 +00:00
runstate-action.h vl: Add option to avoid stopping VM upon guest panic 2020-12-15 12:51:58 -05:00
runstate.h whpx: Added support for breakpoints and stepping 2022-04-06 14:31:55 +02:00
seccomp.h sandbox: disable -sandbox if CONFIG_SECCOMP undefined 2018-06-01 13:44:15 +02:00
sysemu.h ui/cocoa: Run qemu_init in the main thread 2022-09-23 14:36:33 +02:00
tcg.h accel/tcg: Merge tcg_exec_init into tcg_init_machine 2021-06-11 09:26:28 -07:00
tpm.h sysemu: tpm: Add a stub function for TPM_IS_CRB 2022-05-06 09:06:50 -06:00
tpm_backend.h sysemu: Make TPM structures inaccessible if CONFIG_TPM is not defined 2021-06-15 10:55:12 -04:00
tpm_util.h tpm: Fix Lesser GPL version number 2020-11-15 16:44:18 +01:00
vhost-user-backend.h qom: Remove module_obj_name parameter from OBJECT_DECLARE* macros 2020-09-18 14:12:32 -04:00
watchdog.h watchdog: remove -watchdog option 2022-09-29 11:40:28 +02:00
whpx.h include/sysemu: Poison all accelerator CONFIG switches in common code 2021-05-14 12:31:44 +02:00
xen-mapcache.h include: Make headers more self-contained 2019-08-16 13:31:51 +02:00
xen.h sysemu/xen: Add missing 'exec/cpu-common.h' header for ram_addr_t type 2020-09-30 19:11:36 +02:00