qemu/hw
Nicholas Piggin fb802acdc8 ppc/spapr: Fix RTAS stopped state
This change takes the CPUPPCState 'quiesced' field added for powernv
hardware CPU core controls (used to stop and start cores) and extends
it to spapr to model the "RTAS stopped" state. This prevents the
schedulers from attempting to run stopped CPUs, which can cause hangs
and possibly other unexpected behaviour.

In detail, the problem is as follows:

A KVM spapr guest boots with all secondary CPUs defined to be in the
"RTAS stopped" state. In this state, the CPU is only responsive to the
start-cpu RTAS call. This behaviour is modeled in QEMU with the
start_powered_off feature, which sets ->halted on secondary CPUs at
boot. ->halted=true looks like an idle/sleep/power-save state, which is
typically responsive to asynchronous interrupts, so spapr also clears
the wake-on-interrupt bits in the LPCR SPR to keep stopped CPUs from
waking. This more-or-less works.

Commit e8291ec16d ("target/ppc: fix timebase register reset state")
recently caused the decrementer to expire sooner at boot, causing a
decrementer exception on secondary CPUs in the RTAS stopped state. This
was not a problem on TCG, but KVM limits how a guest can modify LPCR;
in particular, it prevents the wake-on-interrupt bits from being
cleared. So in the course of CPU register synchronisation, the LPCR
value that spapr set to model the RTAS stopped state is overwritten
with KVM's LPCR value. QEMU's interrupt code then notices the expired
decrementer exception, turns it into an interrupt, and sets
CPU_INTERRUPT_HARD.

That causes the CPU to be kicked and the KVM vCPU thread to loop
calling kvm_cpu_exec(). kvm_cpu_exec() calls
kvm_arch_process_async_events(), which on ppc just returns ->halted.
Because ->halted is still true, kvm_cpu_exec() returns immediately with
EXCP_HLT, and the vCPU never goes to sleep because qemu_wait_io_event()
sees that CPU_INTERRUPT_HARD is set. All this time the vCPU thread holds
the BQL, so the boot CPU eventually locks up when it needs the BQL.

So make 'quiesced' represent the "RTAS stopped" state, and have a
quiesced CPU explicitly ignore exceptions (interrupt conditions) rather
than relying on machine register state to model that state. This
matches the powernv quiesced state very well, because that state
essentially turns off the CPU core via a side-band control unit.

There are still issues with QEMU's and KVM's ideas of LPCR diverging,
which is ugly and fragile and should be fixed: spapr should synchronize
its LPCR properly with KVM, and not try to use values that KVM does not
support.

Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
Tested-by: Misbah Anjum N <misanjum@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-20 14:48:17 +10:00
9pfs 9pfs: improve v9fs_open() tracing 2025-02-06 17:10:46 +01:00
acpi hw/i386/fw_cfg: Check ACPI availability with acpi_builtin() 2025-03-11 20:03:26 +01:00
adc include/hw/qdev-properties: Remove DEFINE_PROP_END_OF_LIST 2024-12-19 19:36:37 +01:00
alpha hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
arm hw/sd/sdhci: Remove need for SDHCI_VENDOR_IMX definition 2025-03-12 11:11:42 +01:00
audio Accel & Exec patch queue 2024-12-21 11:07:00 -05:00
avr hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
block pflash: fix cpr 2025-03-14 09:29:19 -03:00
char Revert "hw/char/pl011: Warn when using disabled receiver" 2025-03-14 12:54:33 +00:00
core hw/loader: fix roms during cpr 2025-03-14 09:29:19 -03:00
cpu hw/cpu/arm_mpcore: Remove default values for GIC external IRQs 2025-02-20 14:20:29 +00:00
cxl mem/cxl_type3: support 3, 6, 12 and 16 interleave ways 2025-02-21 07:18:42 -05:00
display hw/qxl: fix cpr 2025-03-14 09:29:20 -03:00
dma Accel & Exec patch queue 2024-12-21 11:07:00 -05:00
fsi hw: Use device_class_set_legacy_reset() instead of opencoding 2024-09-13 15:31:44 +01:00
gpio hw/gpio: npcm7xx: fixup out-of-bounds access 2025-03-07 10:08:19 +00:00
hppa hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
hyperv qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
i2c hw/arm: Mark Allwinner Technology devices as little-endian 2025-02-16 14:41:46 +01:00
i386 hw/i386/fw_cfg: Check ACPI availability with acpi_builtin() 2025-03-11 20:03:26 +01:00
ide dma: use current AioContext for dma_blk_io() 2025-03-13 17:57:23 +01:00
input Accel & Exec patch queue 2024-12-21 11:07:00 -05:00
intc target/arm: Move arm_current_el() and arm_el_is_aa64() to internals.h 2025-03-14 10:49:20 +00:00
ipack hw/ipack: Remove legacy qemu_allocate_irqs() use 2025-01-31 19:36:44 +01:00
ipmi Accel & Exec patch queue 2024-12-21 11:07:00 -05:00
isa hw: Declare various const data as 'const' 2025-02-16 14:26:07 +01:00
loongarch hw/loongarch/virt: Enable cpu hotplug feature on virt machine 2025-03-05 09:39:18 +08:00
m68k hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
mem mem/cxl_type3: support 3, 6, 12 and 16 interleave ways 2025-02-21 07:18:42 -05:00
microblaze hw/ssi/xilinx_spi: Make device endianness configurable 2025-02-16 14:34:57 +01:00
mips accel/tcg: Rename 'hw/core/tcg-cpu-ops.h' -> 'accel/tcg/cpu-ops.h' 2025-03-06 15:46:17 +01:00
misc hw/misc: use extract64 instead of 1 << i 2025-03-17 08:53:25 +01:00
net hw/net/smc91c111: Don't allow data register access to overrun buffer 2025-03-11 20:03:24 +01:00
nubus include/hw/qdev-properties: Remove DEFINE_PROP_END_OF_LIST 2024-12-19 19:36:37 +01:00
nvme vfio queue: 2025-03-07 07:39:21 +08:00
nvram qdev: Rename PropertyInfo member @name to @type 2025-03-06 10:30:58 +01:00
openrisc target/openrisc: Call cpu_openrisc_clock_init() in cpu_realize() 2025-03-06 15:46:18 +01:00
pci vfio queue: 2025-03-07 07:39:21 +08:00
pci-bridge pcie, virtio: Remove redundant pm_cap 2025-03-06 06:47:33 +01:00
pci-host ppc/pnv/phb4: Add pervasive chiplet support to PHB4/5 2025-03-11 22:43:30 +10:00
ppc ppc/spapr: Fix RTAS stopped state 2025-03-20 14:48:17 +10:00
remote Memory pull request for 10.0 2025-02-19 08:36:26 +08:00
riscv hw/riscv: add IOMMU HPM trace events 2025-03-04 15:42:54 +10:00
rtc hw/rtc: Add Ricoh RS5C372 RTC emulation 2025-03-11 20:01:26 +01:00
rx hw/rx: Allow execution without either bios or kernel 2025-02-16 14:45:38 +01:00
s390x system: Declare qemu_[min/max]rampagesize() in 'system/hostmem.h' 2025-03-11 17:01:14 +01:00
scsi Block layer patches 2025-03-14 09:31:13 +08:00
sd hw/sd/sdhci: Remove need for SDHCI_VENDOR_IMX definition 2025-03-12 11:11:42 +01:00
sensor hw: Make class data 'const' 2025-02-16 14:26:07 +01:00
sh4 exec: Declare tlb_flush*() in 'exec/cputlb.h' 2025-03-08 07:56:14 -08:00
smbios include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
sparc hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
sparc64 hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
ssi hw/ssi/pnv_spi: Put a limit to RDR match failures 2025-03-11 22:43:31 +10:00
timer * qom: Use command line syntax for default values in help 2025-03-03 10:20:59 +08:00
tpm hw/tpm: Have TPM TIS sysbus device inherit from DYNAMIC_SYS_BUS_DEVICE 2025-02-16 14:25:07 +01:00
tricore hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
uefi docs: add uefi variable service documentation 2025-03-04 12:07:05 +01:00
ufs hw/ufs: Add temperature event notification support 2025-03-05 02:13:29 +01:00
usb hw/usb/hcd-xhci-pci: Adds property for disabling mapping in IRQ mode 2025-03-04 14:45:34 +01:00
vfio vfio/pci: Drop debug commentary from x-device-dirty-page-tracking 2025-03-11 19:04:58 +01:00
virtio Block layer patches 2025-03-14 09:31:13 +08:00
vmapple hw/vmapple/vmapple: Add vmapple machine type 2025-03-04 14:45:34 +01:00
watchdog hw/arm: Mark Allwinner Technology devices as little-endian 2025-02-16 14:41:46 +01:00
xen xen: No need to flush the mapcache for grants 2025-03-10 13:28:03 +01:00
xenpv hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
xtensa hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
Kconfig Misc HW patches 2025-03-05 21:54:58 +08:00
meson.build Misc HW patches 2025-03-05 21:54:58 +08:00