qemu/hw/ppc
Nicholas Piggin fb802acdc8 ppc/spapr: Fix RTAS stopped state
This change takes the CPUPPCState 'quiesced' field added for powernv
hardware CPU core controls (used to stop and start cores), and extends
it to spapr to model the "RTAS stopped" state. This prevents the
schedulers from attempting to run stopped CPUs unexpectedly, which can
cause hangs and possibly other unexpected behaviour.

The detail of the problematic situation is this:

A KVM spapr guest boots with all secondary CPUs defined to be in the
"RTAS stopped" state. In this state, the CPU is only responsive to the
start-cpu RTAS call. This behaviour is modeled in QEMU with the
start_powered_off feature, which sets ->halted on secondary CPUs at
boot. ->halted=true looks like an idle / sleep / power-save state which
typically is responsive to asynchronous interrupts, but spapr clears
wake-on-interrupt bits in the LPCR SPR. This more-or-less works.

Commit e8291ec16d ("target/ppc: fix timebase register reset state")
recently caused the decrementer to expire sooner at boot, causing a
decrementer exception on secondary CPUs in RTAS stopped state. This
was not a problem on TCG, but KVM limits how a guest can modify LPCR, in
particular it prevents the clearing of wake-on-interrupt bits, and so in
the course of CPU register synchronisation, the LPCR as set by spapr to
model the RTAS stopped state is overwritten with KVM's LPCR value, and
that then causes QEMU's interrupt code to notice the expired decrementer
exception, turn that into an interrupt, and set CPU_INTERRUPT_HARD.

That causes the CPU to be kicked, and the KVM vCPU thread to loop
calling kvm_cpu_exec(). kvm_cpu_exec() calls
kvm_arch_process_async_events(), which on ppc just returns ->halted.
This is still true, so it returns immediately with EXCP_HLT, and the
vCPU never goes to sleep because qemu_wait_io_event() sees that
CPU_INTERRUPT_HARD is set. All of this happens while the vCPU thread
holds the BQL, which eventually causes the boot CPU to lock up when it
needs the BQL.
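
The spin described above can be illustrated with a toy model. This is
only a sketch under stated assumptions: the toy_* types and functions
are hypothetical stand-ins, not QEMU code, and they reduce
kvm_cpu_exec(), kvm_arch_process_async_events() and the idle test in
qemu_wait_io_event() to the single condition each one contributes in
the description above.

```c
#include <stdbool.h>

/* Toy stand-ins for QEMU state; these are NOT the real QEMU types. */
struct toy_cpu {
    bool halted;          /* models ->halted on the vCPU */
    bool interrupt_hard;  /* models CPU_INTERRUPT_HARD being pending */
};

enum { TOY_RAN_GUEST = 0, TOY_EXCP_HLT = 1 };

/* Models kvm_arch_process_async_events() on ppc: just report ->halted. */
static bool toy_process_async_events(const struct toy_cpu *cpu)
{
    return cpu->halted;
}

/* Models kvm_cpu_exec(): return early with EXCP_HLT if the CPU is halted. */
static int toy_cpu_exec(const struct toy_cpu *cpu)
{
    return toy_process_async_events(cpu) ? TOY_EXCP_HLT : TOY_RAN_GUEST;
}

/* Models the idle test: the thread may sleep only if nothing is pending. */
static bool toy_thread_may_sleep(const struct toy_cpu *cpu)
{
    return !cpu->interrupt_hard;
}

/*
 * Count iterations of the vCPU loop that neither run the guest nor sleep,
 * giving up after 'limit' iterations. A result equal to 'limit' means the
 * thread is spinning.
 */
static int toy_count_spins(const struct toy_cpu *cpu, int limit)
{
    int spins = 0;

    while (spins < limit) {
        if (toy_cpu_exec(cpu) != TOY_EXCP_HLT) {
            break;              /* the guest actually ran */
        }
        if (toy_thread_may_sleep(cpu)) {
            break;              /* the thread goes idle */
        }
        spins++;                /* halted, yet not allowed to sleep */
    }
    return spins;
}
```

In this model a CPU with halted and interrupt_hard both set never
leaves the loop, while clearing either flag lets it run or sleep on
the first iteration.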

So make 'quiesced' represent the "RTAS stopped" state, and have it
explicitly not respond to exceptions (interrupt conditions) rather than
rely on machine register state to model that state. This matches the
powernv quiesced state very well because it essentially turns off the
CPU core via a side-band control unit.
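
As a minimal sketch of the idea (not the code from this commit), the
difference between the two approaches can be modelled as below. The
toy_env structure and both functions are hypothetical; in particular,
lpcr_wake_on_dec is an assumed single flag standing in for the
wake-on-interrupt bits of the LPCR.

```c
#include <stdbool.h>

/* Toy model of the relevant CPU state; NOT the real CPUPPCState. */
struct toy_env {
    bool quiesced;            /* models the 'quiesced' field */
    bool decrementer_pending; /* an expired decrementer exception */
    bool lpcr_wake_on_dec;    /* assumed stand-in for LPCR wake-on-interrupt bits */
};

/*
 * Old approach: the stopped state is implied by register state alone.
 * If something (e.g. KVM register sync) sets the wake bit again, the
 * pending exception becomes an interrupt.
 */
static bool toy_should_interrupt_lpcr_only(const struct toy_env *env)
{
    return env->decrementer_pending && env->lpcr_wake_on_dec;
}

/*
 * New approach: an explicit quiesced flag overrides register state, so a
 * stopped CPU never turns an exception into an interrupt.
 */
static bool toy_should_interrupt(const struct toy_env *env)
{
    if (env->quiesced) {
        return false;
    }
    return toy_should_interrupt_lpcr_only(env);
}
```

With quiesced set, toy_should_interrupt() stays false even when the
wake bit has been restored behind spapr's back; once the CPU is
started (quiesced cleared), the usual register-based rule applies.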

There are still issues with QEMU's and KVM's ideas of the LPCR
diverging, which is ugly and fragile and should be fixed. spapr should
synchronize its LPCR properly with KVM, and not try to use values that
KVM does not support.

Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
Tested-by: Misbah Anjum N <misanjum@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2025-03-20 14:48:17 +10:00
amigaone.c ppc/amigaone: Add #defines for memory map constants 2025-03-11 22:43:32 +10:00
e500-ccsr.h Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
e500.c hw/sd/sdhci: Set reset value of interrupt registers 2025-03-11 20:00:16 +01:00
e500.h hw/ppc: Consolidate e500 initial mapping creation functions 2024-11-04 10:09:36 +10:00
e500plat.c hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
fdt.c target/ppc: Split page size information into a separate allocation 2018-04-27 18:05:22 +10:00
fw_cfg.c hw/ppc: Implement fw_cfg_arch_key_name() 2019-05-23 14:10:31 +02:00
Kconfig ppc/ppc405: Remove boards 2025-03-11 22:40:47 +10:00
mac_newworld.c hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
mac_oldworld.c hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
meson.build ppc/ppc405: Remove boards 2025-03-11 22:40:47 +10:00
mpc8544_guts.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
mpc8544ds.c hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
pef.c system: Move 'exec/confidential-guest-support.h' to system/ 2024-12-20 17:44:56 +01:00
pegasos2.c hw: Centralize handling of -machine dumpdtb option 2025-02-24 15:03:42 +00:00
pnv.c hw/ssi/pnv_spi: Make bus names distinct for each controllers of a socket 2025-03-11 22:43:31 +10:00
pnv_adu.c include/hw/qdev-properties: Remove DEFINE_PROP_END_OF_LIST 2024-12-19 19:36:37 +01:00
pnv_bmc.c ppc/pnv: Add a PNOR address and size sanity checks 2025-03-11 22:43:30 +10:00
pnv_chiptod.c Accel & Exec patch queue 2024-12-21 11:07:00 -05:00
pnv_core.c ppc/spapr: Fix RTAS stopped state 2025-03-20 14:48:17 +10:00
pnv_homer.c ppc/pnv: Make HOMER memory a RAM region 2025-03-11 22:43:30 +10:00
pnv_i2c.c Accel & Exec patch queue 2024-12-21 11:07:00 -05:00
pnv_lpc.c ppc/pnv: Implement LPC FW address space IDSEL 2025-03-11 22:43:30 +10:00
pnv_n1_chiplet.c hw/ppc: Add N1 chiplet model 2024-02-23 23:24:42 +10:00
pnv_nest_pervasive.c ppc/pnv: Add xscom- prefix to pervasive-control region name 2024-11-27 02:49:36 +10:00
pnv_occ.c * Next round of XIVE patches... 2025-03-13 10:29:04 +08:00
pnv_pnor.c Accel & Exec patch queue 2024-12-21 11:07:00 -05:00
pnv_psi.c Accel & Exec patch queue 2024-12-21 11:07:00 -05:00
pnv_sbe.c bulk: Remove pointless QOM casts 2023-06-05 20:48:34 +02:00
pnv_xscom.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
ppc.c target/ppc: fix timebase register reset state 2025-03-11 22:43:32 +10:00
ppc4xx_devs.c include/hw/qdev-properties: Remove DEFINE_PROP_END_OF_LIST 2024-12-19 19:36:37 +01:00
ppc4xx_sdram.c include/hw/qdev-properties: Remove DEFINE_PROP_END_OF_LIST 2024-12-19 19:36:37 +01:00
ppc440.h ppc440: Remove ppc460ex_pcie_init legacy init function 2023-07-07 04:47:49 -03:00
ppc440_bamboo.c hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
ppc440_uc.c Accel & Exec patch queue 2024-12-21 11:07:00 -05:00
ppc_booke.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
ppce500_spin.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
prep.c hw/boards: Do not create unusable default if=sd drives 2025-02-16 14:25:08 +01:00
prep_systemio.c Accel & Exec patch queue 2024-12-21 11:07:00 -05:00
rs6000_mc.c include/hw/qdev-properties: Remove DEFINE_PROP_END_OF_LIST 2024-12-19 19:36:37 +01:00
sam460ex.c hw/ppc/epapr: Do not swap ePAPR magic value 2025-03-11 22:43:32 +10:00
spapr.c spapr: Generate random HASHPKEYR for spapr machines 2025-03-11 22:43:32 +10:00
spapr_caps.c vfio queue: 2025-03-13 10:35:12 +08:00
spapr_cpu_core.c ppc/spapr: Fix RTAS stopped state 2025-03-20 14:48:17 +10:00
spapr_drc.c qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
spapr_events.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
spapr_hcall.c ppc: spapr: Enable 2nd DAWR on Power10 pSeries machine 2025-03-11 22:43:32 +10:00
spapr_iommu.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
spapr_irq.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
spapr_nested.c spapr: nested: Add support for reporting Hostwide state counter 2025-03-11 22:43:32 +10:00
spapr_numa.c spapr: Remove support for NVIDIA V100 GPU with NVLink2 2023-09-18 07:25:28 -03:00
spapr_nvdimm.c include/hw/qdev-properties: Remove DEFINE_PROP_END_OF_LIST 2024-12-19 19:36:37 +01:00
spapr_ovec.c hw/ppc: Constify VMState 2023-12-30 07:38:06 +11:00
spapr_pci.c hw/ppc/spapr_pci: Do not reject VFs created after a PF 2025-02-20 18:23:19 -05:00
spapr_pci_vfio.c hw/ppc/Kconfig: Imply VFIO_PCI 2023-12-19 19:03:38 +01:00
spapr_rng.c Accel & Exec patch queue 2024-12-21 11:07:00 -05:00
spapr_rtas.c ppc/spapr: Fix RTAS stopped state 2025-03-20 14:48:17 +10:00
spapr_rtas_ddw.c spapr/ddw: Implement 64bit query extension 2022-07-06 10:22:37 -03:00
spapr_rtc.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
spapr_tpm_proxy.c Accel & Exec patch queue 2024-12-21 11:07:00 -05:00
spapr_vhyp_mmu.c target/ppc: Unexport some functions from mmu-book3s-v3.h 2024-07-26 09:51:34 +10:00
spapr_vio.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
spapr_vof.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
trace-events ppc/pnv: Begin a more complete ADU LPC model for POWER9/10 2024-07-26 09:21:06 +10:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
virtex_ml507.c hw/ppc/epapr: Do not swap ePAPR magic value 2025-03-11 22:43:32 +10:00
vof.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00