QEMU supports GICv3 Non-maskable Interrupt, adds to support Non-maskable
Interrupt for AST2700.
Reference:
b36a32ead
Signed-off-by: Jamin Lin <jamin_lin@aspeedtech.com>
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250204060955.3546022-1-jamin_lin@aspeedtech.com
Move CPU TLB related methods to "exec/cputlb.h".
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20241114011310.3615-19-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* Improve performance of s390x PCI passthrough devices with relaxed translation
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmfK3dsRHHRodXRoQHJl
ZGhhdC5jb20ACgkQLtnXdP5wLbUfDQ/8CopnzCKGKFhyM5skrHbhDbUVbul6yV4L
kIOo7N8OlrNcQB90bj+Udy+mUANHjkmSiBa5lJ/78ej4DFS6CxeVgrl1fSEl36xn
GjWDwSUiN8pG1O4YtnDqWVTBieGSzbkQr1jHgpeAnvv08s+TtmudP1T8IznWU2v9
FqD78SdebZ0Kua+ksBgMxwkHd6VMw13vsu6KuT9VBhie40LcDrFOuG8RDz/qo4IO
Yg9s1Bqcy7Wa4+0ldMXS1plSdIqJBtVc/HDTg1QwH994b4Lvr7ffrFZmuVcd2dbE
XKQ5jAMOYZqWdlXszkyd8moYGhmevCkQlALhpnbHixnfakfFYX0wTiJB6oCthFQ0
It0J/ntNsCmJiIHNbPLzsJ1pE5+ureRnGbxVe05n+zfm8MaXL6s4nSdZzHyp8n43
UZQqVzK55Q34K9O0qoUdCdBCjMKS9v5u95jjJo8+nc8sJoeQTssOoiixwB/E4y21
9qSh7CbDjQK4zwuzQ7jKD603zAJH6ivvsHXlMBMXJFBiSMCAoQQ1vyou6yRHswRr
gLHDwiWUx8SX8ckbbJ/+Zo9+T8JBMvC5hNYG8VoAtlTQusG4bHSbKdPNNH0eHsEp
f7RlZPRizkcK3w0Nj+u4kXdnnex3QLLSgnyAYq7zEl6V+mho8KqaBezkO7wQDHZy
+GW5ignQ1Gs=
=CyiZ
-----END PGP SIGNATURE-----
Merge tag 'pull-request-2025-03-07' of https://gitlab.com/thuth/qemu into staging
* Bug fixes and some small improvements for functional tests
* Improve performance of s390x PCI passthrough devices with relaxed translation
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmfK3dsRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbUfDQ/8CopnzCKGKFhyM5skrHbhDbUVbul6yV4L
# kIOo7N8OlrNcQB90bj+Udy+mUANHjkmSiBa5lJ/78ej4DFS6CxeVgrl1fSEl36xn
# GjWDwSUiN8pG1O4YtnDqWVTBieGSzbkQr1jHgpeAnvv08s+TtmudP1T8IznWU2v9
# FqD78SdebZ0Kua+ksBgMxwkHd6VMw13vsu6KuT9VBhie40LcDrFOuG8RDz/qo4IO
# Yg9s1Bqcy7Wa4+0ldMXS1plSdIqJBtVc/HDTg1QwH994b4Lvr7ffrFZmuVcd2dbE
# XKQ5jAMOYZqWdlXszkyd8moYGhmevCkQlALhpnbHixnfakfFYX0wTiJB6oCthFQ0
# It0J/ntNsCmJiIHNbPLzsJ1pE5+ureRnGbxVe05n+zfm8MaXL6s4nSdZzHyp8n43
# UZQqVzK55Q34K9O0qoUdCdBCjMKS9v5u95jjJo8+nc8sJoeQTssOoiixwB/E4y21
# 9qSh7CbDjQK4zwuzQ7jKD603zAJH6ivvsHXlMBMXJFBiSMCAoQQ1vyou6yRHswRr
# gLHDwiWUx8SX8ckbbJ/+Zo9+T8JBMvC5hNYG8VoAtlTQusG4bHSbKdPNNH0eHsEp
# f7RlZPRizkcK3w0Nj+u4kXdnnex3QLLSgnyAYq7zEl6V+mho8KqaBezkO7wQDHZy
# +GW5ignQ1Gs=
# =CyiZ
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 07 Mar 2025 19:51:55 HKT
# gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg: issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg: aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5
* tag 'pull-request-2025-03-07' of https://gitlab.com/thuth/qemu:
s390x/pci: indicate QEMU supports relaxed translation for passthrough
s390x/pci: add support for guests that request direct mapping
MAINTAINERS: Add docs/devel/testing/functional.rst to the functional section
doc: add missing 'Asset' type in function test doc
tests/functional/test_virtio_balloon: Only use KVM for running this test
tests/functional: fix race in virtio balloon test
tests/functional: Increase the timeout of the mips64el_replay test
tests/functional/test_mips_malta: Add a network test via the pcnet NIC
tests/functional: Move the code for testing HTTP downloads to a common function
tests/functional: stop output from zstd command when uncompressing
tests/functional: drop unused 'get_tag' method
tests/functional: skip memaddr tests on 32-bit builds
tests/functional: reduce tuxrun maxmem to work on 32-bit hosts
tests/functional: set 'qemu_bin' as an object level field
tests/functional: remove unused 'bin_prefix' variable
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Use a similar terminology smmu_hash_remove_by_sid_range() as the one
being used for other hash table matching functions since
smmuv3_invalidate_ste() name is not self explanatory, and introduce a
helper that invokes the g_hash_table_foreach_remove.
No functional change intended.
Signed-off-by: JianChunfu <jansef.jian@hj-micro.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20250228031438.3916-1-jansef.jian@hj-micro.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20250204125009.2281315-10-peter.maydell@linaro.org
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20250204125009.2281315-9-peter.maydell@linaro.org
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The reg isn't validated to be a possible register before
it's dereferenced for one case. The mmio space registered
for the gpio device is 4KiB but there aren't that many
registers in the struct.
Cc: qemu-stable@nongnu.org
Fixes: 526dbbe087 ("hw/gpio: Add GPIO model for Nuvoton NPCM7xx")
Signed-off-by: Patrick Venture <venture@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250226024603.493148-1-venture@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Specifying this bit in the guest CLP response indicates that the guest
can optionally choose to skip translation and instead use
identity-mapped operations.
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-ID: <20250226210013.238349-3-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
When receiving a guest mpcifc(4) or mpcifc(6) instruction without the T
bit set, treat this as a request to perform direct mapping instead of
address translation. In order to facilitate this, pin the entirety of
guest memory into the host iommu.
Pinning for the direct mapping case is handled via vfio and its memory
listener. Additionally, ram discard settings are inherited from vfio:
coordinated discards (e.g. virtio-mem) are allowed while uncoordinated
discards (e.g. virtio-balloon) are disabled.
Subsequent guest DMA operations are all expected to be of the format
guest_phys+sdma, allowing them to be used as lookup into the host
iommu table.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20250226210013.238349-2-mjrosato@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
- Merge "qemu/clang-tsa.h" within "qemu/compiler.h"
- Various cleanups around accelerators initialization code
(better user/system split)
- Various trivial cleanups in accel/tcg/,
Guard few TCG calls with tcg_enabled()
- Explicit disassemble_info endianness
- Improve dual-endianness support for MicroBlaze
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmfJw08ACgkQ4+MsLN6t
wN70whAAtfcdWtqseFfb6fvDtjflgxN51Ui0iaOECXUA18USKriGy34eBcMYMiM2
+eKgU7+jI6JGE4+burcgWUsPpFFF951/A8+lyIbFgO5yToTDmC+qNe4XfmMAIyXq
uf9Obr2c0Xk9luh4odb+jPAQodw/7G1fKgcCVIJNDCl/xEcPhS9eNpTaHwcVnkWI
K6KrxWXOsqG6+evJBPWYoXtOOyt0+JcwAsJoGhprwtGm3P9+jSVXsgeGsJVyZcna
f32JtjWL754O8XeMkOn4x6rt58VrCIMKI9xT7keDyuhTCq0Zki9RO2nMU2dSw5mN
AfL9hxqUy0Nijnyslg3ugujDfTePsNyLdwwH7n0mnoD72ELi6WnhDsmOThuEB3Rd
4/kdwTJfA/rlWk/GF1tbKW7AvQZokRARtzmL3V0HmGJu57lX+2JuszEdYBkqDEP7
GH1I10B2yANUm+C9y3X8qWOU7Ws433ebJeJoZuyfnbZ9Me+UfRmql/oS+V8ata2i
fArEItpldUFrWRyYLkTbXrh2dgyV9yJTEir/lzOzeAZZzyabTbjf2z9qnh976GGO
1QnDy5QA4f54kDBUZe7JK26TZsHPch7cgqXW6f8tRlJF7A9hxGK8d2TUV/lC3/vx
LUOlWNu03PhiruYmZEcWOsY3Jt9jRCF6lIryrnaJsqnVOVmMUMM=
=3TRh
-----END PGP SIGNATURE-----
Merge tag 'accel-cpus-20250306' of https://github.com/philmd/qemu into staging
Generic CPUs / accelerators patch queue
- Merge "qemu/clang-tsa.h" within "qemu/compiler.h"
- Various cleanups around accelerators initialization code
(better user/system split)
- Various trivial cleanups in accel/tcg/,
Guard few TCG calls with tcg_enabled()
- Explicit disassemble_info endianness
- Improve dual-endianness support for MicroBlaze
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmfJw08ACgkQ4+MsLN6t
# wN70whAAtfcdWtqseFfb6fvDtjflgxN51Ui0iaOECXUA18USKriGy34eBcMYMiM2
# +eKgU7+jI6JGE4+burcgWUsPpFFF951/A8+lyIbFgO5yToTDmC+qNe4XfmMAIyXq
# uf9Obr2c0Xk9luh4odb+jPAQodw/7G1fKgcCVIJNDCl/xEcPhS9eNpTaHwcVnkWI
# K6KrxWXOsqG6+evJBPWYoXtOOyt0+JcwAsJoGhprwtGm3P9+jSVXsgeGsJVyZcna
# f32JtjWL754O8XeMkOn4x6rt58VrCIMKI9xT7keDyuhTCq0Zki9RO2nMU2dSw5mN
# AfL9hxqUy0Nijnyslg3ugujDfTePsNyLdwwH7n0mnoD72ELi6WnhDsmOThuEB3Rd
# 4/kdwTJfA/rlWk/GF1tbKW7AvQZokRARtzmL3V0HmGJu57lX+2JuszEdYBkqDEP7
# GH1I10B2yANUm+C9y3X8qWOU7Ws433ebJeJoZuyfnbZ9Me+UfRmql/oS+V8ata2i
# fArEItpldUFrWRyYLkTbXrh2dgyV9yJTEir/lzOzeAZZzyabTbjf2z9qnh976GGO
# 1QnDy5QA4f54kDBUZe7JK26TZsHPch7cgqXW6f8tRlJF7A9hxGK8d2TUV/lC3/vx
# LUOlWNu03PhiruYmZEcWOsY3Jt9jRCF6lIryrnaJsqnVOVmMUMM=
# =3TRh
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 06 Mar 2025 23:46:23 HKT
# gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE
* tag 'accel-cpus-20250306' of https://github.com/philmd/qemu: (54 commits)
include: Poison TARGET_PHYS_ADDR_SPACE_BITS definition
system: Open-code qemu_init_arch_modules() using target_name()
target/i386: Mark WHPX APIC region as little-endian
target/alpha: Do not mix exception flags and FPCR bits
target/riscv: Convert misa_mxl_max using GLib macros
target/riscv: Declare RISCVCPUClass::misa_mxl_max as RISCVMXL
target/xtensa: Finalize config in xtensa_register_core()
target/sparc: Constify SPARCCPUClass::cpu_def
target/i386: Constify X86CPUModel uses
disas: Remove target_words_bigendian() call in initialize_debug_target()
target/xtensa: Set disassemble_info::endian value in disas_set_info()
target/sh4: Set disassemble_info::endian value in disas_set_info()
target/riscv: Set disassemble_info::endian value in disas_set_info()
target/ppc: Set disassemble_info::endian value in disas_set_info()
target/mips: Set disassemble_info::endian value in disas_set_info()
target/microblaze: Set disassemble_info::endian value in disas_set_info
target/arm: Set disassemble_info::endian value in disas_set_info()
target: Set disassemble_info::endian value for big-endian targets
target: Set disassemble_info::endian value for little-endian targets
target/mips: Fix possible MSA int overflow
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
OpenRISC timer is architecturally tied to the CPU.
It doesn't belong to the machine init() code to
instanciate it: move its creation when a vCPU is
realized (after being created).
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250114231304.77150-1-philmd@linaro.org>
Directly call cpu_set_pc() instead of open-coding it.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250122093028.52416-2-philmd@linaro.org>
Slighly simplify cpu-target.c again by extracting cpu_exec_initfn()
to cpu-{system,user}.c, adding an empty stub for user emulation.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250123234415.59850-19-philmd@linaro.org>
Rather than maintaining a mix of system / user code for CPU
class properties, move system properties to cpu-system.c
and user ones to the new cpu-user.c unit.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250123234415.59850-18-philmd@linaro.org>
TCGCPUOps structure makes more sense in the accelerator context
rather than hardware emulation. Move it under the accel/tcg/ scope.
Mechanical change doing:
$ sed -i -e 's,hw/core/tcg-cpu-ops.h,accel/tcg/cpu-ops.h,g' \
$(git grep -l hw/core/tcg-cpu-ops.h)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250123234415.59850-11-philmd@linaro.org>
cflags_next_tb is always re-initialized in the CPU Reset()
handler in cpu_common_reset_hold(), no need to initialize
it in cpu_common_initfn().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240427155714.53669-13-philmd@linaro.org>
cpu_common_initfn() is our target agnostic initializer,
while cpu_exec_initfn() is the target specific one.
The %as and %num_ases fields are not target specific,
so initialize them in the common helper.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250123234415.59850-6-philmd@linaro.org>
Cache CPUClass as early as possible, when the instance
is initialized.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250123234415.59850-5-philmd@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250227085601.4140852-7-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Consistently use format "DESCRIPTION (VALUE/VALUE...)".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250227085601.4140852-6-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
PropertyInfo member @type is externally visible via QMP
device-list-properties and qom-list-properies.
Its meaning is not documented at its definition.
It gets passed as @type argument to object_property_add() and
object_class_property_add(). This argument's documentation isn't of
much help, either:
* @type: the type name of the property. This namespace is pretty loosely
* defined. Sub namespaces are constructed by using a prefix and then
* to angle brackets. For instance, the type 'virtio-net-pci' in the
* 'link' namespace would be 'link<virtio-net-pci>'.
The two QMP commands document it as
# @type: the type of the property. This will typically come in one of
# four forms:
#
# 1) A primitive type such as 'u8', 'u16', 'bool', 'str', or
# 'double'. These types are mapped to the appropriate JSON
# type.
#
# 2) A child type in the form 'child<subtype>' where subtype is a
# qdev device type name. Child properties create the
# composition tree.
#
# 3) A link type in the form 'link<subtype>' where subtype is a
# qdev device type name. Link properties form the device model
# graph.
"Typically come in one of four forms" followed by three items inspires
the level of trust that is appropriate here.
Clean up a bunch of funnies:
* qdev_prop_fdc_drive_type.type is "FdcDriveType". Its .enum_table
refers to QAPI type "FloppyDriveType". So use that.
* qdev_prop_reserved_region is "reserved_region". Its only user is an
array property called "reserved-regions". Its .set() visits str.
So change @type to "str".
* trng_prop_fault_event_set.type is "uint32:bits". Its .set() visits
uint32, so change @type to "uint32". If we believe mentioning it's
actually bits is useful, the proper place would be .description.
* ccw_loadparm.type is "ccw_loadparm". It's users are properties
called "loadparm". Its .set() visits str. So change @type to
"str".
* qdev_prop_nv_gpudirect_clique.type is "uint4". Its set() visits
uint8, so change @type to "uint8". If we believe mentioning the
range is useful, the proper place would be .description.
* s390_pci_fid_propinfo.type is "zpci_fid". Its .set() visits uint32.
So change type to that, and move the "zpci_fid" to .description.
This is admittedly a lousy description, but it's still an
improvement; for instance, output of -device zpci,help changes from
fid=<zpci_fid>
to
fid=<uint32> - zpci_fid
* Similarly for a raft of PropertyInfo in target/riscv/cpu.c.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250227085601.4140852-5-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[Commit message typo fixed]
PropertyInfo member @name becomes ObjectProperty member @type, while
Property member @name becomes ObjectProperty member @name. Rename the
former.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250227085601.4140852-4-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[One missed instance of @type fixed]
Properties using qdev_prop_pci_devfn initially accepted a string of
the form "DEV.FN" or "DEV" where DEV and FN are in hexadecimal.
Member @name was "pci-devfn" initially.
Commit b403298adb (qdev: make the non-legacy pci address property
accept an integer) changed them to additionally accept integers: bits
3..7 are DEV, and bits 0..2 are FN. This is inaccessible externally
in device_add so far.
The commit also changed @name to "int32", and set member @legacy-name
to "pci-devfn". Together, this kept QMP command
device-list-properties unaffected: it used @name only when
@legacy_name was null.
Commit 07d09c58db (qmp: Print descriptions of object properties)
quietly dumbed that down to use @name always, and the next commit
18b91a3e082q (qdev: Drop legacy_name from qdev properties) dropped
member @legacy_name. This changed the value of @type reported by QMP
command device-list-properties from "pci-devfn" to "int32".
But "int32" is misleading: device_add actually wants QAPI type "str".
So change @name to that.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250227085601.4140852-3-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250227085601.4140852-2-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
DEFINE_PROP_ON_OFF_AUTO() property isn't runtime-mutable so using it
would mean that the source VM would need to decide upfront at startup
time whether it wants to do a multifd device state transfer at some
point.
Source VM can run for a long time before being migrated so it is
desirable to have a fallback mechanism to the old way of transferring
VFIO device state if it turns to be necessary.
This brings this property to the same mutability level as ordinary
migration parameters, which too can be adjusted at the run time.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/f2f2d66bda477da3e6cb8c0311006cff36e8651d.1741124640.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
This property allows configuring whether to transfer the particular device
state via multifd channels when live migrating that device.
It defaults to AUTO, which means that VFIO device state transfer via
multifd channels is attempted in configurations that otherwise support it.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/d6dbb326e3d53c7104d62c96c9e3dd64e1c7b940.1741124640.git.maciej.szmigiero@oracle.com
[ clg: Added documentation ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Implement the multifd device state transfer via additional per-device
thread inside save_live_complete_precopy_thread handler.
Switch between doing the data transfer in the new handler and doing it
in the old save_state handler depending if VFIO multifd transfer is enabled
or not.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/4d727e2e0435e0022d50004e474077632830e08d.1741124640.git.maciej.szmigiero@oracle.com
[ clg: - Reordered savevm_vfio_handlers
- Updated save_live_complete_precopy* documentation ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Add a thread which loads the VFIO device state buffers that were received
via multifd.
Each VFIO device that has multifd device state transfer enabled has one
such thread, which is created using migration core API
qemu_loadvm_start_load_thread().
Since it's important to finish loading device state transferred via the
main migration channel (via save_live_iterate SaveVMHandler) before
starting loading the data asynchronously transferred via multifd the thread
doing the actual loading of the multifd transferred data is only started
from switchover_start SaveVMHandler.
switchover_start handler is called when MIG_CMD_SWITCHOVER_START
sub-command of QEMU_VM_COMMAND is received via the main migration channel.
This sub-command is only sent after all save_live_iterate data have already
been posted so it is safe to commence loading of the multifd-transferred
device state upon receiving it - loading of save_live_iterate data happens
synchronously in the main migration thread (much like the processing of
MIG_CMD_SWITCHOVER_START) so by the time MIG_CMD_SWITCHOVER_START is
processed all the proceeding data must have already been loaded.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/9abe612d775aaf42e31646796acd2363c723a57a.1741124640.git.maciej.szmigiero@oracle.com
[ clg: - Reordered savevm_vfio_handlers
- Added switchover_start documentation ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The multifd received data needs to be reassembled since device state
packets sent via different multifd channels can arrive out-of-order.
Therefore, each VFIO device state packet carries a header indicating its
position in the stream.
The raw device state data is saved into a VFIOStateBuffer for later
in-order loading into the device.
The last such VFIO device state packet should have
VFIO_DEVICE_STATE_CONFIG_STATE flag set and carry the device config state.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/e3bff515a8d61c582b94b409eb12a45b1a143a69.1741124640.git.maciej.szmigiero@oracle.com
[ clg: - Reordered savevm_vfio_handlers
- Added load_state_buffer documentation ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Add basic types and flags used by VFIO multifd device state transfer
support.
Since we'll be introducing a lot of multifd transfer specific code,
add a new file migration-multifd.c to home it, wired into main VFIO
migration code (migration.c) via migration-multifd.h header file.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/4eedd529e6617f80f3d6a66d7268a0db2bc173fa.1741124640.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
So it can be safety accessed from multiple threads.
This variable type needs to be changed to unsigned long since
32-bit host platforms lack the necessary addition atomics on 64-bit
variables.
Using 32-bit counters on 32-bit host platforms should not be a problem
in practice since they can't realistically address more memory anyway.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/qemu-devel/dc391771d2d9ad0f311994f0cb9e666da564aeaf.1741124640.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
And rename existing load_device_config_state trace event to
load_device_config_state_end for consistency since it is triggered at the
end of loading of the VFIO device config state.
This way both the start and end points of particular device config
loading operation (a long, BQL-serialized operation) are known.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/qemu-devel/1b6c5a2097e64c272eb7e53f9e4cca4b79581b38.1741124640.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
This QEMU_VM_COMMAND sub-command and its switchover_start SaveVMHandler is
used to mark the switchover point in main migration stream.
It can be used to inform the destination that all pre-switchover main
migration stream data has been sent/received so it can start to process
post-switchover data that it might have received via other migration
channels like the multifd ones.
Add also the relevant MigrationState bit stream compatibility property and
its hw_compat entry.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Zhang Chen <zhangckid@gmail.com> # for the COLO part
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/qemu-devel/311be6da85fc7e49a7598684d80aa631778dcbce.1741124640.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
As an outcome of KVM forum 2024 "vfio-platform: live and let die?"
talk, let's deprecate vfio-platform devices.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250305124225.952791-1-eric.auger@redhat.com
[ clg: Fixed spelling in vfio-amd-xgbe section ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
We want the device in the D0 power state going into reset, but the
config write can enable the BARs in the address space, which are
then removed from the address space once we clear the memory enable
bit in the command register. Re-order to clear the command bit
first, so the power state change doesn't enable the BARs.
Cc: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250225215237.3314011-6-alex.williamson@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The pm_cap on the PCIExpressDevice object can be distilled down
to the new instance on the PCIDevice object.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250225215237.3314011-5-alex.williamson@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
This is now redundant to PCIDevice.pm_cap.
Cc: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250225215237.3314011-4-alex.williamson@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Switch callers directly initializing the PCI PM capability with
pci_add_capability() to use pci_pm_init().
Cc: Dmitry Fleytman <dmitry.fleytman@gmail.com>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Stefan Weil <sw@weilnetz.de>
Cc: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Klaus Jensen <its@irrelevant.dk>
Cc: Jesper Devantier <foss@defmacro.it>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250225215237.3314011-3-alex.williamson@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The memory and IO BARs for devices are only accessible in the D0 power
state. In other power states the PCI spec defines that the device
responds to TLPs and messages with an Unsupported Request response.
To approximate this behavior, consider the BARs as unmapped when the
device is not in the D0 power state. This makes the BARs inaccessible
and has the additional bonus for vfio-pci that we don't attempt to DMA
map BARs for devices in a non-D0 power state.
To support this, an interface is added for devices to register the PM
capability, which allows central tracking to enforce valid transitions
and unmap BARs in non-D0 states.
NB. We currently have device models (eepro100 and pcie_pci_bridge)
that register a PM capability but do not set wmask to enable writes to
the power state field. In order to maintain migration compatibility,
this new helper does not manage the wmask to enable guest writes to
initiate a power state change. The contents and write access of the
PM capability are still managed by the caller.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250225215237.3314011-2-alex.williamson@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>