With the default TCP stack configuration, it could be even 2 hours
before the connection times out due to the other side not being
reachable. However, in some cases, the application needs to be aware of
a connection issue much sooner.
This is the case, for example, for postcopy live migration. If there is
no traffic from the migration destination guest (server-side) to the
migration source guest (client-side), the destination keeps waiting for
pages indefinitely and does not switch to the postcopy-paused state.
This can happen, for example, if the destination QEMU instance is
started with the '-S' command line option and the machine is not started
yet, or if the machine is idle and produces no new page faults for
not-yet-migrated pages.
This patch introduces new inet socket parameters that control count,
idle period, and interval of TCP keep-alive packets before the
connection is considered broken. These parameters are available on
systems where the respective TCP socket options are defined, that
includes Linux, Windows, macOS, but not OpenBSD. Additionally, macOS
defines TCP_KEEPIDLE as TCP_KEEPALIVE instead, so the patch supplies its
own definition.
The default value for all is 0, which means the system configuration is
used.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Commit aec21d3175 (qapi: Add InetSocketAddress member keep-alive)
introduces the keep-alive flag, which enables the SO_KEEPALIVE socket
option, but only on client-side sockets. However, this option is also
useful for server-side sockets, so they can check if a client is still
reachable or drop the connection otherwise.
This patch enables the SO_KEEPALIVE socket option on passive server-side
sockets if the keep-alive flag is enabled. This socket option is then
inherited by active server-side sockets communicating with connected
clients.
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
There're 2 QMP commands: query-sgx and query-sgx-capabilities, but
their outputs are very similar and the documentation lacks clear
differentiation.
From the codes, query-sgx is used to gather guest's SGX capabilities
(including SGX related CPUIDs and EPC sections' size, in SGXInfo), and
if guest doesn't have SGX, then QEMU will report the error message.
On the other hand, query-sgx-capabilities is used to gather host's SGX
capabilities (descripted by SGXInfo as well). And if host doesn't
support SGX, then QEMU will also report the error message.
Considering that SGXInfo is already documented and both these 2 commands
have enough error messages (for the exception case in their codes).
Therefore the QAPI documentation for these two commands only needs to
emphasize that one of them applies to the guest and the other to the
host.
Fix their documentation to reflect this difference.
Reported-by: Markus Armbruster <armbru@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20250513143131.2008078-3-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The "sections" field of SGXInfo is used to gather EPC section
information for both the guest and the host. Therefore, delete the "for
guest" limitation.
Additionally, avoid the abbreviation "info" and use "information"
instead. And for SGXEPCSection, delete the redundant word "info".
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20250513143131.2008078-2-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QAPI requires strict PascalCase naming style, i.e., only the first
letter of a single word is allowed to be uppercase, which could help
with readability.
Rename SGXInfo to SgxInfo.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250516091130.2374221-3-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QAPI requires strict PascalCase naming style, i.e., only the first
letter of a single word is allowed to be uppercase, which could help
with readability.
Rename SGXEPCSection to SgxEpcSection.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250516091130.2374221-2-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QEMU has an optimization for a just-created drive-mirror destination
that is not possible for blockdev-mirror (which can't create the
destination) - any time we know the destination starts life as all
zeroes, we can skip a pre-zeroing pass on the destination. Recent
patches have added an improved heuristic for detecting if a file
contains all zeroes, and we plan to use that heuristic in upcoming
patches. But since a heuristic cannot quickly detect all scenarios,
and there may be cases where the caller is aware of information that
QEMU cannot learn quickly, it makes sense to have a way to tell QEMU
to assume facts about the destination that can make the mirror
operation faster. Given our existing example of "qemu-img convert
--target-is-zero", it is time to expose this override in QMP for
blockdev-mirror as well.
This patch results in some slight redundancy between the older
s->zero_target (set any time mode==FULL and the destination image was
not just created - ie. clear if drive-mirror is asking to skip the
pre-zero pass) and the newly-introduced s->target_is_zero (in addition
to the QMP override, it is set when drive-mirror creates the
destination image); this will be cleaned up in the next patch.
There is also a subtlety that we must consider. When drive-mirror is
passing target_is_zero on behalf of a just-created image, we know the
image is sparse (skipping the pre-zeroing keeps it that way), so it
doesn't matter whether the destination also has "discard":"unmap" and
"detect-zeroes":"unmap". But now that we are letting the user set the
knob for target-is-zero, if the user passes a pre-existing file that
is fully allocated, it is fine to leave the file fully allocated under
"detect-zeroes":"on", but if the file is open with
"detect-zeroes":"unmap", we should really be trying harder to punch
holes in the destination for every region of zeroes copied from the
source. The easiest way to do this is to still run the pre-zeroing
pass (turning the entire destination file sparse before populating
just the allocated portions of the source), even though that currently
results in double I/O to the portions of the file that are allocated.
A later patch will add further optimizations to reduce redundant
zeroing I/O during the mirror operation.
Since "target-is-zero":true is designed for optimizations, it is okay
to silently ignore the parameter rather than erroring if the user ever
sets the parameter in a scenario where the mirror job can't exploit it
(for example, when doing "sync":"top" instead of "sync":"full", we
can't pre-zero, so setting the parameter won't make a speed
difference).
Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250509204341.3553601-23-eblake@redhat.com>
Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
- deprecate some old block-job- APIs
- on-cbw-error option for backup
- more efficient zero handling in block commit
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEi5wmzbL9FHyIDoahVh8kwfGfefsFAmgiEpwACgkQVh8kwfGf
eftuPw//UWU7MN7Kd8Tc7x/5xJuVOiuOUp8iu78EBtvJy7+yy6lZxDmrVSpob3pI
fiIjZRd0LTO0/hu5nLeTqyGs8cthKNO+hHO1i8xIQuOVC3WqdbCYkiXYUjcHJCeT
ZD2xR2l3F/cjBHXnp7w8K2wuqd4OGjvUpw/JG3mvkDp6uAMJBp+qccAtiCXKLAGv
a4qvFt02TIi7IZYoEyRN+NGuwYvmwrD0TPSbWDzroYsmdZyz93dZniiWkV8elheW
iCDzv4AG1yquAbw6INW3BRWblBYWCLSvrtMVN9XAYf8R+b75bDghUzdHPyFiitsL
aenMMPaNeH1z0jB7oSLrRWx12eCfuRy5UTeil+RQsH9HsGCu7C5yBWkuAyZwlVk1
Qdu3SQ6HGk6BYET0TSRgk/fivmVq14vYxCFWbwclBEuN1HyNxwDJHZE3YxsqGZnM
KM1rByFViOCA+bjw00dFrn18wO8XRWHmRjed8KMAOZvc3jvJUdlr5OR3zfw3RR8l
bpBETylF7d7IpPs6LnxX08SAMBGLYzQe4rvguxjQ/2YB8C9KBkTodygKUYXR3Afw
Wp+vOVmG03XzOdaffuB9VAfyZrE7QmhbdWZTQVBcoqu/oHUbukHboB5p68L3oHXy
0AxHjMyaW5d01JELU0Mlj1+R8e+nK2kTq17v+ghmdX/LyySUyzc=
=tjus
-----END PGP SIGNATURE-----
Merge tag 'pull-block-jobs-2025-04-29-v3' of https://gitlab.com/vsementsov/qemu into staging
block-job patches
- deprecate some old block-job- APIs
- on-cbw-error option for backup
- more efficient zero handling in block commit
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCgAdFiEEi5wmzbL9FHyIDoahVh8kwfGfefsFAmgiEpwACgkQVh8kwfGf
# eftuPw//UWU7MN7Kd8Tc7x/5xJuVOiuOUp8iu78EBtvJy7+yy6lZxDmrVSpob3pI
# fiIjZRd0LTO0/hu5nLeTqyGs8cthKNO+hHO1i8xIQuOVC3WqdbCYkiXYUjcHJCeT
# ZD2xR2l3F/cjBHXnp7w8K2wuqd4OGjvUpw/JG3mvkDp6uAMJBp+qccAtiCXKLAGv
# a4qvFt02TIi7IZYoEyRN+NGuwYvmwrD0TPSbWDzroYsmdZyz93dZniiWkV8elheW
# iCDzv4AG1yquAbw6INW3BRWblBYWCLSvrtMVN9XAYf8R+b75bDghUzdHPyFiitsL
# aenMMPaNeH1z0jB7oSLrRWx12eCfuRy5UTeil+RQsH9HsGCu7C5yBWkuAyZwlVk1
# Qdu3SQ6HGk6BYET0TSRgk/fivmVq14vYxCFWbwclBEuN1HyNxwDJHZE3YxsqGZnM
# KM1rByFViOCA+bjw00dFrn18wO8XRWHmRjed8KMAOZvc3jvJUdlr5OR3zfw3RR8l
# bpBETylF7d7IpPs6LnxX08SAMBGLYzQe4rvguxjQ/2YB8C9KBkTodygKUYXR3Afw
# Wp+vOVmG03XzOdaffuB9VAfyZrE7QmhbdWZTQVBcoqu/oHUbukHboB5p68L3oHXy
# 0AxHjMyaW5d01JELU0Mlj1+R8e+nK2kTq17v+ghmdX/LyySUyzc=
# =tjus
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 12 May 2025 11:24:12 EDT
# gpg: using RSA key 8B9C26CDB2FD147C880E86A1561F24C1F19F79FB
# gpg: Good signature from "Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>" [unknown]
# gpg: aka "Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 8B9C 26CD B2FD 147C 880E 86A1 561F 24C1 F19F 79FB
* tag 'pull-block-jobs-2025-04-29-v3' of https://gitlab.com/vsementsov/qemu:
blockdev-backup: Add error handling option for copy-before-write jobs
qapi/block-core: deprecate some block-job- APIs
qapi: synchronize jobs and block-jobs documentation
block: add test non-active commit with zeroed data
block: allow commit to unmap zero blocks
block: refactor error handling of commit_iteration
block: move commit_run loop to separate function
block: get type of block allocation in commit_run
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch extends the blockdev-backup QMP command to allow users to specify
how to behave when IO errors occur during copy-before-write operations.
Previously, the behavior was fixed and could not be controlled by the user.
The new 'on-cbw-error' option can be set to one of two values:
- 'break-guest-write': Forwards the IO error to the guest and triggers
the on-source-error policy. This preserves snapshot integrity at the
expense of guest IO operations.
- 'break-snapshot': Allows the guest OS to continue running normally,
but invalidates the snapshot and aborts related jobs. This prioritizes
guest operation over backup consistency.
This enhancement provides more flexibility for backup operations in different
environments where requirements for guest availability versus backup
consistency may vary.
The default behavior remains unchanged to maintain backward compatibility.
Signed-off-by: Raman Dzehtsiar <Raman.Dzehtsiar@gmail.com>
Message-ID: <20250414090025.828660-1-Raman.Dzehtsiar@gmail.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
[vsementsov: fix long lines]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
For change, pause, resume, complete, dismiss and finalize actions
corresponding job- and block-job commands are almost equal. The
difference is in find_block_job_locked() vs find_job_locked()
functions. What's different?
1. find_block_job_locked() checks whether the found job is a block-job.
This is OK when moving to more generic API, no needs to document this
change.
2. find_block_job_locked() reports DeviceNotActive on failure, when
find_job_locked() reports GenericError. So, let's document this
difference in deprecated.txt. Still, for dismiss and finalize errors
are not documented at all, so be silent in deprecated.txt as well.
ACKed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20250409084232.28201-3-vsementsov@yandex-team.ru>
Actualize documentation and synchronize it for commands which actually
call the same functions internally.
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20250409084232.28201-2-vsementsov@yandex-team.ru>
The QAPI-generated 'TargetInfo' structure name is only used
in a single file. We want to heavily use another structure
similarly named. Rename the QAPI one, since structure names
are not part of the public API.
Suggested-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20250422145502.70770-2-philmd@linaro.org>
Reduce misc-target.json by one target specific command.
Error message is returned for machines not implementing
TYPE_DUMP_SKEYS_INTERFACE:
$ qemu-system-aarch64 -M virt -S -qmp stdio
{"QMP": {"version": {"qemu": {"micro": 50, "major": 9}}, "capabilities": ["oob"]}}
{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "dump-skeys", "arguments": { "filename": "/tmp/foo" } }
{"error": {"class": "GenericError", "desc": "Storage keys information not available for this architecture"}}
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250310151414.11550-5-philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Cut the crap that stopped making sense years ago. Adjust the
remainder.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250404121413.1743790-10-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The introduction explains example notation. The series merged in
merge commit e6485190f7 (in 9.1) improved how they look in generated
docs, but neglected to update the introduction accordingly. Do that
now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250404121413.1743790-9-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The command can return any number of RockerOfDpaFlow objects. The
example shows it returning exactly two, with the second object's
members elided. Tweak it so it elides elements after the first
instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250404121413.1743790-3-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Commit message typo fixed]
This also creates the qapi-qmp-index.html index and cross-reference
target.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-ID: <20250313044312.189276-10-jsnow@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The description of feature @unstable is three paragraphs. The second
and third became part of the description by accident in commit
9fb49daabf (qapi: Mark unstable QMP parts with feature 'unstable').
The second paragraph describes a defect in terms of the
implementation. Fine, but doesn't belong into user-facing
documentation. Turn it into a TODO section.
Rewrite everything else for clarity and completeness.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250311131715.1296101-1-armbru@redhat.com>
Acked-by: Alberto Garcia <berto@igalia.com>
We are not enabling the transmogrifier for QSD or QGA yet because we
don't (yet) have a way to create separate indices, and all of the
definitions will bleed together, which isn't so nice.
For now, QMP is better than nothing at all!
Signed-off-by: John Snow <jsnow@redhat.com>
Message-ID: <20250311034303.75779-62-jsnow@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
- Several iotest fixes
- Refactor QMP for NbdServerOptions for less repetition
- Avoid a hang in 'qemu-nbd --fork' when simple trace backend is enabled
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmfI2I0ACgkQp6FrSiUn
Q2pMWQf/ZNnuLC5O5W4Yiyki7mYjN3Izve5+xgM/gNv9uTrHmYXogYzswpYz1USo
aU1i/EFLJ5K7ImefQYBvySox+opwFs63xPscI/liwrKu54csp1rYGNg3hjKYaZCc
ukR5tB4bRQ2/JXNo0JueiocMZLyC5fZRUt9Z423D3ReHCtEJof0v1rTL2r7mzGJg
EdaRSGr0s592rFjWLwwvWWun5f0mEAFXM2YvZRRLE7M+x2XJeAR4F400GfvSMDNo
chGX41QKhmCYTMI5Shc/mX+5hMllEt/TyYBJUUk0cslJriVnNDV5VTFf184BmH5s
tqiUtzJubxt8/Wtpg6OV9RPAzVmr1g==
=NO+l
-----END PGP SIGNATURE-----
Merge tag 'pull-nbd-2025-03-05' of https://repo.or.cz/qemu/ericb into staging
NBD patches for 2025-03-05
- Several iotest fixes
- Refactor QMP for NbdServerOptions for less repetition
- Avoid a hang in 'qemu-nbd --fork' when simple trace backend is enabled
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmfI2I0ACgkQp6FrSiUn
# Q2pMWQf/ZNnuLC5O5W4Yiyki7mYjN3Izve5+xgM/gNv9uTrHmYXogYzswpYz1USo
# aU1i/EFLJ5K7ImefQYBvySox+opwFs63xPscI/liwrKu54csp1rYGNg3hjKYaZCc
# ukR5tB4bRQ2/JXNo0JueiocMZLyC5fZRUt9Z423D3ReHCtEJof0v1rTL2r7mzGJg
# EdaRSGr0s592rFjWLwwvWWun5f0mEAFXM2YvZRRLE7M+x2XJeAR4F400GfvSMDNo
# chGX41QKhmCYTMI5Shc/mX+5hMllEt/TyYBJUUk0cslJriVnNDV5VTFf184BmH5s
# tqiUtzJubxt8/Wtpg6OV9RPAzVmr1g==
# =NO+l
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 06 Mar 2025 07:04:45 HKT
# gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg: aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* tag 'pull-nbd-2025-03-05' of https://repo.or.cz/qemu/ericb:
nbd: Defer trace init until after daemonization
qapi: merge common parts of NbdServerOptions and nbd-server-start data
iotests: Stop NBD server in test 162 before starting the next one
iotest: Unbreak 302 with python 3.13
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Instead of comment
"Keep this type consistent with the nbd-server-start arguments", we
can simply merge these things.
Note that each field of new base already has "since" tag, equal in both
original copies. So "since" information is saved.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20250219191914.440451-1-vsementsov@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Apple has its own virtio-blk PCI device ID where it deviates from the
official virtio-pci spec slightly: It puts a new "apple type"
field at a static offset in config space and introduces a new barrier
command.
This patch first creates a mechanism for virtio-blk downstream classes to
handle unknown commands. It then creates such a downstream class and a new
vmapple-virtio-blk-pci class which support the additional apple type config
identifier as well as the barrier command.
The 'aux' or 'root' device type are selected using the 'variant' property.
Signed-off-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20241223221645.29911-13-phil@philjordan.eu>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Define qapi schema for the uefi variable store state.
Use it and the generated visitor helper functions to store persistent
(EFI_VARIABLE_NON_VOLATILE) variables in JSON format on disk.
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20250225163031.1409078-15-kraxel@redhat.com>
[ incremental fix squashed in ]
Message-ID: <pji24p6oag7cn2rovus7rquo7q2c6tokuquobfro2sqorky7vu@tk7cxud6jw7f>
Refine documentation for the hub device, specify the maximum.
Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: qemu-devel@nongnu.org
Message-ID: <20250219090607.559887-1-r.peniaev@gmail.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Features:
SR-IOV emulation for pci
virtio-mem-pci support for s390
interleave support for cxl
big endian support for vdpa svq
new QAPI events for vhost-user
Also vIOMMU reset order fixups are in.
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAme4b8sPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpHKcIAKPJsVqPdda2dJ7b7FdyRT0Q+uwezXqaGHd4
7Lzih1wsxYNkwIAyPtEb76/21qiS7BluqlUCfCB66R9xWjP5/KfvAFj4/r4AEduE
fxAgYzotNpv55zcRbcflMyvQ42WGiZZHC+o5Lp7vDXUP3pIyHrl0Ydh5WmcD+hwS
BjXvda58TirQpPJ7rUL+sSfLih17zQkkDcfv5/AgorDy1wK09RBKwMx/gq7wG8yJ
twy8eBY2CmfmFD7eTM+EKqBD2T0kwLEeLfS/F/tl5Fyg6lAiYgYtCbGLpAmWErsg
XZvfZmwqL7CNzWexGvPFnnLyqwC33WUP0k0kT88Y5wh3/h98blw=
=tej8
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: features, fixes, cleanups
Features:
SR-IOV emulation for pci
virtio-mem-pci support for s390
interleave support for cxl
big endian support for vdpa svq
new QAPI events for vhost-user
Also vIOMMU reset order fixups are in.
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAme4b8sPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpHKcIAKPJsVqPdda2dJ7b7FdyRT0Q+uwezXqaGHd4
# 7Lzih1wsxYNkwIAyPtEb76/21qiS7BluqlUCfCB66R9xWjP5/KfvAFj4/r4AEduE
# fxAgYzotNpv55zcRbcflMyvQ42WGiZZHC+o5Lp7vDXUP3pIyHrl0Ydh5WmcD+hwS
# BjXvda58TirQpPJ7rUL+sSfLih17zQkkDcfv5/AgorDy1wK09RBKwMx/gq7wG8yJ
# twy8eBY2CmfmFD7eTM+EKqBD2T0kwLEeLfS/F/tl5Fyg6lAiYgYtCbGLpAmWErsg
# XZvfZmwqL7CNzWexGvPFnnLyqwC33WUP0k0kT88Y5wh3/h98blw=
# =tej8
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 21 Feb 2025 20:21:31 HKT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (41 commits)
docs/devel/reset: Document reset expectations for DMA and IOMMU
hw/vfio/common: Add a trace point in vfio_reset_handler
hw/arm/smmuv3: Move reset to exit phase
hw/i386/intel-iommu: Migrate to 3-phase reset
hw/virtio/virtio-iommu: Migrate to 3-phase reset
vhost-user-snd: correct the calculation of config_size
net: vhost-user: add QAPI events to report connection state
hw/virtio/virtio-nsm: Respond with correct length
vdpa: Fix endian bugs in shadow virtqueue
MAINTAINERS: add more files to `vhost`
cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0()
vhost-iova-tree: Update documentation
vhost-iova-tree, svq: Implement GPA->IOVA & partial IOVA->HVA trees
vhost-iova-tree: Implement an IOVA-only tree
amd_iommu: Use correct bitmask to set capability BAR
amd_iommu: Use correct DTE field for interrupt passthrough
hw/virtio: reset virtio balloon stats on machine reset
mem/cxl_type3: support 3, 6, 12 and 16 interleave ways
hw/mem/cxl_type3: Ensure errp is set on realization failure
hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The netdev reports NETDEV_VHOST_USER_CONNECTED event when
the chardev is connected, and NETDEV_VHOST_USER_DISCONNECTED
when it is disconnected.
The NETDEV_VHOST_USER_CONNECTED event includes the chardev id.
This allows a system manager like libvirt to detect when the server
fails.
For instance with passt:
{ 'execute': 'qmp_capabilities' }
{ "return": { } }
[killing passt here]
{ "timestamp": { "seconds": 1739538634, "microseconds": 920450 },
"event": "NETDEV_VHOST_USER_DISCONNECTED",
"data": { "netdev-id": "netdev0" } }
[automatic reconnection with reconnect-ms]
{ "timestamp": { "seconds": 1739538638, "microseconds": 354181 },
"event": "NETDEV_VHOST_USER_CONNECTED",
"data": { "netdev-id": "netdev0", "chardev-id": "chr0" } }
Tested-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20250217092550.1172055-1-lvivier@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce the EndianMode type and the DEFINE_PROP_ENDIAN() macros.
Endianness can be BIG, LITTLE or unspecified (default).
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250213122217.62654-2-philmd@linaro.org>
Update the migrate_cancel command documentation with a few words about
postcopy and the expected state of the machine after migration.
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20250213175927.19642-10-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Although defaulting the handshake limit to 10 seconds was a nice QoI
change to weed out intentionally slow clients, it can interfere with
integration testing done with manual NBD_OPT commands over 'nbdsh
--opt-mode'. Expose a QMP knob 'handshake-max-secs' to allow the user
to alter the timeout away from the default.
The parameter name here intentionally matches the spelling of the
constant added in commit fb1c2aaa98, and not the command-line spelling
added in the previous patch for qemu-nbd; that's because in QMP,
longer names serve as good self-documentation, and unlike the command
line, machines don't have problems generating longer spellings.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250203222722.650694-6-eblake@redhat.com>
[eblake: s/max-secs/max-seconds/ in QMP]
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
The "special_features" field / parameter holds the subset of schema
features that are for internal code use. Specifically 'DEPRECATED'
and 'UNSTABLE'.
This special casing of internal features is going to be removed, so
prepare for that by renaming to 'features'. Using a fixed size type
is also best practice for bit fields.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250205123550.2754387-3-berrange@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The upper bound of pointer position in InputMoveEvent should be 0x7fff,
according to INPUT_EVENT_ABS_MAX.
Signed-off-by: Zhang Boyang <zhangboyang.id@gmail.com>
Message-ID: <20250116104433.12114-1-zhangboyang.id@gmail.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
[Phrasing tweak squashed in]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
As described in docs/devel/qapi-code-gen.rst line 998,
there should be no space between "Since" and ":".
Signed-off-by: Victor Toso <victortoso@redhat.com>
Message-ID: <20241217091504.16416-1-victortoso@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The general expectation is that header files should follow the same
file/path naming scheme as the corresponding source file. There are
various historical exceptions to this practice in QEMU, with one of
the most notable being the include/qapi/qmp/ directory.
include/qapi/qmp/dispatch.h corresponds mostly to qapi/qmp-registry.c.
Move and rename it to include/qapi/qmp-registry.h.
Now just qerror.h is left in include/qapi/qmp/. Since it's deprecated
& (slowly) getting eliminated anyway, it isn't worth moving.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20241118151235.2665921-3-armbru@redhat.com>
The general expectation is that header files should follow the same
file/path naming scheme as the corresponding source file. There are
various historical exceptions to this practice in QEMU, with one of
the most notable being the include/qapi/qmp/ directory. Most of the
headers there correspond to source files in qobject/.
This patch corrects most of that inconsistency by creating
include/qobject/ and moving the headers for qobject/ there.
This also fixes MAINTAINERS for include/qapi/qmp/dispatch.h:
scripts/get_maintainer.pl now reports "QAPI" instead of "No
maintainers found".
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com> #s390x
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20241118151235.2665921-2-armbru@redhat.com>
[Rebased]
Add an option in BlockExportOptions to allow creating an export on an
inactive node without activating the node. This mode needs to be
explicitly supported by the export type (so that it doesn't perform any
operations that are forbidden for inactive nodes), so this patch alone
doesn't allow this option to be successfully used yet.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250204211407.381505-13-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The system emulator tries to automatically activate and inactivate block
nodes at the right point during migration. However, there are still
cases where it's necessary that the user can do this manually.
Images are only activated on the destination VM of a migration when the
VM is actually resumed. If the VM was paused, this doesn't happen
automatically. The user may want to perform some operation on a block
device (e.g. taking a snapshot or starting a block job) without also
resuming the VM yet. This is an example where a manual command is
necessary.
Another example is VM migration when the image files are opened by an
external qemu-storage-daemon instance on each side. In this case, the
process that needs to hand over the images isn't even part of the
migration and can't know when the migration completes. Management tools
need a way to explicitly inactivate images on the source and activate
them on the destination.
This adds a new blockdev-set-active QMP command that lets the user
change the status of individual nodes (this is necessary in
qemu-storage-daemon because it could be serving multiple VMs and only
one of them migrates at a time). For convenience, operating on all
devices (like QEMU does automatically during migration) is offered as an
option, too, and can be used in the context of single VM.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250204211407.381505-9-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In QEMU, nodes are automatically created inactive while expecting an
incoming migration (i.e. RUN_STATE_INMIGRATE). In qemu-storage-daemon,
the notion of runstates doesn't exist. It also wouldn't necessarily make
sense to introduce it because a single daemon can serve multiple VMs
that can be in different states.
Therefore, allow the user to explicitly open images as inactive with a
new option. The default is as before: Nodes are usually active, except
when created during RUN_STATE_INMIGRATE.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250204211407.381505-8-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This allows querying from QMP (and also HMP) whether an image is
currently active or inactive (in the sense of BDRV_O_INACTIVE).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250204211407.381505-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch implements a new chardev backend `hub` device, which
aggregates input from multiple backend devices and forwards it to a
single frontend device. Additionally, `hub` device takes the output
from the frontend device and sends it back to all the connected
backend devices. This allows for seamless interaction between
different backend devices and a single frontend interface.
The idea of the change is trivial: keep list of backend devices
(up to 4), init them on demand and forward data buffer back and
forth.
The following is QEMU command line example:
-chardev pty,path=/tmp/pty,id=pty0 \
-chardev vc,id=vc0 \
-chardev hub,id=hub0,chardevs.0=pty0,chardevs.1=vc0 \
-device virtconsole,chardev=hub0 \
-vnc 0.0.0.0:0
Which creates 2 backend devices: text virtual console (`vc0`) and a
pseudo TTY (`pty0`) connected to the single virtio hvc console with
the backend aggregator (`hub0`) help. `vc0` renders text to an image,
which can be shared over the VNC protocol. `pty0` is a pseudo TTY
backend which provides biderectional communication to the virtio hvc
console.
'chardevs.N' list syntax is used for the sake of compatibility with
the representation of JSON lists in 'key=val' pairs format of the
util/keyval.c, despite the fact that modern QAPI way of parsing,
namely qobject_input_visitor_new_str(), is not used. Choice of keeping
QAPI list syntax may help to smoothly switch to modern parsing in the
future.
Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Reviewed-by: "Marc-André Lureau" <marcandre.lureau@redhat.com>
Cc: qemu-devel@nongnu.org
Message-ID: <20250123085327.965501-3-r.peniaev@gmail.com>
DEVICE state was introduced back in 2017:
https://lore.kernel.org/qemu-devel/20171020090556.18631-1-dgilbert@redhat.com/
Quote from Dave's cover letter, when the pre-switchover phase was enabled,
the state transition looks like this:
The precopy flow is:
active->pre-switchover->device->completed
The postcopy flow is:
active->pre-switchover->postcopy-active->completed
To supplement above, when the cap is not enabled:
The precopy flow is:
active->completed
The postcopy flow is:
active->postcopy-active->completed
It works for us, though we have some code just to special case these state
transitions, so the DEVICE state currently is special only to precopy, and
only conditionally.
I had a quick discussion with Libvirt developers, it turns out that this
may not be necessary. IOW, it seems okay we can have DEVICE state to be
generic, so that we don't have over-complicated state machines. It not
only helps align all the migration state machine, help cleanup the code
path especially on pre-switchover handling (see the patch itself), another
side benefit is we can unconditionally have a specific state to mark the
switchover phase, which might be helpful for debugging too.
This patch makes the DEVICE state to be present always, marking that source
QEMU is switching over. Then the state machine will be always as simple
as:
active-> [pre-switchover->] -> device -> [postcopy-active->] -> complete
After the change, no matter whether pre-switchover or postcopy is enabled
or not, we always have DEVICE state showing the switchover phase. When
pre-switchover enabled, we'll have an extra stage before that. When
postcopy is enabled, we'll have an extra stage after that.
A few qtests need touch up in QEMU tree for this change:
- A few iotest outputs (194, 203, 234, 262, 280)
- Teach libqos's migrate() on "device" state
Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-15-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Add the cpr-transfer migration mode, which allows the user to transfer
a guest to a new QEMU instance on the same host with minimal guest pause
time, by preserving guest RAM in place, albeit with new virtual addresses
in new QEMU, and by preserving device file descriptors. Pages that were
locked in memory for DMA in old QEMU remain locked in new QEMU, because the
descriptor of the device that locked them remains open.
cpr-transfer preserves memory and devices descriptors by sending them to
new QEMU over a unix domain socket using SCM_RIGHTS. Such CPR state cannot
be sent over the normal migration channel, because devices and backends
are created prior to reading the channel, so this mode sends CPR state
over a second "cpr" migration channel. New QEMU reads the cpr channel
prior to creating devices or backends. The user specifies the cpr channel
in the channel arguments on the outgoing side, and in a second -incoming
command-line parameter on the incoming side.
The user must start old QEMU with the the '-machine aux-ram-share=on' option,
which allows anonymous memory to be transferred in place to the new process
by transferring a memory descriptor for each ram block. Memory-backend
objects must have the share=on attribute, but memory-backend-epc is not
supported.
The user starts new QEMU on the same host as old QEMU, with command-line
arguments to create the same machine, plus the -incoming option for the
main migration channel, like normal live migration. In addition, the user
adds a second -incoming option with channel type "cpr". This CPR channel
must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
UNIX domain socket.
To initiate CPR, the user issues a migrate command to old QEMU, adding
a second migration channel of type "cpr" in the channels argument.
Old QEMU stops the VM, saves state to the migration channels, and enters
the postmigrate state. New QEMU mmap's memory descriptors, and execution
resumes.
The implementation splits qmp_migrate into start and finish functions.
Start sends CPR state to new QEMU, which responds by closing the CPR
channel. Old QEMU detects the HUP then calls finish, which connects the
main migration channel.
In summary, the usage is:
qemu-system-$arch -machine aux-ram-share=on ...
start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
Issue commands to old QEMU:
migrate_set_parameter mode cpr-transfer
{"execute": "migrate", ...
{"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-17-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
This feature was only applied during the 9.2 cycle, so reflect
that rather than 9.1.
Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Closes: https://lore.kernel.org/qemu-devel/ZyngEiwmYeZ-DvCy@redhat.com/
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20241107123446.902801-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>