Merge tag 'mem-next-pull-request' of https://gitlab.com/peterx/qemu into staging
Memory pull request for 10.0
v2 changelog:
- Fix Mac (and possibly some other) build issues for two patches:
  - os: add an ability to lock memory on_fault
  - memory: pass MemTxAttrs to memory_access_is_direct()
List of features:
- William's fix for ram hole punching when used with a file offset
- Daniil's patchset to introduce mem-lock=on-fault
- William's hugetlb hwpoison fix for size report & remap
- David's series to allow qemu debug writes to MMIOs
# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZ6zcQBIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wbL3wEAqx94NpB/tEEBj6WXE3uV9LqQ0GCTYmV+
# MbM51Vep8ksA/35yFn3ltM2yoSnUf9WJW6LXEEKhQlwswI0vChQERgkE
# =++O1
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 13 Feb 2025 01:37:04 HKT
# gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg: issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [full]
# gpg: aka "Peter Xu <peterx@redhat.com>" [full]
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706
* tag 'mem-next-pull-request' of https://gitlab.com/peterx/qemu:
  overcommit: introduce mem-lock=on-fault
  system: introduce a new MlockState enum
  system/vl: extract overcommit option parsing into a helper
  os: add an ability to lock memory on_fault
  system/physmem: poisoned memory discard on reboot
  system/physmem: handle hugetlb correctly in qemu_ram_remap()
  physmem: teach cpu_memory_rw_debug() to write to more memory regions
  hmp: use cpu_get_phys_page_debug() in hmp_gva2gpa()
  memory: pass MemTxAttrs to memory_access_is_direct()
  physmem: disallow direct access to RAM DEVICE in address_space_write_rom()
  physmem: factor out direct access check into memory_region_supports_direct_access()
  physmem: factor out RAM/ROMD check in memory_access_is_direct()
  physmem: factor out memory_region_is_ram_device() check in memory_access_is_direct()
  system/physmem: take into account fd_offset for file fallocate
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qmp_migrate guarantees that cpr_channel is not null for
MIG_MODE_CPR_TRANSFER when cpr_state_save is called:
    qmp_migrate()
        if (s->parameters.mode == MIG_MODE_CPR_TRANSFER && !cpr_channel) {
            return;
        }
        cpr_state_save(cpr_channel)
but cpr_state_save checks the mode differently before using the channel,
and Coverity cannot infer that the two checks are equivalent in the
outgoing QEMU, so it warns that channel may be NULL:
    cpr_state_save(channel)
        MigMode mode = migrate_mode();
        if (mode == MIG_MODE_CPR_TRANSFER) {
            f = cpr_transfer_output(channel, errp);
To make Coverity happy, assert that channel != NULL in cpr_state_save.
Resolves: Coverity CID 1590980
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Message-ID: <1738788841-211843-1-git-send-email-steven.sistare@oracle.com>
[assert instead of using parameters.mode in cpr_state_save]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The expected outcome from qmp_migrate_cancel() is that the source
migration goes to the terminal state
MIGRATION_STATUS_CANCELLED. Anything different from this is a bug when
cancelling.
Make sure there is never a state transition from an unspecified state
into FAILED. Code that sets FAILED should always either make sure that
the old state is not CANCELLING or specify the old state explicitly.
Note that the destination is allowed to go into FAILED, so there's no
issue there.
(I don't think this is relevant as a backport, because cancelling does
work; it just doesn't show the right state at the end.)
Fixes: 3dde8fdbad ("migration: Merge precopy/postcopy on switchover start")
Fixes: d0edb8a173 ("migration: Create the postcopy preempt channel asynchronously")
Fixes: 8518278a6a ("migration: implementation of background snapshot thread")
Fixes: bf78a046b9 ("migration: refactor migrate_fd_connect failures")
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20250213175927.19642-7-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
After postcopy has started, it's not possible to recover the source
machine if a migration error occurs, because the destination has
already started changing the state of the machine. For that same reason,
it doesn't make sense to try to cancel the migration after postcopy
has started. Reject the cancel command during postcopy.
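The guard, as a hedged sketch (migration_in_postcopy() and error_setg()
are real QEMU helpers; placement and wording are illustrative):

    /* In qmp_migrate_cancel(): reject cancellation once postcopy runs */
    if (migration_in_postcopy()) {
        error_setg(errp, "migrate_cancel is not supported after postcopy");
        return;
    }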
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20250213175927.19642-6-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
If the destination side fails at migration_ioc_process_incoming()
before starting the coroutine, it will report the error but QEMU will
not exit.
Set the migration state to FAILED and exit the process if
exit-on-error allows.
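The shape of the fix, as a hedged sketch (state values and field names
are assumptions based on the description above, not the exact code):

    /* Incoming-side failure path of migration_ioc_process_incoming(),
     * before the coroutine starts: mark FAILED, honor exit-on-error. */
    migrate_set_state(&mis->state, MIGRATION_STATUS_SETUP,
                      MIGRATION_STATUS_FAILED);
    if (mis->exit_on_error) {
        exit(EXIT_FAILURE);
    }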
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2633
Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20250213175927.19642-5-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Remove all instances of _fd_ from the migration generic code. These
functions have grown over time and the _fd_ part is now just
confusing.
Renaming migration_fd_error() to migration_error() would make it a
little vague. Since it's only used for migration_connect() failures,
rename it to migration_connect_set_error() instead.
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20250213175927.19642-4-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
There's no need for two separate functions and this _fd_ is a historic
artifact that makes little sense nowadays.
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20250213175927.19642-3-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
There's no point passing the error into migration cancel only for it
to call migrate_set_error().
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-ID: <20250213175927.19642-2-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We're currently only checking the QEMUFile error after
qemu_loadvm_state(). This was causing a TLS termination error from
multifd recv threads to be ignored.
Start checking the migration error as well to avoid missing further
errors.
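A hedged sketch of the resulting check (the helpers are real QEMU ones,
but the exact placement and error code are illustrative):

    ret = qemu_loadvm_state(mis->from_src_file);
    if (ret == 0) {
        /* What was already checked: stream-level (QEMUFile) errors */
        ret = qemu_file_get_error(mis->from_src_file);
    }
    if (ret == 0 && migrate_has_error(migrate_get_current())) {
        /* New: errors set by e.g. multifd recv threads on TLS shutdown */
        ret = -EINVAL;
    }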
Regarding compatibility with the TLS termination error that was
previously being ignored: for QEMUs <= 9.2 - if the old QEMU is being
used as the migration source - the recently added migration property
multifd-clean-tls-termination needs to be set to OFF on the
*destination* machine.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We're currently changing how the source multifd migration shuts down
the multifd channels when TLS is in use, so that it performs a clean
termination by calling gnutls_bye().
Older src QEMUs will always close the channel without terminating the
TLS session. New dst QEMUs treat an unclean termination as an error.
Add multifd_clean_tls_termination (default true) that can be switched
off on the destination whenever a src QEMU <= 9.2 is in use.
(Note that the compat property is only strictly necessary for src
QEMUs older than 9.1. Due to synchronization coincidences, src QEMUs
9.1 and 9.2 can put the destination in a condition where it doesn't
see the unclean termination. Still, make the property more inclusive
to facilitate potential backports.)
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The multifd recv side has been getting a TLS error of
GNUTLS_E_PREMATURE_TERMINATION at the end of migration when the send
side closes the sockets without ending the TLS session. This has been
masked by the code not checking the migration error after loadvm.
Start ending the TLS session at multifd_send_shutdown() so the recv
side always sees a clean termination (EOF) and we can start to
differentiate that from an actual premature termination that might
possibly happen in the middle of the migration.
There's nothing to be done if a previous migration error has already
broken the connection, so add a comment explaining it and ignore any
errors coming from gnutls_bye().
This doesn't break compat with older recv-side QEMUs because EOF has
always caused the recv thread to exit cleanly.
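The shape of the change, as a hedged sketch using the raw gnutls API
(QEMU's actual code goes through its TLS channel layer):

    #include <gnutls/gnutls.h>
    #include <unistd.h>

    static void tls_clean_shutdown(gnutls_session_t session, int sockfd)
    {
        /*
         * Send the TLS closure alert before closing the socket so the
         * peer reads a clean EOF instead of GNUTLS_E_PREMATURE_TERMINATION.
         * Ignore errors: if a previous migration error already broke the
         * connection, there is nothing left to clean up.
         */
        (void)gnutls_bye(session, GNUTLS_SHUT_WR);
        close(sockfd);
    }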
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Locking the memory without MCL_ONFAULT instantly prefaults any mmapped
anonymous memory with a write fault, which introduces a lot of extra
overhead in terms of memory usage when all you want to do is prevent
kcompactd from migrating and compacting QEMU pages. Add an option to
lock pages lazily as they're faulted in by the process, using
MCL_ONFAULT, if asked.
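For reference, a minimal sketch of the mechanism, assuming Linux's
mlockall(2) with MCL_ONFAULT (Linux >= 4.4); this is not QEMU's actual
os_mlock() implementation:

    #include <errno.h>
    #include <stdbool.h>
    #include <sys/mman.h>

    static int lock_memory(bool on_fault)
    {
        int flags = MCL_CURRENT | MCL_FUTURE;

        if (on_fault) {
    #ifdef MCL_ONFAULT
            /* Lock pages only once the process faults them in, instead
             * of prefaulting every anonymous mapping up front. */
            flags |= MCL_ONFAULT;
    #else
            return -ENOSYS;  /* fail when the host lacks MCL_ONFAULT */
    #endif
        }
        return mlockall(flags) ? -errno : 0;
    }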
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Link: https://lore.kernel.org/r/20250212143920.1269754-5-d-tatianin@yandex-team.ru
Signed-off-by: Peter Xu <peterx@redhat.com>
Replace the boolean value enable_mlock with an enum and add a helper to
decide whether we should be calling os_mlock.
This is a stepping stone towards introducing a new mlock mode, which
will be the third possible state of this enum.
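A hedged sketch of the enum and helper described above (names are
illustrative and may differ from QEMU's actual definitions; the third
state is the one added by the follow-up on-fault patch):

    #include <stdbool.h>

    typedef enum MlockState {
        MLOCK_OFF,       /* mem-lock=off: no locking */
        MLOCK_ON,        /* mem-lock=on: lock all memory immediately */
        MLOCK_ON_FAULT,  /* mem-lock=on-fault: lock pages as they fault in */
    } MlockState;

    /* Helper replacing the old enable_mlock boolean check */
    static inline bool should_mlock(MlockState s)
    {
        return s != MLOCK_OFF;
    }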
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Link: https://lore.kernel.org/r/20250212143920.1269754-4-d-tatianin@yandex-team.ru
Signed-off-by: Peter Xu <peterx@redhat.com>
This will be used in the following commits to make it possible to only
lock memory on fault instead of right away.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Link: https://lore.kernel.org/r/20250212143920.1269754-2-d-tatianin@yandex-team.ru
[peterx: fail os_mlock(on_fault=1) when not supported]
[peterx: use G_GNUC_UNUSED instead of "(void)on_fault", per Dan]
Signed-off-by: Peter Xu <peterx@redhat.com>
The general expectation is that header files should follow the same
file/path naming scheme as the corresponding source file. There are
various historical exceptions to this practice in QEMU, with one of
the most notable being the include/qapi/qmp/ directory. Most of the
headers there correspond to source files in qobject/.
This patch corrects most of that inconsistency by creating
include/qobject/ and moving the headers for qobject/ there.
This also fixes MAINTAINERS for include/qapi/qmp/dispatch.h:
scripts/get_maintainer.pl now reports "QAPI" instead of "No
maintainers found".
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com> #s390x
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20241118151235.2665921-2-armbru@redhat.com>
[Rebased]
Block devices have an individual active state; a single global flag
can't cover this correctly. This becomes more important as we allow
users to manually manage which nodes are active or inactive.
Now that it's allowed to call bdrv_inactivate_all() even when some
nodes are already inactive, we can remove the flag and just
unconditionally call bdrv_inactivate_all() and, more importantly,
bdrv_activate_all() before we make use of the nodes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250204211407.381505-5-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Refactor the legacy and multifd ram_save_target_page functions into
one. Besides simplifying the code, this removes the last user of the
'migration_ops' object, so it is expunged.
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20250127120823.144949-3-ppandit@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Two small cleanups in the same section of vmstate_save():
- Check vmdesc before the "mixed null/non-null data in array" logic, to
be crystal clear that it's only about the JSON writer, not the vmstate
itself in the migration stream.
- Since we now have the is_null variable, use it to replace a check.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-17-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Now, after all the cleanups, we can finally merge the switchover startup
phase into a single function for precopy/postcopy.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-16-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
DEVICE state was introduced back in 2017:
https://lore.kernel.org/qemu-devel/20171020090556.18631-1-dgilbert@redhat.com/
To quote from Dave's cover letter, when the pre-switchover phase was
enabled, the state transitions looked like this:
The precopy flow is:
    active -> pre-switchover -> device -> completed
The postcopy flow is:
    active -> pre-switchover -> postcopy-active -> completed
To supplement the above, when the cap is not enabled:
The precopy flow is:
    active -> completed
The postcopy flow is:
    active -> postcopy-active -> completed
It works for us, though we have some code just to special-case these state
transitions, so currently the DEVICE state is special to precopy only, and
only conditionally.
I had a quick discussion with Libvirt developers, and it turns out that
this may not be necessary. IOW, it seems okay to make the DEVICE state
generic, so that we don't have over-complicated state machines. It not
only helps align all the migration state machines and clean up the code
path, especially the pre-switchover handling (see the patch itself);
another side benefit is that we unconditionally have a specific state
marking the switchover phase, which might be helpful for debugging too.
This patch makes the DEVICE state always present, marking that the source
QEMU is switching over. The state machine is then always as simple as:
    active -> [pre-switchover ->] device -> [postcopy-active ->] completed
After the change, no matter whether pre-switchover or postcopy is enabled,
we always have the DEVICE state showing the switchover phase. When
pre-switchover is enabled, we'll have an extra stage before it; when
postcopy is enabled, we'll have an extra stage after it.
A few qtests need touching up in the QEMU tree for this change:
- A few iotest outputs (194, 203, 234, 262, 280)
- Teach libqos's migrate() about the "device" state
Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-15-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Now that qemu_savevm_state_complete_precopy() is never used in postcopy,
clean it up, since in_postcopy is now unconditionally false there.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-14-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Postcopy has invoked qemu_savevm_state_complete_precopy() twice for a long
time, and that caused way too much confusion. Let's clean this up and
make postcopy easier to read.
It's actually fairly straightforward: postcopy starts by saving the
non-postcopiable iterables, then later saves again with non-iterables
only. Moving these two calls out makes everything much easier to follow;
otherwise it's very unclear what qemu_savevm_state_complete_precopy() does
in either of the calls.
No functional change intended.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-13-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Postcopy invokes qemu_savevm_state_complete_precopy() twice, which means
it fires the COMPLETE notifier twice, along with the tracepoints that
mark precopy completion.
Move that notification (along with the tracepoint) out to the caller, so
that postcopy will only notify once, right at the start of the switchover
phase from precopy. While at it, rename it to suit the file it now lives
in.
For precopy, there should be no functional change except that the
tracepoint has a new name.
For the other two users of qemu_savevm_state_complete_precopy(), namely
qemu_savevm_state() and qemu_savevm_live_state(), the notifier shouldn't
matter because they're not precopy at all. In these two contexts (i.e.,
"savevm" and "colo") the precopy notifiers are sometimes still invoked,
but that's outside the scope of this patch.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-12-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
This paves the way for a follow-up patch to modify migration states at the
end of postcopy_start(), which is better done under the BQL so that
there's no way of concurrent cancellation.
We'll now do slightly more under the BQL, but it's all trivial; hopefully
nothing will really change with this.
A side benefit is that we can drop another explicit lock() in the failure
path.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-11-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
I can't see why we must cache the state now that we have avoided the
possible CANCEL race: that's the only thing I can think of that could
modify the migration state concurrently with the migration thread itself.
Make all the state updates happen unconditionally; then we don't need to
cache the state anymore.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-10-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
In migration_maybe_pause() QEMU may yield the BQL before waiting for a
semaphore. However, it yields the BQL too early, which gives the main
thread a chance to quickly take the BQL and modify the state to
CANCELLING.
To keep such a race condition from happening at all, always update the
migration states within the BQL. That makes sure no concurrent
cancellation can ever happen.
With that, IIUC there's a chance we can remove the extra parameter of
migration_maybe_pause() that updates the active state, but that'll be done
separately later.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-9-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Precopy always uses unlimited bandwidth during switchover, which makes
sense because the switchover is so critical that no one would want to
throttle bandwidth during the VM blackout.
OTOH, postcopy surprisingly didn't do that. There's one line in the
middle of the postcopy switchover that tries to switch to postcopy's
specified max-postcopy-bandwidth, but even then it sits somewhere in the
middle, which is strange.
This patch makes both modes always use unlimited bandwidth for
switchover, and only applies the postcopy max bandwidth after the
switchover has completed.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-8-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Do a one-shot CPU sync at qemu_savevm_state_complete_precopy_non_iterable(),
instead of coding it separately in two places.
Note that in the context of qemu_savevm_state_complete_precopy(), this
patch is also an optimization for the postcopy path, in that we can avoid
syncing the CPUs twice during switchover: before this patch,
postcopy_start() invoked qemu_savevm_state_complete_precopy() twice, and
each call would try to sync the CPU info, when in reality one is enough.
For background snapshot, there's no intended functional change.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-7-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
This parameter is only used by one caller, the genuine precopy completion
path (migration_completion_precopy).
The parameter was introduced in a1fbe750fd ("migration: Fix race of image
locking between src and dst") to make sure the inactivation happens before
EOF, so that the destination will always be able to activate the disks
properly. However, there's no limitation on how early we inactivate the
disks: for the precopy completion path, we can always do so as long as the
VM is stopped.
Move the disk inactivation there; then we can remove the inactivate_disk
parameter from the whole call stack, because all the remaining callers
always pass false.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-6-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Postcopy can trigger this tracepoint twice, while only the 1st one is
valid. Avoid triggering the 2nd tracepoint, just like we do when
recording the total downtime.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-5-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
postcopy_start() is the entry function where postcopy is destined to
start. Once it's reached, the source QEMU will not dump the VM
description, i.e. the JSON writer is garbage from then on.
We could leave it to be cleaned up when migration completes; however,
while the JSON writer object is present, vmstate_save() will still
construct the JSON objects for the VM description, even though they'll
never be used for postcopy.
To save those cycles, release the JSON writer earlier for postcopy.
vmstate_save() will then be smart enough to skip the JSON object
construction completely. This can logically reduce downtime, because all
such JSON construction happens during the postcopy blackout.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-4-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The QEMU machine has a property "suppress-vmdesc". When it is enabled,
QEMU stops attaching the JSON VM description at the end of the precopy
migration stream (postcopy is never affected because postcopy never
attaches it).
However, even when it's suppressed by the user, the source QEMU still
constructs the JSON description, which is a complete waste of CPU and
memory resources.
To avoid that, only create the JSON writer object if suppress-vmdesc is
not specified.
Luckily, vmstate_save() already supports vmdesc==NULL, so only a few
remaining spots need to be prepared for vmdesc possibly being NULL.
While at it, move the init / destroy of the JSON writer object to the
start / end of the migration - the JSON writer object is a sub-struct of
the migration state, and it looks like the only object that was
dynamically allocated / destroyed within the migration process. Make it
the same as the rest of the objects that migration uses.
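The core of the change, as a hedged sketch (json_writer_new() is the real
constructor; the condition and field placement are illustrative):

    /* Create the vmdesc JSON writer only when not suppressed;
     * vmstate_save() already copes with a NULL vmdesc. */
    if (!suppress_vmdesc) {
        s->vmdesc = json_writer_new(false);
    }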
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-3-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
should_send_vmdesc() has a hack inside (not reflected in the function
name): it tries to detect the global postcopy state, and that affects the
value returned.
It's easier to keep the helper simple by only checking the suppress-vmdesc
property. Then:
- On the sender side of its usage, there's already an in_postcopy variable
that we can use: postcopy doesn't send a vmdesc at all, so directly skip
everything for postcopy.
- On the recv side, by the time we reach vmdesc processing it must be
precopy code already, hence the hack check never worked there anyway.
No functional change intended, except a trivial side effect: the source
QEMU will stop running some JSON helpers in the postcopy path, but that
only reduces the postcopy blackout window a bit; there's no other bad
side effect.
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250114230746.3268797-2-peterx@redhat.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Add the cpr-transfer migration mode, which allows the user to transfer
a guest to a new QEMU instance on the same host with minimal guest pause
time, by preserving guest RAM in place, albeit with new virtual addresses
in new QEMU, and by preserving device file descriptors. Pages that were
locked in memory for DMA in old QEMU remain locked in new QEMU, because the
descriptor of the device that locked them remains open.
cpr-transfer preserves memory and device descriptors by sending them to
new QEMU over a unix domain socket using SCM_RIGHTS. Such CPR state cannot
be sent over the normal migration channel, because devices and backends
are created prior to reading the channel, so this mode sends CPR state
over a second "cpr" migration channel. New QEMU reads the cpr channel
prior to creating devices or backends. The user specifies the cpr channel
in the channel arguments on the outgoing side, and in a second -incoming
command-line parameter on the incoming side.
The user must start old QEMU with the '-machine aux-ram-share=on' option,
which allows anonymous memory to be transferred in place to the new process
by transferring a memory descriptor for each ram block. Memory-backend
objects must have the share=on attribute, but memory-backend-epc is not
supported.
The user starts new QEMU on the same host as old QEMU, with command-line
arguments to create the same machine, plus the -incoming option for the
main migration channel, like normal live migration. In addition, the user
adds a second -incoming option with channel type "cpr". This CPR channel
must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
UNIX domain socket.
To initiate CPR, the user issues a migrate command to old QEMU, adding
a second migration channel of type "cpr" in the channels argument.
Old QEMU stops the VM, saves state to the migration channels, and enters
the postmigrate state. New QEMU mmaps the memory descriptors, and
execution resumes.
The implementation splits qmp_migrate into start and finish functions.
Start sends CPR state to new QEMU, which responds by closing the CPR
channel. Old QEMU detects the HUP and then calls finish, which connects the
main migration channel.
In summary, the usage is:
- start old QEMU:
      qemu-system-$arch -machine aux-ram-share=on ...
- start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
- issue commands to old QEMU:
      migrate_set_parameter mode cpr-transfer
      {"execute": "migrate", ...
          {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-17-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Add functions to create a QEMUFile based on a unix URI, for saving or
loading, for use by cpr-transfer mode to preserve CPR state.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-16-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Define functions to put/get file descriptors to/from a QEMUFile, for qio
channels that support SCM_RIGHTS. Maintain ordering such that
    put(A), put(fd), put(B)
followed by
    get(A), get(fd), get(B)
always succeeds. Other get orderings may succeed but are not guaranteed.
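For background, a self-contained sketch of the SCM_RIGHTS mechanism these
helpers build on (this is the underlying socket API, not the QEMUFile
interface itself):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send one fd over a unix domain socket, attached to a single byte
     * of ordinary stream data so the receiver can order it against the
     * surrounding payload. */
    static int send_one_fd(int sock, int fd)
    {
        char byte = 0;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        char control[CMSG_SPACE(sizeof(int))];
        struct msghdr msg;
        struct cmsghdr *cmsg;

        memset(&msg, 0, sizeof(msg));
        memset(control, 0, sizeof(control));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control;
        msg.msg_controllen = sizeof(control);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
    }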
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-14-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Extend the -incoming option to allow an @MigrationChannel to be specified.
This allows channels other than 'main' to be described on the command
line, which will be needed for CPR.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-13-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Export migrate_uri_parse for use outside migration internals, and define
a method migrate_is_uri that indicates when migrate_uri_parse should
be used.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-12-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
CPR must save state that is needed after QEMU is restarted, when devices
are realized. Thus the extra state cannot be saved in the migration
channel, as objects must already exist before that channel can be loaded.
Instead, define auxiliary state structures and vmstate descriptions, not
associated with any registered object, and serialize the aux state to a
cpr-specific channel in cpr_state_save. Deserialize in cpr_state_load
after QEMU restarts, before devices are realized.
Provide accessors for clients to register file descriptors for saving.
The mechanism for passing the fds to the new process is specific to each
migration mode, and is added in subsequent patches.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-8-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
When QPL compression is enabled on the migration channel and the same
dirty page changes from a normal page to a zero page in the iterative
memory copy, the dirty page will not be updated to a zero page again
on the target side, resulting in incorrect memory data on the source
and target sides.
The root cause is that the target side does not record the normal pages
in the receivedmap.
The solution is to add a ramblock_recv_bitmap_set_offset call on the
target side to record the normal pages.
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241218091413.140396-4-yuan1.liu@intel.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
When QPL compression is enabled on the migration channel and the same
dirty page changes from a normal page to a zero page in the iterative
memory copy, the dirty page will not be updated to a zero page again
on the target side, resulting in incorrect memory data on the source
and target sides.
The root cause is that the target side does not record the normal pages
in the receivedmap.
The solution is to add a ramblock_recv_bitmap_set_offset call on the
target side to record the normal pages.
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241218091413.140396-3-yuan1.liu@intel.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
When compression is enabled on the migration channel and
the pages processed are all zero pages, these pages will
not be sent and updated on the target side, resulting in
incorrect memory data on the source and target sides.
The root cause is that all compression methods call
multifd_send_prepare_common to determine whether to compress
dirty pages, but multifd_send_prepare_common does not update
the IOV of MultiFDPacket_t when all dirty pages are zero pages.
The solution is to always update the IOV of MultiFDPacket_t
regardless of whether the dirty pages are all zero pages.
Fixes: 303e6f54f9 ("migration/multifd: Implement zero page transmission on the multifd thread.")
Cc: qemu-stable@nongnu.org #9.0+
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241218091413.140396-2-yuan1.liu@intel.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Currently, if an array of pointers contains a NULL pointer, that
pointer will be encoded as '0' in the stream. Since the JSON writer
doesn't define a "pointer" type, that '0' will now be a uint8, which
is different from the original type being pointed to, e.g. struct.
(We further call the uint8 "nullptr", but that's irrelevant to the
issue.)
Such a mixed-type array shouldn't be compressed; otherwise data is lost,
as the code currently makes the whole array take the type of its first
element:
    css = {NULL, NULL, ..., 0x5555568a7940, NULL};

    {"name": "s390_css", "instance_id": 0, "vmsd_name": "s390_css",
     "version": 1, "fields": [
        ...,
        {"name": "css", "array_len": 256, "type": "nullptr", "size": 1},
        ...,
    ]}
In the above, the valid pointer at position 254 got lost among the
compressed array of nullptrs.
While we could disable the array compression when a NULL pointer is
found, the JSON part of the stream still contributes to downtime, so we
should avoid writing unnecessary bytes to it.
Keep the array compression in place, but when NULL and non-NULL pointers
are mixed, break the array into several type-contiguous pieces:
    css = {NULL, NULL, ..., 0x5555568a7940, NULL};

    {"name": "s390_css", "instance_id": 0, "vmsd_name": "s390_css",
     "version": 1, "fields": [
        ...,
        {"name": "css", "array_len": 254, "type": "nullptr", "size": 1},
        {"name": "css", "type": "struct", "struct": {"vmsd_name": "s390_css_img", ... }, "size": 768},
        {"name": "css", "type": "nullptr", "size": 1},
        ...,
    ]}
Now each type-discontiguous region will become a new JSON entry. The
reader should interpret this as a concatenation of values, all part of
the same field.
Parsing the JSON with analyze-migration.py now shows the proper data
being pointed to at the places where the pointer is valid, and
"nullptr" where there's NULL:
"s390_css (14)": {
...
"css": [
"nullptr",
"nullptr",
...
"nullptr",
{
"chpids": [
{
"in_use": "0x00",
"type": "0x00",
"is_virtual": "0x00"
},
...
]
},
"nullptr",
}
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-7-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
QEMU plays a trick with null pointers inside an array of pointers in a
VMSD field. See 07d4e69147 ("migration/vmstate: fix array of ptr with
nullptrs") for more details on why. The idea makes sense in general, but
it may have overlooked the JSON writer, which could end up writing
nothing into a "struct" in the JSON hints section.
We hit some analyze-migration.py issues on s390 recently, showing that
some of the struct fields contain nothing, like:

    {"name": "css", "array_len": 256, "type": "struct", "struct": {}, "size": 1}
As described in detail by Fabiano:
https://lore.kernel.org/r/87pll37cin.fsf@suse.de
It could be that we hit some null pointers there, and the JSON output
vanished for those null pointers.
To fix it, instead of hacking around only at the VMStateInfo level, do it
at the VMStateField level, so that the JSON writer can also be involved.
The JSON writer will then replace the pointer array (which used to be a
"struct") with the real representation of the nullptr field.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-6-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Rename vmstate_info_nullptr from "uint64_t" to "nullptr". This vmstate
actually reads and writes just a byte, so the proper name would be
uint8. However, since this is a marker for a NULL pointer, it's
convenient to have a more explicit name that can be identified by the
consumers of the JSON part of the stream.
Change the name to "nullptr" and add support for it in the
analyze-migration.py script. Arbitrarily use the name of the type as
the value of the field to avoid the script showing 0x30 or '0', which
could be confusing for readers.
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-5-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
This patch proposes a flag to maintain disk activation status globally.
It mostly rewrites disk activation management for QEMU, including COLO
and the QMP command xen_save_devices_state.
Background
==========
We have two problems on disk activations, one resolved, one not.
Problem 1: disk activation recover (for switchover interruptions)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When migration is either cancelled or fails during switchover, especially
after the disks have been inactivated, QEMU needs to remember to
re-activate the disks before the VM starts again.
It used to be done separately in two paths: one in qmp_migrate_cancel(),
the other one in the failure path of migration_completion().
It used to be fixed in different commits, all over the place in QEMU.
These are the relevant changes I saw; I'm not sure if the list is
complete:
- In 2016, commit fe904ea824 ("migration: regain control of images when
migration fails to complete")
- In 2017, commit 1d2acc3162 ("migration: re-active images while migration
been canceled after inactive them")
- In 2023, commit 6dab4c93ec ("migration: Attempt disk reactivation in
more failure scenarios")
Now that we have a slightly better picture, maybe we can unify the
reactivation into a single path.
One side benefit of doing so is that we can move the disk operation out
of the QMP command "migrate_cancel". In the future we may want to make
"migrate_cancel" OOB-compatible, which requires the command not to need
the BQL in the first place. This already achieves that and makes the
migrate_cancel command lightweight.
Problem 2: disk invalidation on top of invalidated disks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is an unresolved bug in current QEMU; see the link in "Resolves:" at
the end. It turns out that besides the src switchover phase (problem 1
above), QEMU also needs to track block activation on the destination.
Consider two migrations in a row, where the VM stays paused throughout.
In that scenario, the disks are still not activated even after migration
completes in the 1st round. When the 2nd round starts, if QEMU doesn't
know the status of the disks, it needs to try to inactivate them again.
Here the issue is that the block layer API bdrv_inactivate_all() will
crash QEMU if invoked on already inactive disks for the 2nd migration.
For details, see the bug link at the end.
Implementation
==============
This patch proposes to maintain disk activation with a global flag, so we
know:
- If we inactivated the disks for migration but the migration got
cancelled or failed, QEMU knows it should reactivate the disks.
- On the incoming side, if the disks were never activated and then
another migration is triggered, QEMU can tell that inactivation is not
needed for the 2nd migration.
We used to have disk_inactive, but it only solved the 1st issue, not the
2nd. Also, it was handled in completely separate paths, so it was
extremely hard to follow how the flag changes, for how long it is valid,
and when we will reactivate the disks.
Convert the existing disk_inactive flag into that global flag (also invert
its naming), and maintain the disk activation status for the whole
lifecycle of qemu. That includes the incoming QEMU.
Put both error cases of source migration (failure, cancellation) together
into migration_iteration_finish(), which is invoked in either scenario,
so that part of QEMU behaves the same as before. With the disk activation
status maintained globally, we not only clean up quite a few temporary
paths that used to track it (e.g. in the postcopy code), but also fix the
crash of problem 2 in one shot.
For a freshly started QEMU, the flag is initialized to TRUE, showing that
QEMU owns the disks by default.
For an incoming migrated QEMU, the flag is initialized to FALSE once and
for all, showing that the dest QEMU doesn't own the disks until
switchover.
That is guaranteed by the "once" variable.
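A hedged sketch of the resulting invariant (names are illustrative, not
QEMU's exact API); the key point is that both directions are idempotent,
which is what fixes problem 2:

    #include <stdbool.h>

    static bool block_active = true;  /* fresh QEMU owns its disks */

    static void block_activate(void)
    {
        if (!block_active) {
            /* bdrv_activate_all() in the real code */
            block_active = true;
        }
    }

    static void block_inactivate(void)
    {
        if (block_active) {
            /* bdrv_inactivate_all() in the real code; calling it twice
             * is what used to crash, hence the guard */
            block_active = false;
        }
    }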
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2395
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20241206230838.1111496-7-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>