It is relatively common in low-level qemu_api code to assert that
a field of a struct has a specific type; for example, such an assertion
can be used to ensure that the fields match what the qemu_api and C code
expect, for safety.
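As a rough sketch of how such an assertion can work (the macro name and
shape below are illustrative assumptions, not necessarily the exact
qemu_api helper), a compile-time field-type check can be built out of a
coercion that only succeeds when the types match:

    // Illustrative only: assert at compile time that a struct field has
    // an exact type; compilation fails if the types differ.
    macro_rules! assert_field_type {
        ($t:ty, $field:ident, $expected:ty) => {
            const _: () = {
                fn _check(v: &$t) -> &$expected {
                    // Only compiles if the field's type is $expected.
                    &v.$field
                }
            };
        };
    }

    struct Timer {
        expire_time: u64,
    }

    assert_field_type!(Timer, expire_time, u64);
    // assert_field_type!(Timer, expire_time, u32); // would fail to build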
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a type that, together with the C function object_deinit, ensures the
correct drop order for QOM objects relative to their superclasses.
Right now it is not possible to implement the Drop trait for QOM classes
that are defined in Rust, as the drop() function would not be called when
the object goes away; instead what is called is ObjectImpl::INSTANCE_FINALIZE.
It would be nice for INSTANCE_FINALIZE to just drop the object, but this has
a problem: suppose you have
    pub struct MySuperclass {
        parent: DeviceState,
        field: Box<MyData>,
        ...
    }

    impl Drop for MySuperclass {
        ...
    }

    pub struct MySubclass {
        parent: MySuperclass,
        ...
    }

and an instance_finalize implementation along the lines of

    unsafe extern "C" fn drop_object<T: ObjectImpl>(obj: *mut Object) {
        unsafe { std::ptr::drop_in_place(obj.cast::<T>()) }
    }
When instance_finalize is called for MySubclass, it will walk the struct's
list of fields and call the drop method for MySuperclass. Then, object_deinit
recurses to the superclass and calls the same drop method again. This
will cause double-freeing of the Box<MyData>.
What's happening here is that QOM wants to control the drop order of
MySuperclass and MySubclass's fields. To do so, the parent field must
be marked ManuallyDrop<>, which is quite ugly. Instead, add a wrapper
type ParentField<> that is specific to QOM. This hides the implementation
detail of *what* is special about the ParentField, and will also be easy
to check in the #[derive(Object)] macro.
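As a minimal sketch of the idea (the exact definition in qemu_api may
differ; the trait impls below are assumptions for illustration), the
wrapper owns the parent without dropping it, while still making it
convenient to reach the parent's fields:

    use std::mem::ManuallyDrop;
    use std::ops::{Deref, DerefMut};

    // Sketch: owns the superclass part of a QOM object but never drops it,
    // leaving superclass finalization to object_deinit's recursion.
    #[repr(transparent)]
    pub struct ParentField<T>(ManuallyDrop<T>);

    impl<T> Deref for ParentField<T> {
        type Target = T;
        fn deref(&self) -> &T {
            &self.0
        }
    }

    impl<T> DerefMut for ParentField<T> {
        fn deref_mut(&mut self) -> &mut T {
            &mut self.0
        }
    }

With this, MySubclass declares "parent: ParentField<MySuperclass>", so
dropping the subclass in place only drops the subclass's own fields, and
MySuperclass's Drop runs exactly once, when object_deinit reaches the
superclass.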
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
rustc will check that every reachable #[cfg] matches a list of
the expected config names and values. Recent versions of rustc are
also complaining about #[cfg(test)], even though it is basically a standard
part of the language. So, always allow it.
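For illustration only (QEMU wires rustc flags up through its own build
system, so this is an assumed Cargo-style equivalent rather than the
actual change), the expected cfg can be declared like this:

    // build.rs - hypothetical sketch: register cfg(test) with rustc's
    // --check-cfg pass so it is never reported as unexpected.
    fn main() {
        println!("cargo::rustc-check-cfg=cfg(test)");
    }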
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge tag 'migration-20250110-pull-request' of https://gitlab.com/farosas/qemu into staging
Migration pull request
- compression:
Shameer's fix for CONFIG_UADK build
Yuan Liu fixes for zero-page, QPL, qatzip
- multifd sync cleanups, prereq. for VFIO and postcopy work
- fixes for 9.2 regressions:
multifd with pre-9.0 -> post-9.1 migrations (#2720)
s390x migration (#2704)
- fix for assertions during paused migrations; rework of
late-block-activate logic (#2395, #686)
- fixes for compressed arrays creation and parsing, mostly affecting
s390x
# -----BEGIN PGP SIGNATURE-----
#
# iQJEBAABCAAuFiEEqhtIsKIjJqWkw2TPx5jcdBvsMZ0FAmeBDgkQHGZhcm9zYXNA
# c3VzZS5kZQAKCRDHmNx0G+wxnSlUEACl31wY+77JxWnBva/eDDwnJ9HiCrqsoqaZ
# YIJJXNlk4lYJWNdZRt6p27exzWrQwm+kWKPECeCakgCMlfhnKCvejGq7iV/fJY4o
# D8hjE3t1htQ8mfblY1+bqzg3Rml59KwXxiqAwvlljbNWdkXruv026dq9vgJMzFhi
# ia043fOO1tYULIoawgmwmLEHnztht0v+ZTZ1v5KQbrH655tpxls/8kHc6v5PXEpA
# 3PSmCrCQh1dPtkYRjuJ9yHyfU+/T8tYwIjrU6VR1wQW7MBNkjtqNudaqAFiuyuqn
# P8gh4rAQrMhA9y+aq6xSoJP8XGkuOHxLQtlNutlmtbcQyZ7JqgLmK9ZLdoPf21sK
# //erV63NoyaciYB9Nk3NXflwroc6zyvo8A584kGNPwBznZOJLESP4SPvVm/nlE29
# vbyq8AWHRjFiqqf6P0ttQLAFkusZJzM1Y9UakF51hyVBX70yfqLG20XXZtIq/aZA
# GbBB2Fo0MIlbmWaur3vLsSzn7B8d++Gl9TTGcK/eIXJ1ANCuCxGv9fbXJQlP5F4I
# 3OAoSmAVJ2eqw4v0+2WMiEa8yUA5drNnDSI3VRkG+0K9jRfHKXki466/QQdGrNw7
# 8GuuzLBNai3gEKbavDU0Be73r982KjXeYXj7RuAkQfm0d4H7tiwtg91Cd1dPKfzh
# mhpmOFJDCg==
# =joNM
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 10 Jan 2025 07:09:45 EST
# gpg: using RSA key AA1B48B0A22326A5A4C364CFC798DC741BEC319D
# gpg: issuer "farosas@suse.de"
# gpg: Good signature from "Fabiano Rosas <farosas@suse.de>" [unknown]
# gpg: aka "Fabiano Almeida Rosas <fabiano.rosas@suse.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: AA1B 48B0 A223 26A5 A4C3 64CF C798 DC74 1BEC 319D
* tag 'migration-20250110-pull-request' of https://gitlab.com/farosas/qemu: (25 commits)
multifd: bugfix for incorrect migration data with qatzip compression
multifd: bugfix for incorrect migration data with QPL compression
multifd: bugfix for migration using compression methods
s390x: Fix CSS migration
migration: Fix arrays of pointers in JSON writer
migration: Dump correct JSON format for nullptr replacement
migration: Rename vmstate_info_nullptr
migration: Fix parsing of s390 stream
migration: Remove unused argument in vmsd_desc_field_end
migration: Add more error handling to analyze-migration.py
migration/block: Rewrite disk activation
migration/block: Fix possible race with block_inactive
migration/block: Apply late-block-active behavior to postcopy
migration/block: Make late-block-active the default
qmp/cont: Only activate disks if migration completed
migration: Add helper to get target runstate
migration/multifd: Fix compat with QEMU < 9.0
migration/multifd: Document the reason to sync for save_setup()
migration/multifd: Cleanup src flushes on condition check
migration/multifd: Remove sync processing on postcopy
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The set_machine and require_accelerator steps can bail early, so move
those to the front of the test functions. While we are at it, also
clean up some long lines when adding the vm arguments.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-23-alex.bennee@linaro.org>
The Alpine test boots from the CDROM so we don't --append a command
line. Drop the unused code.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-22-alex.bennee@linaro.org>
Use the utility functions to reduce code duplication.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-21-alex.bennee@linaro.org>
Rather than using the Python library (which has a different API
anyway), let's just call the binary. zstdtools is already in our
qemu.yml, so all test containers should have it around. Tests should
still use @skipIfMissingCommands('zstd') to gracefully handle the case
where only minimal dependencies have been installed.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-20-alex.bennee@linaro.org>
We have proper detection of prompts now, so we don't need to guess with
sleep() calls sprinkled through the test. The extra step of calling halt
is just to flush the final bits of the log (although the last line is
still missed).
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-19-alex.bennee@linaro.org>
Time will not advance if the system is paused or there are no timer
events set for the future. In the absence of pending timer events,
advancing time would make no difference to the system state. Attempting
to do so would be a bug, and the test or device under test would need
fixing.
Tighten up the result reporting to `FAIL` if time was not advanced.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2687
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-18-alex.bennee@linaro.org>
In the qtest environment time will not step forward if the system is
paused (timers disabled) or we have no timer events to fire. As a
result, VirtIO events are responded to directly and we don't need to
step time forward.
We still do timeout processing to handle the fact that the target QEMU
may not be ready to respond right away. This will usually be due to a
slow CI system or QEMU running under something like rr.
Future qtest patches will assert that time actually changes when a
step is requested.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-17-alex.bennee@linaro.org>
This boots an OP-TEE environment and launches a nested guest VM inside it
using the Realms feature. We do it for the virt and sbsa-ref platforms.
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20241220165212.3653495-1-pierrick.bouvier@linaro.org>
[AJB: tweak ordering of setup, strip changelog from commit]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20250108121054.1126164-16-alex.bennee@linaro.org>
Now that there are new, up-to-date images available, we should update to them.
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-15-alex.bennee@linaro.org>
Now that there are new, up-to-date images available, we should update to them.
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-14-alex.bennee@linaro.org>
Now that there are new, up-to-date images available, we should update to them.
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-13-alex.bennee@linaro.org>
Now that there are new, up-to-date images available, we should update to them.
Note we re-use the riscv32 kernel and rootfs for test_riscv64_rv32.
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-12-alex.bennee@linaro.org>
Now that there are new, up-to-date images available, we should update to them.
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-11-alex.bennee@linaro.org>
Now that there are new, up-to-date images available, we should update to them.
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-10-alex.bennee@linaro.org>
Now that there are new, up-to-date images available, we should update to them.
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-9-alex.bennee@linaro.org>
Now that there are new, up-to-date images available, we should update to them.
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-8-alex.bennee@linaro.org>
Now that there are new, up-to-date images available, we should update to them.
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-7-alex.bennee@linaro.org>
Now that there are new, up-to-date images available, we should update to them.
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-6-alex.bennee@linaro.org>
Now that there are new, up-to-date images available, we should update to them.
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-5-alex.bennee@linaro.org>
We didn't have this before, and as it exercises the m68k virt platform
it seems worth adding. We don't wait for the shutdown because QEMU
will auto-exit on shutdown.
Cc: Laurent Vivier <laurent@vivier.eu>
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-4-alex.bennee@linaro.org>
Now that there are new, up-to-date images available, we should update to them.
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-3-alex.bennee@linaro.org>
Now that there are new, up-to-date images available, we should update to them.
Cc: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250108121054.1126164-2-alex.bennee@linaro.org>
When QPL compression is enabled on the migration channel and the same
dirty page changes from a normal page to a zero page in the iterative
memory copy, the dirty page will not be updated to a zero page again
on the target side, resulting in incorrect memory data on the source
and target sides.
The root cause is that the target side does not record the normal pages
in the receivedmap.
The solution is to add ramblock_recv_bitmap_set_offset on the target side
to record the normal pages.
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241218091413.140396-4-yuan1.liu@intel.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
When QPL compression is enabled on the migration channel and the same
dirty page changes from a normal page to a zero page in the iterative
memory copy, the dirty page will not be updated to a zero page again
on the target side, resulting in incorrect memory data on the source
and target sides.
The root cause is that the target side does not record the normal pages
in the receivedmap.
The solution is to add ramblock_recv_bitmap_set_offset on the target side
to record the normal pages.
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241218091413.140396-3-yuan1.liu@intel.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
When compression is enabled on the migration channel and the pages
being processed are all zero pages, these pages will not be sent and
therefore not updated on the target side, resulting in incorrect memory
data on the source and target sides.
The root cause is that all compression methods call
multifd_send_prepare_common to determine whether to compress
dirty pages, but multifd_send_prepare_common does not update
the IOV of MultiFDPacket_t when all dirty pages are zero pages.
The solution is to always update the IOV of MultiFDPacket_t
regardless of whether the dirty pages are all zero pages.
Fixes: 303e6f54f9 ("migration/multifd: Implement zero page transmission on the multifd thread.")
Cc: qemu-stable@nongnu.org #9.0+
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241218091413.140396-2-yuan1.liu@intel.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Currently, if an array of pointers contains a NULL pointer, that
pointer will be encoded as '0' in the stream. Since the JSON writer
doesn't define a "pointer" type, that '0' will now be an uint8, which
is different from the original type being pointed to, e.g. struct.
(we're further calling uint8 "nullptr", but that's irrelevant to the
issue)
That mixed-type array shouldn't be compressed, otherwise data is lost
as the code currently makes the whole array have the type of the first
element:
css = {NULL, NULL, ..., 0x5555568a7940, NULL};
{"name": "s390_css", "instance_id": 0, "vmsd_name": "s390_css",
"version": 1, "fields": [
...,
{"name": "css", "array_len": 256, "type": "nullptr", "size": 1},
...,
]}
In the above, the valid pointer at position 254 got lost among the
compressed array of nullptr.
While we could disable the array compression when a NULL pointer is
found, the JSON part of the stream still contributes to downtime, so we
should avoid writing unnecessary bytes to it.
Keep the array compression in place, but if NULL and non-NULL pointers
are mixed, break the array into several type-contiguous pieces:
css = {NULL, NULL, ..., 0x5555568a7940, NULL};
{"name": "s390_css", "instance_id": 0, "vmsd_name": "s390_css",
"version": 1, "fields": [
...,
{"name": "css", "array_len": 254, "type": "nullptr", "size": 1},
{"name": "css", "type": "struct", "struct": {"vmsd_name": "s390_css_img", ... }, "size": 768},
{"name": "css", "type": "nullptr", "size": 1},
...,
]}
Now each type-discontiguous region will become a new JSON entry. The
reader should interpret this as a concatenation of values, all part of
the same field.
Parsing the JSON with analyze-migration.py now shows the proper data
being pointed to at the places where the pointer is valid and
"nullptr" where there's NULL:
"s390_css (14)": {
...
"css": [
"nullptr",
"nullptr",
...
"nullptr",
{
"chpids": [
{
"in_use": "0x00",
"type": "0x00",
"is_virtual": "0x00"
},
...
]
},
"nullptr",
}
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-7-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
QEMU plays a trick with null pointers inside an array of pointers in a VMSD
field. See 07d4e69147 ("migration/vmstate: fix array of ptr with
nullptrs") for more details on why. The idea makes sense in general, but
it may overlooked the JSON writer where it could write nothing in a
"struct" in the JSON hints section.
We hit some analyze-migration.py issues on s390 recently, showing that some
of the struct fields contain nothing, like:
{"name": "css", "array_len": 256, "type": "struct", "struct": {}, "size": 1}
As described in details by Fabiano:
https://lore.kernel.org/r/87pll37cin.fsf@suse.de
It could be that we hit some null pointers there, and the JSON content
was missing when they were null pointers.
To fix it, instead of hacking around only at the VMStateInfo level, do that
at the VMStateField level, so that the JSON writer can also be involved. In
this case, the JSON writer will replace the pointer array (which used to be
a "struct") with the real representation of the nullptr field.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-6-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Rename vmstate_info_nullptr from "uint64_t" to "nullptr". This vmstate
actually reads and writes just a byte, so the proper name would be
uint8. However, since this is a marker for a NULL pointer, it's
convenient to have a more explicit name that can be identified by the
consumers of the JSON part of the stream.
Change the name to "nullptr" and add support for it in the
analyze-migration.py script. Arbitrarily use the name of the type as
the value of the field to avoid the script showing 0x30 or '0', which
could be confusing for readers.
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-5-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The parsing for the S390StorageAttributes section is currently leaving
an unconsumed token that is later interpreted by the generic code as
QEMU_VM_EOF, cutting the parsing short.
The migration will issue a STATTR_FLAG_DONE between iterations, which
the script consumes correctly, but there's a final STATTR_FLAG_EOS at
.save_complete that the script is ignoring. Since the EOS flag is a
u64 0x1ULL and the stream is big endian, on little endian hosts a byte
read from it will be 0x0, the same as QEMU_VM_EOF.
Fixes: 81c2c9dd5d ("tests/qtest/migration-test: Fix analyze-migration.py for s390x")
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-4-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The analyze-migration script was seen failing on s390x in mysterious
ways. It seems we're reaching the VMSDFieldStruct constructor without
any fields, which would indicate an empty .subsection entry, a
VMSTATE_STRUCT with no fields or a vmsd with no fields. We don't have
any of those, at least not without the unmigratable flag set, so this
should never happen.
Add some debug statements so that we can see what's going on the next
time the issue happens.
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-2-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
This patch proposes a flag to maintain disk activation status globally. It
mostly rewrites disk activation management for QEMU, including COLO and the
QMP command xen_save_devices_state.
Background
==========
We have two problems on disk activations, one resolved, one not.
Problem 1: disk activation recovery (for switchover interruptions)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When migration is either cancelled or fails during switchover, especially
after the disks are inactivated, QEMU needs to remember to re-activate
the disks again before the VM starts.
It used to be done separately in two paths: one in qmp_migrate_cancel(),
the other one in the failure path of migration_completion().
It used to be fixed in different commits, all over the place in QEMU.
These are the relevant changes I saw; I'm not sure if it's a complete list:
- In 2016, commit fe904ea824 ("migration: regain control of images when
migration fails to complete")
- In 2017, commit 1d2acc3162 ("migration: re-active images while migration
been canceled after inactive them")
- In 2023, commit 6dab4c93ec ("migration: Attempt disk reactivation in
more failure scenarios")
Now that we have a slightly better picture, maybe we can unify the
reactivation into a single path.
One side benefit of doing so is that we can move the disk operation outside
the QMP command "migrate_cancel". It's possible that in the future we may
want to make "migrate_cancel" OOB-compatible, which requires that the
command not need the BQL in the first place. This already does that and
makes the migrate_cancel command lightweight.
Problem 2: disk invalidation on top of invalidated disks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is an unresolved bug in current QEMU; see the link in "Resolves:" at
the end. It turns out that besides the src switchover phase (problem 1
above), QEMU also needs to remember the block activation status on the
destination.
Consider two continuous migrations in a row, where the VM is always paused.
In that scenario, the disks are not activated even after migration
completes in the 1st round. When the 2nd round starts, if QEMU doesn't
know the status of the disks, it needs to try to inactivate the disks again.
Here the issue is that the block layer API bdrv_inactivate_all() will crash
QEMU if invoked on already inactive disks for the 2nd migration. For
details, see the bug link at the end.
Implementation
==============
This patch proposes to maintain disk activation with a global flag, so we
know:
- If we used to inactivate disks for migration, but migration got
cancelled, or failed, QEMU will know it should reactivate the disks.
- On the incoming side, if the disks were never activated but then another
  migration is triggered, QEMU should be able to tell that inactivation is
  not needed for the 2nd migration.
We used to have disk_inactive, but it only solves the 1st issue, not the
2nd. Also, it's done in completely separate paths, so it's extremely hard
to follow how the flag changes, the duration for which the flag is valid,
or when we will reactivate the disks.
Convert the existing disk_inactive flag into that global flag (also invert
its naming), and maintain the disk activation status for the whole
lifecycle of QEMU. That includes the incoming QEMU.
Put both of the error cases of source migration (failure, cancelled)
together into migration_iteration_finish(), which will be invoked for
either scenario, so from that part QEMU should behave the same as before.
With such global maintenance of the disk activation status, we not only
clean up quite a few temporary paths that tried to maintain the disk
activation status (e.g. in the postcopy code), but also fix the crash for
problem 2 in one shot.
For a freshly started QEMU, the flag is initialized to TRUE, showing that
this QEMU owns the disks by default.
For an incoming migrated QEMU, the flag will be initialized to FALSE once
and for all, showing that the dest QEMU doesn't own the disks until
switchover. That is guaranteed by the "once" variable.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2395
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20241206230838.1111496-7-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Src QEMU sets block_inactive=true very early, before the inactivation takes
place. It means that if something goes wrong after setting the flag but
before reaching qemu_savevm_state_complete_precopy_non_iterable(), where
the inactivation work is done, the block_inactive flag will be left
inconsistent.
For example, think about when qemu_savevm_state_complete_precopy_iterable()
can fail: it will have block_inactive set to true even if all block drives
are active.
Fix that by only updating the flag after the inactivation is done.
No Fixes tag for any commit, because it's not an issue if bdrv_activate_all()
is re-entrant on all-active disks - a false positive block_inactive brings
nothing more than "trying to activate the blocks but they're already
active". However, let's still do it right to avoid the flag being
inconsistent with reality.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20241206230838.1111496-6-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Postcopy never cared about late-block-active. However there's no mention
in the capability that it doesn't apply to postcopy.
Considering that we _assumed_ late activation is always good, do that too
for postcopy unconditionally, just like precopy. After this patch, we
should have unified the behavior across all.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20241206230838.1111496-5-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Migration capability 'late-block-active' controls when the block drives
will be activated. If enabled, block drives will only be activated when
the VM starts: either the src runstate was "live" (RUNNING, or SUSPENDED),
or it'll be postponed until qmp_cont().
Let's do this unconditionally. There's no harm in delaying activation of
the block drives. Meanwhile there's no ABI breakage if the dest does it,
because src QEMU has nothing to do with it.
IIUC we could have avoided introducing this cap in the first place, but
it's still not too late to just always do it. The cap is now a candidate
for removal, but that is left for later patches.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20241206230838.1111496-4-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
As the comment says, the activation of disks is for the case where
migration has completed, rather than while QEMU is still in migration
(RUN_STATE_INMIGRATE).
Move the code over to reflect what the comment is describing.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20241206230838.1111496-3-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
In 99% of cases, after QEMU migrates to the dest host, it tries to detect
the target VM runstate using global_state_get_runstate().
There's one outlier so far, which is Xen, that won't send the global state.
That's the major reason why the global_state_received() check was always
there together with global_state_get_runstate().
However, it's utterly confusing why global_state_received() has anything to
do with "let's start the VM or not".
Provide a helper to explain it; then we have a unified entry point for
getting the target dest QEMU runstate after migration.
Suggested-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241206230838.1111496-2-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Commit f5f48a7891 ("migration/multifd: Separate SYNC request with
normal jobs") changed the multifd source side to stop sending data
along with the MULTIFD_FLAG_SYNC, effectively introducing the concept
of a SYNC-only packet. Relying on that, commit d7e58f412c
("migration/multifd: Don't send ram data during SYNC") later came
along and skipped reading data from SYNC packets.
In a versions timeline like this:

    8.2 -- f5f48a7 -- 9.0 -- 9.1 -- d7e58f41 -- 9.2

The issue is that QEMUs < 9.0 still send data along with SYNC, but
QEMUs > 9.1 don't gather that data anymore. This leads to various
kinds of migration failures due to desync/missing data.
Stop checking for a SYNC packet on the destination and unconditionally
unfill the packet.
From now on:
old -> new:
the source sends data + sync, destination reads normally
new -> new:
source sends only sync, destination reads zeros
new -> old:
source sends only sync, destination reads zeros
CC: qemu-stable@nongnu.org
Fixes: d7e58f412c ("migration/multifd: Don't send ram data during SYNC")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2720
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20241213160120.23880-2-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
It's not straightforward to see why src QEMU needs to sync multifd during
the setup() phase. After all, there's no page queued at that point.
For old QEMUs, there's a solid reason: EOS requires it to work. But it's
unclear for the new QEMUs, which do not take the EOS message as a sync
request. One will figure that out only when this is conditionally removed;
in fact, the author did try it out.
Logically we could still avoid doing this on new machine types, however
that needs a separate compat field, and that can be overkill for some
trivial overhead in the setup() phase.
Let's instead document it completely, to avoid someone else trying this
again and doing the debugging one more time, or anyone being confused
about why this ever existed.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20241206224755.1108686-8-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The src flush condition check is overcomplicated, and it would get even
more out of control once postcopy is involved.
In general, we have two modes to do the sync: the legacy or the modern way.
Legacy uses a per-section flush, modern uses a per-round flush.
Mapped-ram always uses the modern, per-round flush.
Introduce two helpers, which can greatly simplify the code and hopefully
make it readable again.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20241206224755.1108686-7-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Multifd never worked with postcopy, at least not so far.
Remove the sync processing there, because it's confusing and such syncs
should never appear. Now, if RAM_SAVE_FLAG_MULTIFD_FLUSH is observed, we
fail hard instead of trying to invoke multifd code.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241206224755.1108686-6-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The RAM_SAVE_FLAG_MULTIFD_FLUSH message should always be correlated with a
sync request on the src. Unify such messages into one place, and
conditionally send the message only if necessary.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241206224755.1108686-5-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Firstly, we're going to use the multifd flag soon in the multifd code, so
ram.c isn't going to work.
Secondly, we have a separate RDMA flag dangling around, which is definitely
not obvious. There's one comment that helps, but not too much.
Put all RAM save flags together, so nothing will get overlooked.
Add a section explaining why we can't use bits over 0x200.
Remove RAM_SAVE_FLAG_FULL as it's no longer used in QEMU, as the comment
explained.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241206224755.1108686-4-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Teach multifd_send_sync_main() to sync with threads only.
We already have such a request, which is when mapped-ram is enabled with
multifd. In that case, no SYNC messages will be pushed to the stream when
multifd syncs the sender threads, because there are no destination threads
waiting for them. The whole point of the sync is to make sure all threads
finished their jobs.
So fundamentally we have a request to do the sync in different ways:
- Either to sync the threads only,
- Or to sync the threads but also with the destination side.
Mapped-ram did it already because of the use_packet check in the sync
handler of the sender thread. It works.
However it may stop working when e.g. VFIO starts to reuse multifd
channels to push device state. In that case VFIO has a similar request for
a "thread-only sync", however we can't check a flag because such a sync
request can still come from RAM, which needs the on-wire notifications.
Pave the way for that by allowing multifd_send_sync_main() to specify
what kind of sync the caller needs. We can use it for mapped-ram already.
No functional change intended.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20241206224755.1108686-3-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>