Commit graph

2269 commits

Author SHA1 Message Date
Eric Blake
6dab4c93ec migration: Attempt disk reactivation in more failure scenarios
Commit fe904ea824 added a fail_inactivate label, which tries to
reactivate disks on the source after a failure while s->state ==
MIGRATION_STATUS_ACTIVE, but didn't actually use the label if
qemu_savevm_state_complete_precopy() failed.  This failure to
reactivate is also present in commit 6039dd5b1c (also covering the new
s->state == MIGRATION_STATUS_DEVICE state) and 403d18ae (ensuring
s->block_inactive is set more reliably).

Consolidate the two labels back into one - no matter HOW migration is
failed, if there is any chance we can reach vm_start() after having
attempted inactivation, it is essential that we have tried to restart
disks before then.  This also makes the cleanup more like
migrate_fd_cancel().

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20230502205212.134680-1-eblake@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10 14:16:53 +02:00
Lukas Straub
c323518a7a migration: Initialize and cleanup decompression in migration.c
This fixes compress with colo.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:27 +02:00
Lukas Straub
52623f23b0 ram-compress.c: Make target independent
Make ram-compress.c target independent.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
4024cc8506 ram compress: Assert that the file buffer matches the result
Before this series, "nothing to send" was handled by the file buffer
being empty. Now it is tracked via param->result.

Assert that the file buffer state matches the result.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
b1f17720c1 ram.c: Move core decompression code into its own file
No functional changes intended.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
b5ca3368d9 ram.c: Move core compression code into its own file
No functional changes intended.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
ef4f5f5d5a ram.c: Remove last ram.c dependency from the core compress code
Make compression interfaces take send_queued_data() as an argument.
Remove save_page_use_compression() from flush_compressed_data().

This removes the last ram.c dependency from the core compress code.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
680628d200 ram.c: Call update_compress_thread_counts from compress_send_queued_data
This makes the core compress code more independend from ram.c.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
3e81763e4c ram.c: Do not call save_page_header() from compress threads
save_page_header() accesses several global variables, so calling it
from multiple threads is pretty ugly.

Instead, call save_page_header() before writing out the compressed
data from the compress buffer to the migration stream.

This also makes the core compress code more independend from ram.c.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
b5cf1cd3e8 ram.c: Reset result after sending queued data
And take the param->mutex lock for the whole section to ensure
thread-safety.
Now, it is explicitly clear if there is no queued data to send.
Before, this was handled by param->file stream being empty and thus
qemu_put_qemu_file() not sending anything.

This will be used in the next commits to move save_page_header()
out of compress code.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
10c2f7b747 ram.c: Dont change param->block in the compress thread
Instead introduce a extra parameter to trigger the compress thread.
Now, when the compress thread is done, we know what RAMBlock and
offset it did compress.

This will be used in the next commits to move save_page_header()
out of compress code.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Lukas Straub
97274a871f ram.c: Let the compress threads return a CompressResult enum
This will be used in the next commits to move save_page_header()
out of compress code.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-08 15:25:26 +02:00
Juan Quintela
fae4009fb5 qemu-file: Make ram_control_save_page() use accessors for rate_limit
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230504113841.23130-9-quintela@redhat.com>
2023-05-05 02:01:59 +02:00
Juan Quintela
61abf1ebdc qemu-file: Make total_transferred an uint64_t
Change all the functions that use it.  It was already passed as
uint64_t.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-8-quintela@redhat.com>
2023-05-05 02:01:59 +02:00
Juan Quintela
ac7d25b816 qemu-file: remove shutdown member
The first thing that we do after setting the shutdown value is set the
error as -EIO if there is not a previous error.

So this value is redundant.  Just remove it and use
qemu_file_get_error() in the places that it was tested.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230504113841.23130-7-quintela@redhat.com>
2023-05-05 02:01:59 +02:00
Juan Quintela
27a1243f14 qemu-file: No need to check for shutdown in qemu_file_rate_limit
After calling qemu_file_shutdown() we set the error as -EIO if there
is no another previous error, so no need to check it here.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-6-quintela@redhat.com>
2023-05-05 02:01:59 +02:00
Juan Quintela
f3030d3440 migration: qemu_file_total_transferred() function is monotonic
So delta_bytes can only be greater or equal to zero.  Never negative.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-3-quintela@redhat.com>
2023-05-05 01:04:33 +02:00
Juan Quintela
520333490a migration: max_postcopy_bandwidth is a size parameter
So make everything that uses it uint64_t no int64_t.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504113841.23130-2-quintela@redhat.com>
2023-05-05 01:04:33 +02:00
Juan Quintela
cd01a60231 migration/rdma: Check for postcopy sooner
It makes no sense first try to see if there is an rdma error and then
do nothing on postcopy stage.  Change it so we check we are in
postcopy before doing anything.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-6-quintela@redhat.com>
2023-05-05 01:04:33 +02:00
Juan Quintela
8c90815797 migration/rdma: It makes no sense to recive that flag without RDMA
This could only happen if the source sent
RAM_SAVE_FLAG_HOOK (i.e. rdma) and destination don't have CONFIG_RDMA.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-5-quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
93dc710585 migration/rdma: We can calculate the rioc from the QEMUFile
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-4-quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
cf7fe0c5b0 migration/rdma: simplify ram_control_load_hook()
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-3-quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
5f1e7540b4 migration: Make RAM_SAVE_FLAG_HOOK a normal case entry
Fixes this commit, clearly a bad merge after a rebase or similar, it
should have been its own case since that point.

commit 5b0e9dd46f
Author: Peter Lieven <pl@kamp.de>
Date:   Tue Jun 24 11:32:36 2014 +0200

    migration: catch unknown flag combinations in ram_load

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230504114443.23891-2-quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
f3095cc8a7 migration: Rename xbzrle_enabled xbzrle_started
Otherwise it is confusing with the function xbzrle_enabled().

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230504115323.24407-1-quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
40f240a764 migration: Put zero_pages in alphabetical order
I forgot to move it when I rename it from duplicated_pages.

Message-Id: <20230504103357.22130-3-quintela@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
e2ee200558 migration: Document all migration_stats
Message-Id: <20230504103357.22130-2-quintela@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
3ec6828a79 migration/rdma: Don't pass the QIOChannelRDMA as an opaque
We can calculate it from the QEMUFile like the caller.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230503131847.11603-6-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
3cba22c9ad migration: Fix block_bitmap_mapping migration
It is valid that params->has_block_bitmap_mapping is true and
params->block_bitmap_mapping is NULL.  So we can't use the trick of
having a single function.

Move to two functions one for each value and the tests are fixed.

Fixes: b804b35b1c
       migration: Create migrate_block_bitmap_mapping() function

Reported-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230503181036.14890-1-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-05 01:04:32 +02:00
Juan Quintela
0deb7e9b6c migration: Drop unused parameter for migration_tls_client_create()
It is not needed since we moved the accessor for tls properties to
options.c.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-05-03 11:24:20 +02:00
Juan Quintela
3f461a0c0b migration: Drop unused parameter for migration_tls_get_creds()
It is not needed since we moved the accessor for tls properties to
options.c.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-05-03 11:24:20 +02:00
Juan Quintela
5690756d7c migration/rdma: Unfold last user of acct_update_position()
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
2023-05-03 11:24:20 +02:00
Juan Quintela
c61d2faa93 migration/rdma: Split the zero page case from acct_update_position
Now that we have atomic counters, we can do it on the place that we
need it, no need to do it inside ram.c.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
2023-05-03 11:24:20 +02:00
Juan Quintela
96820df24e migration: Rename RAMStats to MigrationAtomicStats
It is lousely based on MigrationStats, but that name is taken, so this
is the best one that I came with.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>

---

If you have any good suggestion for the name, I am all ears.
2023-05-03 11:24:20 +02:00
Juan Quintela
aff3f6606d migration: Rename ram_counters to mig_stats
migration_stats is just too long, and it is going to have more than
ram counters in the near future.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
2023-05-03 11:24:20 +02:00
Juan Quintela
947701cc1a migration: Move ram_stats to its own file migration-stats.[ch]
There is already include/qemu/stats.h, so stats.h was a bad idea.
We want this file to not depend on anything else, we will move all the
migration counters/stats to this struct.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
2023-05-03 11:24:19 +02:00
Juan Quintela
e232199aad multifd: We already account for this packet on the multifd thread
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
2023-05-03 11:24:19 +02:00
Richard Henderson
dc165fcd4e migration/xbzrle: Use __attribute__((target)) for avx512
Use the attribute, which is supported by clang, instead of
the #pragma, which is not supported and, for some reason,
also not detected by the meson probe, so we fail by -Werror.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230501210555.289806-1-richard.henderson@linaro.org>
2023-05-02 13:05:45 -07:00
Juan Quintela
73208a336e migration: Make dirty_bytes_last_sync atomic
As we set its value, it needs to be operated with atomics.
We rename it from remaining to better reflect its meaning.

Statistics always return the real reamaining bytes.  This was used to
store how much pages where dirty on the previous generation, so we can
calculate the expected downtime as: dirty_bytes_last_sync /
current_bandwith.

If we use the actual remaining bytes, we would see a very small value
at the end of the iteration.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

---

I am open to use ram_bytes_remaining() in its only use and be more
"optimistic" about the downtime.

Don't use __nocheck() functions.
Use stat64_get() now that it exists.
2023-04-27 16:39:54 +02:00
Juan Quintela
72f8e58707 migration: Make dirty_pages_rate atomic
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>

---

Don't use __nocheck() variants
Use stat64_get()
2023-04-27 16:39:49 +02:00
Juan Quintela
294e5a4034 multifd: Only flush once each full round of memory
We need to add a new flag to mean to flush at that point.
Notice that we still flush at the end of setup and at the end of
complete stages.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>

---

Add missing qemu_fflush(), now it passes all tests always.
In the previous version, the check that changes the default value to
false got lost in some rebase.  Get it back.
2023-04-27 16:37:28 +02:00
Juan Quintela
b05292c237 multifd: Protect multifd_send_sync_main() calls
We only need to do that on the ram_save_iterate() call on sending and
on destination when we get a RAM_SAVE_FLAG_EOS.

In setup() and complete() we need to synch in both new and old cases,
so don't add a check there.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>

---

Remove the wrappers that we take out on patch 5.
2023-04-27 16:37:28 +02:00
Juan Quintela
77c259a4cb multifd: Create property multifd-flush-after-each-section
We used to flush all channels at the end of each RAM section
sent.  That is not needed, so preparing to only flush after a full
iteration through all the RAM.

Default value of the property is false.  But we return "true" in
migrate_multifd_flush_after_each_section() until we implement the code
in following patches.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>

---

Rename each-iteration to after-each-section
Rename multifd-sync-after-each-section to
       multifd-flush-after-each-section
Move to machine-8.0 (peter)
2023-04-27 16:37:28 +02:00
Juan Quintela
f9436522c8 migration: Move migration_properties to options.c
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-27 16:37:28 +02:00
Juan Quintela
b804b35b1c migration: Create migrate_block_bitmap_mapping() function
Notice that we changed the test of ->has_block_bitmap_mapping
for the test that block_bitmap_mapping is not NULL.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

---

Make it return const (vladimir)
2023-04-27 16:37:28 +02:00
Juan Quintela
1f2f366c32 migration: Create migrate_tls_hostname() function
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

---

Moved the type to const char * (vladimir)
2023-04-27 16:37:28 +02:00
Juan Quintela
2eb0308bbd migration: Create migrate_tls_authz() function
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

---

Moved the type to const char * (vladimir)
2023-04-27 16:37:28 +02:00
Juan Quintela
d5c3e1959c migration: Create migrate_tls_creds() function
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

---

Moved the type to const char * (vladimir)
2023-04-27 16:37:28 +02:00
Juan Quintela
b1a8795654 migration: Remove MigrationState from block_cleanup_parameters()
This makes the function more regular with everything else.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-27 16:37:28 +02:00
Juan Quintela
b7b73122dd migration: Move block_cleanup_parameters() to options.c
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-27 16:37:28 +02:00
Juan Quintela
87c2290109 migration: Move migrate_set_block_incremental() to options.c
Once there, make it more regular and remove the need for
MigrationState parameter.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-27 16:37:28 +02:00