qemu/block
Eric Blake 7e277545b9 mirror: Skip writing zeroes when target is already zero
When mirroring, the goal is to ensure that the destination reads the
same as the source; this goal is met whether the destination is sparse
or fully allocated (except when explicitly punching holes: then merely
reading as zero is not enough to know whether it is sparse, so we still
want to punch the hole).  Avoiding a redundant write of zeroes (whether in
the background because the zero cluster was marked in the dirty
bitmap, or in the foreground because the guest is writing zeroes) when
the destination already reads as zero makes mirroring faster, and
avoids allocating the destination merely because the source reports as
allocated.

The effect is especially pronounced when the source is a raw file.
That's because when the source is a qcow2 file, the dirty bitmap only
visits the portions of the source that are allocated, which tend to be
non-zero.  But when the source is a raw file,
bdrv_co_is_allocated_above() reports the entire file as allocated, so
mirror_dirty_init() sets the entire dirty bitmap, and it is only later
during mirror_iteration() that we change to consulting the more precise
bdrv_co_block_status_above() to learn where the source reads as zero.

Remember that since a mirror operation can write a cluster more than
once (every time the guest changes the source, the destination is also
changed to keep up), and the guest can change whether a given cluster
reads as zero, is discarded, or has non-zero data over the course of
the mirror operation, we can't take the shortcut of relying on
s->target_is_zero (which is static for the life of the job) in
mirror_co_zero() to see if the destination is already zero, because
that information may be stale.  Any solution we use must be dynamic in
the face of the guest writing or discarding a cluster while the mirror
is ongoing.

We could just teach mirror_co_zero() to do a block_status() probe of
the destination, and skip the zeroes if the destination already reads
as zero, but we know from past experience that extra block_status()
calls are not always cheap (tmpfs, anyone?), especially when they are
random access rather than linear.  Use of block_status() of the source
by the background task in a linear fashion is not our bottleneck (it's
a background task, after all); but since mirroring can be done while
the source is actively being changed, we don't want a slow
block_status() of the destination to occur on the hot path of the
guest trying to do random-access writes to the source.

So this patch takes a slightly different approach: any time we have to
track dirty clusters, we can also track which clusters are known to
read as zero.  For sync=TOP or when we are punching holes from
"detect-zeroes":"unmap", the zero bitmap starts out empty, but
prevents a second write of zeroes to a cluster that was already zeroed
by an earlier pass; for sync=FULL when we are not punching holes, the
zero bitmap starts out full if the destination reads as zero during
initialization.  Either way, I/O to the destination can now avoid a
redundant write of zeroes to a cluster that already reads as zero, all
without having to do a block_status() per write on the destination.
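
The bookkeeping described above can be sketched as a toy model.  This is
only an illustration under stated assumptions, not the code in mirror.c:
the ToyMirror type and the toy_mirror_*() helpers are hypothetical names,
one bool per cluster stands in for the real bitmap, and the counters only
model how many requests would reach the destination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model; none of these names come from mirror.c. */
typedef struct ToyMirror {
    size_t nb_clusters;
    bool *zero;            /* zero[i]: destination cluster i is known to read as zero */
    unsigned zero_writes;  /* zero requests actually sent to the destination */
    unsigned data_writes;  /* data requests sent to the destination */
} ToyMirror;

/*
 * The zero map starts out full only for sync=FULL without hole punching,
 * and only if the destination was seen to read as zero at initialization.
 * In every other case it starts out empty.
 */
static ToyMirror *toy_mirror_new(size_t nb_clusters, bool sync_full,
                                 bool punch_holes, bool dest_reads_zero)
{
    ToyMirror *m = calloc(1, sizeof(*m));
    if (!m) {
        abort();
    }
    m->zero = calloc(nb_clusters ? nb_clusters : 1, sizeof(*m->zero));
    if (!m->zero) {
        abort();
    }
    m->nb_clusters = nb_clusters;
    if (sync_full && !punch_holes && dest_reads_zero) {
        for (size_t i = 0; i < nb_clusters; i++) {
            m->zero[i] = true;
        }
    }
    return m;
}

static void toy_mirror_free(ToyMirror *m)
{
    free(m->zero);
    free(m);
}

/* Returns true if a zero request was issued, false if it was skipped. */
static bool toy_mirror_write_zeroes(ToyMirror *m, size_t cluster)
{
    assert(cluster < m->nb_clusters);
    if (m->zero[cluster]) {
        return false;          /* already zero: no I/O, no counter update */
    }
    m->zero_writes++;
    m->zero[cluster] = true;   /* remember it for later passes */
    return true;
}

/* Any data write invalidates what we knew about the cluster. */
static void toy_mirror_write_data(ToyMirror *m, size_t cluster)
{
    assert(cluster < m->nb_clusters);
    m->data_writes++;
    m->zero[cluster] = false;
}
```

The point of the sketch is that the per-cluster state is updated on every
write to the destination, so it stays accurate when the guest alternates
between data and zeroes, and no status query on the destination is needed
at write time.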

With this patch, if I create a raw sparse destination file, connect it
with QMP 'blockdev-add' while leaving it at the default "discard":
"ignore", then run QMP 'blockdev-mirror' with "sync": "full", the
destination remains sparse rather than fully allocated.  Meanwhile, a
destination image that is already fully allocated remains so unless it
was opened with "detect-zeroes": "unmap".  And any time writing zeroes
is skipped, the job counters are not incremented.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250509204341.3553601-26-eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-14 20:27:49 -05:00
export block/export: Add option to allow export of inactive nodes 2025-02-06 14:46:40 +01:00
monitor nbd/server: Allow users to adjust handshake limit in QMP 2025-02-11 13:45:47 -06:00
accounting.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
aio_task.c block: Remove unused aio_task_pool_empty 2024-09-30 10:53:18 +03:00
amend.c block: Mark BlockDriver callbacks for amend job GRAPH_RDLOCK 2023-05-10 14:16:54 +02:00
backup.c blockdev-backup: Add error handling option for copy-before-write jobs 2025-05-12 18:19:31 +03:00
blkdebug.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
blkio.c include/system: Move exec/memory.h to system/memory.h 2025-04-23 14:08:21 -07:00
blklogwrites.c qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
blkreplay.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
blkverify.c qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
block-backend.c block: Remove unused blk_op_is_blocked() 2025-03-11 15:49:14 +01:00
block-copy.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
block-gen.h block-coroutine-wrapper.py: support also basic return types 2022-12-15 16:07:43 +01:00
block-ram-registrar.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
bochs.c block: Take graph lock for most of .bdrv_open 2023-11-08 17:56:18 +01:00
cloop.c block: Take graph lock for most of .bdrv_open 2023-11-08 17:56:18 +01:00
commit.c block: allow commit to unmap zero blocks 2025-05-01 12:12:19 +03:00
copy-before-write.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
copy-before-write.h blockdev-backup: Add error handling option for copy-before-write jobs 2025-05-12 18:19:31 +03:00
copy-on-read.c qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
copy-on-read.h block: Mark bdrv_(un)freeze_backing_chain() and callers GRAPH_RDLOCK 2023-11-07 19:14:19 +01:00
coroutines.h block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
create.c qemu/compiler: Absorb 'clang-tsa.h' 2025-03-06 14:21:25 +01:00
crypto.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
crypto.h block: Support detached LUKS header creation using qemu-img 2024-02-09 12:50:37 +00:00
curl.c qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
dirty-bitmap.c block: Mark bdrv_*_dirty_bitmap() and callers GRAPH_RDLOCK 2023-02-23 19:49:32 +01:00
dmg-bz2.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
dmg-lzfse.c block/dmg: Ignore C99 prototype declaration mismatch from <lzfse.h> 2023-03-30 15:03:36 +02:00
dmg.c block: Protect bs->file with graph_lock 2023-11-08 17:56:18 +01:00
dmg.h block/dmg: Declare a type definition for DMG uncompress function 2023-04-24 13:53:44 -04:00
file-posix.c file-posix, gluster: Handle zero block status hint better 2025-05-14 15:49:27 -05:00
file-win32.c qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
filter-compress.c block: Take graph lock for most of .bdrv_open 2023-11-08 17:56:18 +01:00
gluster.c file-posix, gluster: Handle zero block status hint better 2025-05-14 15:49:27 -05:00
graph-lock.c graph-lock: remove AioContext locking 2023-12-21 22:49:27 +01:00
io.c block: Add new bdrv_co_is_all_zeroes() function 2025-05-14 16:08:23 -05:00
io_uring.c file-posix: Support FUA writes 2025-03-13 17:44:55 +01:00
iscsi-opts.c modules: add block module annotations 2021-07-09 18:20:27 +02:00
iscsi.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
linux-aio.c file-posix: Support FUA writes 2025-03-13 17:44:55 +01:00
meson.build include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
mirror.c mirror: Skip writing zeroes when target is already zero 2025-05-14 20:27:49 -05:00
nbd.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
nfs.c qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
null.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
nvme.c block/nvme: Use host PCI MMIO API 2025-05-08 10:21:10 -04:00
parallels-ext.c qapi/crypto: Rename QCryptoHashAlgorithm to *Algo, and drop prefix 2024-09-10 14:02:16 +02:00
parallels.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
parallels.h block: Protect bs->file with graph_lock 2023-11-08 17:56:18 +01:00
preallocate.c block: Protect bs->file with graph_lock 2023-11-08 17:56:18 +01:00
progress_meter.c coroutine: Clean up superfluous inclusion of qemu/lockable.h 2023-01-19 10:18:28 +01:00
qapi-system.c qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
qapi.c Block layer patches 2025-02-10 13:25:36 -05:00
qcow.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
qcow2-bitmap.c block/qcow2-bitmap: Replace g_memdup() by g_memdup2() 2024-05-08 19:11:34 +02:00
qcow2-cache.c qcow2: Mark qcow2_signal_corruption() and callers GRAPH_RDLOCK 2023-10-12 16:31:33 +02:00
qcow2-cluster.c qcow2: Take locks for accessing bs->file 2023-11-08 17:56:17 +01:00
qcow2-refcount.c qcow2: Mark qcow2_signal_corruption() and callers GRAPH_RDLOCK 2023-10-12 16:31:33 +02:00
qcow2-snapshot.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
qcow2-threads.c thread-pool: avoid passing the pool parameter every time 2023-04-25 13:17:28 +02:00
qcow2.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
qcow2.h qcow2: Take locks for accessing bs->file 2023-11-08 17:56:17 +01:00
qed-check.c qed: mark more functions as coroutine_fns and GRAPH_RDLOCK 2023-06-28 09:46:20 +02:00
qed-cluster.c qed: protect table cache with CoMutex 2017-07-17 11:34:11 +08:00
qed-l2-cache.c osdep: Move memalign-related functions to their own header 2022-03-07 13:16:49 +00:00
qed-table.c block: use bdrv_co_debug_event in coroutine context 2023-06-28 09:46:34 +02:00
qed.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
qed.h block: Protect bs->file with graph_lock 2023-11-08 17:56:18 +01:00
quorum.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
raw-format.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
rbd.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
replication.c blockdev-backup: Add error handling option for copy-before-write jobs 2025-05-12 18:19:31 +03:00
reqlist.c block/reqlist: allow adding overlapping requests 2024-09-30 10:53:18 +03:00
snapshot-access.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
snapshot.c block: Zero block driver state before reopening 2025-03-11 15:49:14 +01:00
ssh.c qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
stream.c qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
throttle-groups.c qom: Make InterfaceInfo[] uses const 2025-04-25 17:00:41 +02:00
throttle.c block: Take graph lock for most of .bdrv_open 2023-11-08 17:56:18 +01:00
trace-events nbd/client: Accept 64-bit block status chunks 2023-10-05 11:02:08 -05:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
vdi.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
vhdx-endian.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
vhdx-log.c vhdx: Take locks for accessing bs->file 2023-11-08 17:56:18 +01:00
vhdx.c qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
vhdx.h vhdx: Take locks for accessing bs->file 2023-11-08 17:56:18 +01:00
vmdk.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
vpc.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
vvfat.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
win32-aio.c aio: remove aio_disable_external() API 2023-05-30 17:37:26 +02:00
write-threshold.c block: remove AioContext locking 2023-12-21 22:49:27 +01:00