qemu/include/block
Fiona Ebner 91ba0e1c38 block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK
This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.

Note that even if bdrv_drained_begin() were already marked as
GRAPH_UNLOCKED, TSA would not complain about the instance in
bdrv_change_aio_context() before this change, because it is preceded
by a bdrv_graph_rdunlock_main_loop() call. It is not correct to
release the lock here, and in case the caller holds a write lock, it
wouldn't actually release the lock.

In combination with block-stream, there is a deadlock that can happen
because of this [0]. In particular, the following sequence can occur:
main thread              IO thread
1. acquires write lock
                         in blk_co_do_preadv_part():
                         2. have non-zero blk->in_flight
                         3. try to acquire read lock
4. begin drain

Steps 3 and 4 might be switched. Draining will poll and get stuck,
because it will see the non-zero in_flight counter. But the IO thread
will not make any progress either, because it cannot acquire the read
lock.
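
The interleaving above can be replayed in a small single-threaded toy model. This is only a sketch, not QEMU code: `ToyState` and every `toy_*` function are hypothetical names, the "lock" is a plain flag, and the drain poll is bounded so that the stuck case terminates. The drain-first ordering is shown only for contrast and is not claimed to be how the series resolves the problem.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in QEMU. */
typedef struct {
    bool writer_active; /* main thread holds the graph write lock */
    int in_flight;      /* stands in for blk->in_flight */
    bool io_waiting;    /* IO thread is waiting for the read lock */
} ToyState;

/* IO thread, steps 2 and 3: bump in_flight, then ask for the read lock. */
static void toy_io_submit(ToyState *s)
{
    s->in_flight++;
    s->io_waiting = true;
}

/* One scheduling step of the IO thread; returns true if the request finished. */
static bool toy_io_step(ToyState *s)
{
    if (!s->io_waiting || s->writer_active) {
        return false; /* nothing pending, or read lock unavailable */
    }
    assert(s->in_flight > 0);
    s->io_waiting = false;
    s->in_flight--;
    return true;
}

/* Drain: poll until in_flight drops to zero, giving up after max_polls. */
static bool toy_drain(ToyState *s, int max_polls)
{
    for (int i = 0; i < max_polls && s->in_flight > 0; i++) {
        toy_io_step(s);
    }
    return s->in_flight == 0;
}

/* Ordering from the commit message: write lock first, then drain. */
static bool toy_lock_then_drain(int max_polls)
{
    ToyState s = {0};
    s.writer_active = true;          /* step 1 */
    toy_io_submit(&s);               /* steps 2 and 3 */
    return toy_drain(&s, max_polls); /* step 4: never completes */
}

/* Contrast case: drain while no write lock is held, then take the lock. */
static bool toy_drain_then_lock(int max_polls)
{
    ToyState s = {0};
    toy_io_submit(&s);
    bool drained = toy_drain(&s, max_polls);
    s.writer_active = true;
    return drained;
}
```

In the model, `toy_lock_then_drain()` reports failure for any poll budget, because the only step that could decrement `in_flight` needs the read lock that the writer holds.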

After this change, all paths to bdrv_change_aio_context() drain:
bdrv_change_aio_context() is called by:
1. bdrv_child_cb_change_aio_ctx() which is only called via the
   change_aio_ctx() callback, see below.
2. bdrv_child_change_aio_context(), see below.
3. bdrv_try_change_aio_context(), where a drained section is
   introduced.

The change_aio_ctx() callback is called by:
1. bdrv_attach_child_common_abort(), where a drained section is
   introduced.
2. bdrv_attach_child_common(), where a drained section is introduced.
3. bdrv_parent_change_aio_context(), see below.

bdrv_child_change_aio_context() is called by:
1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
   section is invariant.
2. child_job_change_aio_ctx(), which is only called via the
   change_aio_ctx() callback, see above.

bdrv_parent_change_aio_context() is called by:
1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
   section is invariant.

This resolves all code paths. Note that bdrv_attach_child_common()
and bdrv_attach_child_common_abort() hold the graph write lock and
callers of bdrv_try_change_aio_context() might too, so they are not
actually allowed to drain either. This will be addressed in the
following commits.
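
The caller analysis above can be checked mechanically. The following is a minimal sketch that encodes only the caller lists from this message as a small graph and walks upwards from bdrv_change_aio_context(); the enum names, `callers` table, and helper functions are illustrative and do not exist in QEMU. A node counts as covered if it introduces a drained section, or if it has callers and all of them are covered; a caller already on the walk stack is a recursive call and is treated as covered.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative node IDs for the functions named in the commit message. */
enum {
    NONE = 0,                  /* empty slot in the caller table */
    CHANGE_AIO_CONTEXT,        /* bdrv_change_aio_context() */
    CHILD_CB_CHANGE_AIO_CTX,   /* bdrv_child_cb_change_aio_ctx() */
    CHILD_CHANGE_AIO_CONTEXT,  /* bdrv_child_change_aio_context() */
    TRY_CHANGE_AIO_CONTEXT,    /* bdrv_try_change_aio_context() */
    CHANGE_AIO_CTX_CB,         /* the change_aio_ctx() callback */
    ATTACH_CHILD_COMMON_ABORT, /* bdrv_attach_child_common_abort() */
    ATTACH_CHILD_COMMON,       /* bdrv_attach_child_common() */
    PARENT_CHANGE_AIO_CONTEXT, /* bdrv_parent_change_aio_context() */
    CHILD_JOB_CHANGE_AIO_CTX,  /* child_job_change_aio_ctx() */
    N_NODES
};

#define MAX_CALLERS 3

/* callers[f] lists who calls f, as stated in the message; unset slots are NONE. */
static const int callers[N_NODES][MAX_CALLERS] = {
    [CHANGE_AIO_CONTEXT] = { CHILD_CB_CHANGE_AIO_CTX,
                             CHILD_CHANGE_AIO_CONTEXT,
                             TRY_CHANGE_AIO_CONTEXT },
    [CHILD_CB_CHANGE_AIO_CTX] = { CHANGE_AIO_CTX_CB },
    [CHILD_CHANGE_AIO_CONTEXT] = { CHANGE_AIO_CONTEXT,
                                   CHILD_JOB_CHANGE_AIO_CTX },
    [CHANGE_AIO_CTX_CB] = { ATTACH_CHILD_COMMON_ABORT,
                            ATTACH_CHILD_COMMON,
                            PARENT_CHANGE_AIO_CONTEXT },
    [PARENT_CHANGE_AIO_CONTEXT] = { CHANGE_AIO_CONTEXT },
    [CHILD_JOB_CHANGE_AIO_CTX] = { CHANGE_AIO_CTX_CB },
};

/* True if every non-recursive path upwards from node reaches a draining caller. */
static bool covered(int node, bool on_stack[N_NODES], const bool drains[N_NODES])
{
    assert(node > NONE && node < N_NODES);
    if (drains[node] || on_stack[node]) {
        return true; /* drained here, or a recursive call already being checked */
    }
    on_stack[node] = true;
    bool has_caller = false;
    bool ok = true;
    for (int i = 0; i < MAX_CALLERS; i++) {
        int c = callers[node][i];
        if (c == NONE) {
            continue;
        }
        has_caller = true;
        if (!covered(c, on_stack, drains)) {
            ok = false;
        }
    }
    on_stack[node] = false;
    return has_caller && ok; /* an undrained entry point is not covered */
}

static bool all_paths_drained(const bool drains[N_NODES])
{
    bool on_stack[N_NODES] = { false };
    return covered(CHANGE_AIO_CONTEXT, on_stack, drains);
}
```

With drained sections marked at the three entry points named above, the walk succeeds; clearing any one of them makes it fail, which matches the claim that these three sections together cover all code paths.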

More granular draining is not trivially possible, because
bdrv_change_aio_context() can recursively call itself e.g. via
bdrv_child_change_aio_context().

[0]: https://lore.kernel.org/qemu-devel/73839c04-7616-407e-b057-80ca69e63f51@virtuozzo.com/

Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-9-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04 18:16:33 +02:00
accounting.h block: add accounting for zone append operation 2023-05-15 08:18:10 -04:00
aio-wait.h system/cpus: rename qemu_mutex_lock_iothread() to bql_lock() 2024-01-08 10:45:43 -05:00
aio.h aio-posix: Separate AioPolledEvent per AioHandler 2025-03-13 17:57:23 +01:00
aio_task.h block: Remove unused aio_task_pool_empty 2024-09-30 10:53:18 +03:00
block-common.h block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
block-copy.h copy-before-write: allow specifying minimum cluster size 2024-09-30 10:52:41 +03:00
block-global-state.h block: mark bdrv_child_change_aio_context() GRAPH_RDLOCK 2025-06-04 18:16:33 +02:00
block-hmp-cmds.h include/block: Untangle inclusion loops 2023-01-20 07:24:28 +01:00
block-io.h block: Add new bdrv_co_is_all_zeroes() function 2025-05-14 16:08:23 -05:00
block.h include/block: Untangle inclusion loops 2023-01-20 07:24:28 +01:00
block_backup.h include/block: Untangle inclusion loops 2023-01-20 07:24:28 +01:00
block_int-common.h block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK 2025-06-04 18:16:33 +02:00
block_int-global-state.h mirror: Drop redundant zero_target parameter 2025-05-14 20:10:12 -05:00
block_int-io.h block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
block_int.h include/block: Untangle inclusion loops 2023-01-20 07:24:28 +01:00
blockjob.h Rename "QEMU global mutex" to "BQL" in comments and docs 2024-01-08 10:45:43 -05:00
blockjob_int.h block: Mark block_job_add_bdrv() GRAPH_WRLOCK 2023-11-07 19:14:19 +01:00
dirty-bitmap.h block: Mark bdrv_*_dirty_bitmap() and callers GRAPH_RDLOCK 2023-02-23 19:49:32 +01:00
export.h block/export: Add option to allow export of inactive nodes 2025-02-06 14:46:40 +01:00
fuse.h fuse: Allow exporting BDSs via FUSE 2020-12-11 17:52:39 +01:00
graph-lock.h qemu/compiler: Absorb 'clang-tsa.h' 2025-03-06 14:21:25 +01:00
nbd.h nbd/server: Allow users to adjust handshake limit in QMP 2025-02-11 13:45:47 -06:00
nvme.h hw/nvme: set error status code explicitly for misc commands 2025-02-26 12:40:35 +01:00
qapi.h block: Mark bdrv_get_parent_name() and callers GRAPH_RDLOCK 2023-10-12 16:31:33 +02:00
qdict.h qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
raw-aio.h file-posix: Support FUA writes 2025-03-13 17:44:55 +01:00
replication.h replication: move include out of root directory 2021-05-26 14:49:46 +02:00
reqlist.h block/reqlist: add reqlist_wait_all() 2022-03-07 09:33:30 +01:00
snapshot.h block: remove AioContext locking 2023-12-21 22:49:27 +01:00
thread-pool.h thread-pool: Implement generic (non-AIO) pool support 2025-03-06 06:47:33 +01:00
throttle-groups.h block/throttle-groups: Use ThrottleDirection instread of bool is_write 2023-08-29 10:49:24 +02:00
ufs.h hw/ufs: Add temperature event notification support 2025-03-05 02:13:29 +01:00
write-threshold.h block: Clean up includes 2023-02-08 07:28:05 +01:00