qemu/util/aio-posix.h
Kevin Wolf ee416407b3 aio-posix: Separate AioPolledEvent per AioHandler
Adaptive polling has a big problem: It doesn't consider that an event
loop can wait for many different events that may have very different
typical latencies.

For example, think of a guest that tends to send a new I/O request soon
after the previous I/O request completes, but the storage on the host is
rather slow. In this case, getting the new request from the guest quickly
means that polling is enabled, but the next thing is performing the I/O
request on the backend, which is slow and disables polling again for the
next guest request. This means that in such a scenario, polling could
help for every other event, but is only ever enabled when it can't
succeed.
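
The scenario above can be reproduced with a small simulation. This is only a sketch under stated assumptions: PollWindow, poll_window_adjust() and shared_window_hits() are hypothetical names, and the grow/shrink rule (start at 4000 ns, double, reset to 0 when the wait exceeds the maximum) is a simplified stand-in rather than the actual QEMU implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one adaptive polling window (not QEMU's type). */
typedef struct {
    int64_t ns; /* how long to busy-poll before blocking; 0 = polling off */
} PollWindow;

/*
 * Simplified grow/shrink rule, assumed for illustration:
 *  - the event arrived within the window: keep the window
 *  - the event took longer than max_ns: disable polling
 *  - otherwise: grow (start at 4000 ns, then double, capped at max_ns)
 */
void poll_window_adjust(PollWindow *w, int64_t wait_ns, int64_t max_ns)
{
    assert(max_ns > 0);
    if (wait_ns <= w->ns) {
        return;
    }
    if (wait_ns > max_ns) {
        w->ns = 0;
    } else {
        w->ns = w->ns ? w->ns * 2 : 4000;
        if (w->ns > max_ns) {
            w->ns = max_ns;
        }
    }
}

/*
 * Alternate between a guest event and a backend event while sharing ONE
 * window, and count how often polling would have caught the event.
 */
int shared_window_hits(int64_t guest_ns, int64_t backend_ns,
                       int64_t max_ns, int rounds)
{
    PollWindow shared = { 0 };
    int hits = 0;

    for (int i = 0; i < rounds; i++) {
        /* Wait for the guest with whatever window the last event left. */
        hits += guest_ns <= shared.ns;
        poll_window_adjust(&shared, guest_ns, max_ns);

        /* Wait for the backend with the window the guest event left. */
        hits += backend_ns <= shared.ns;
        poll_window_adjust(&shared, backend_ns, max_ns);
    }
    return hits;
}
```

With a 10 us guest latency, a 1 ms backend latency and a 32 us maximum, shared_window_hits(10000, 1000000, 32000, 100) returns 0: the fast guest event enables polling just in time for the slow backend wait, which then disables it again before the next guest request.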

In order to fix this, keep a separate AioPolledEvent for each
AioHandler. We will then know that the backend file descriptor always
has a high latency and isn't worth polling for, but we also know that
the guest is always fast and we should poll for it. This solves at least
half of the problem: we can now keep polling in those cases where it
makes sense and get the improved performance from it.
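
As a rough illustration of why per-handler state helps, the sketch below gives every simulated handler its own window, analogous to the poll field in struct AioHandler. SimHandler, sim_adjust() and sim_run() are hypothetical names, and the grow/shrink rule is a simplified assumption, not the code from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical adaptive polling window; 0 means polling is disabled. */
typedef struct {
    int64_t ns;
} PollWindow;

/* Hypothetical simulated handler with its own polling state. */
typedef struct {
    int64_t latency_ns; /* typical time until this handler's event fires */
    PollWindow poll;    /* per-handler window, analogous to AioHandler.poll */
    int hits;           /* how often polling would have caught the event */
} SimHandler;

/* Assumed rule: keep on success, reset above max_ns, otherwise grow. */
void sim_adjust(PollWindow *w, int64_t wait_ns, int64_t max_ns)
{
    if (wait_ns <= w->ns) {
        return;
    }
    if (wait_ns > max_ns) {
        w->ns = 0;
    } else {
        w->ns = w->ns ? w->ns * 2 : 4000;
        if (w->ns > max_ns) {
            w->ns = max_ns;
        }
    }
}

/* Fire each handler's event once per round and adapt only its own window. */
void sim_run(SimHandler *handlers, size_t n, int64_t max_ns, int rounds)
{
    assert(handlers != NULL && max_ns > 0);
    for (int r = 0; r < rounds; r++) {
        for (size_t i = 0; i < n; i++) {
            SimHandler *h = &handlers[i];

            h->hits += h->latency_ns <= h->poll.ns;
            sim_adjust(&h->poll, h->latency_ns, max_ns);
        }
    }
}
```

Running sim_run() for 100 rounds with a 10 us guest handler, a 1 ms backend handler and a 32 us maximum lets the guest window grow to 16000 ns after three misses and stay there, so 97 of the 100 guest events are caught by polling, while the backend window stays at 0. This simulation does not model the remaining cost the message mentions next, namely that the loop still polls while only the slow event is pending.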

Since the event loop doesn't know which event will be next, we still do
some unnecessary polling while we're waiting for the slow disk. I made
some attempts to be more clever than just randomly growing and shrinking
the polling time, and even to let callers be explicit about when they
expect a new event, but so far these attempts either haven't improved
performance or have even caused regressions. For now, let's just fix
the part that is easy enough to fix; we can revisit the rest later.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250307221634.71951-6-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-03-13 17:57:23 +01:00

/*
 * AioContext POSIX event loop implementation internal APIs
 *
 * Copyright IBM, Corp. 2008
 * Copyright Red Hat, Inc. 2020
 *
 * Authors:
 *  Anthony Liguori   <aliguori@us.ibm.com>
 *
 * This work is licensed under the terms of the GNU GPL, version 2.  See
 * the COPYING file in the top-level directory.
 *
 * Contributions after 2012-01-13 are licensed under the terms of the
 * GNU GPL, version 2 or (at your option) any later version.
 */

#ifndef AIO_POSIX_H
#define AIO_POSIX_H

#include "block/aio.h"

struct AioHandler {
    GPollFD pfd;
    IOHandler *io_read;
    IOHandler *io_write;
    AioPollFn *io_poll;
    IOHandler *io_poll_ready;
    IOHandler *io_poll_begin;
    IOHandler *io_poll_end;
    void *opaque;
    QLIST_ENTRY(AioHandler) node;
    QLIST_ENTRY(AioHandler) node_ready; /* only used during aio_poll() */
    QLIST_ENTRY(AioHandler) node_deleted;
    QLIST_ENTRY(AioHandler) node_poll;
#ifdef CONFIG_LINUX_IO_URING
    QSLIST_ENTRY(AioHandler) node_submitted;
    unsigned flags; /* see fdmon-io_uring.c */
#endif
    int64_t poll_idle_timeout; /* when to stop userspace polling */
    bool poll_ready; /* has polling detected an event? */
    AioPolledEvent poll;
};

/* Add a handler to a ready list */
void aio_add_ready_handler(AioHandlerList *ready_list, AioHandler *node,
                           int revents);

extern const FDMonOps fdmon_poll_ops;

#ifdef CONFIG_EPOLL_CREATE1
bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd);
void fdmon_epoll_setup(AioContext *ctx);
void fdmon_epoll_disable(AioContext *ctx);
#else
static inline bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd)
{
    return false;
}

static inline void fdmon_epoll_setup(AioContext *ctx)
{
}

static inline void fdmon_epoll_disable(AioContext *ctx)
{
}
#endif /* !CONFIG_EPOLL_CREATE1 */

#ifdef CONFIG_LINUX_IO_URING
bool fdmon_io_uring_setup(AioContext *ctx);
void fdmon_io_uring_destroy(AioContext *ctx);
#else
static inline bool fdmon_io_uring_setup(AioContext *ctx)
{
    return false;
}

static inline void fdmon_io_uring_destroy(AioContext *ctx)
{
}
#endif /* !CONFIG_LINUX_IO_URING */

#endif /* AIO_POSIX_H */