mirror of
https://github.com/Motorhead1991/qemu.git
synced 2025-08-04 08:13:54 -06:00
pci, pc, virtio: features, fixes, cleanups
intel-iommu scalable option
pcie acs emulation
beginning for vhost-user-blk reconnect and of vhost-user backend work
misc fixes and cleanups

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----

iQEcBAABAgAGBQJciHBSAAoJECgfDbjSjVRpoxkH/2NvGGZo+fSAIjVcEOe9BKZx
XeI4X51QnqOqur3GktoHQzpMYCGxYy653AE69aoO1JVOXsoJS2py0SKw5VIa9bnh
BeZwXGmf1/rySC+iFc5oSNxHv7vS2o40ccwrkeKoqbbzrnLPIYQs/yyfJG/m0HtS
xj0zSN6rTY8xxiJYVQftav3ylqInIr3d14WoJcIP3ksiOVtuQ1yjDJnJdKCZvLMk
4dtFuQJpownQrOZ0jfXXvpWu2VUC2ZuBd4ylTK3IiqBRjfaU4/wIq6ySMsU1evLy
chcAykqY0jt5nz339K2HgquUtcuE3LsKi3igqTZMKi2vb3SLQFnPBO0DUyjXvGg=
=gusE
-----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pci, pc, virtio: features, fixes, cleanups

intel-iommu scalable option
pcie acs emulation
beginning for vhost-user-blk reconnect and of vhost-user backend work
misc fixes and cleanups

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Wed 13 Mar 2019 02:52:02 GMT
# gpg: using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (26 commits)
  i386, acpi: check acpi_memory_hotplug capacity in pre_plug
  gen_pcie_root_port: Add ACS (Access Control Services) capability
  pcie: Add a simple PCIe ACS (Access Control Services) helper function
  vhost-user-blk: Add support to get/set inflight buffer
  libvhost-user: Support tracking inflight I/O in shared memory
  libvhost-user: Introduce vu_queue_map_desc()
  libvhost-user: Remove unnecessary FD flag check for event file descriptors
  vhost-user: Support transferring inflight buffer between qemu and backend
  nvdimm: use NVDIMM_ACPI_IO_LEN for the proper IO size
  nvdimm: use *function* directly instead of allocating it again
  nvdimm: fix typo in nvdimm_build_nvdimm_devices argument
  intel_iommu: add scalable-mode option to make scalable mode work
  intel_iommu: add 256 bits qi_desc support
  intel_iommu: scalable mode emulation
  libvhost-user: add vu_queue_unpop()
  libvhost-user-glib: export vug_source_new()
  vhost-user: split vhost_user_read()
  vhost-user: wrap some read/write with retry handling
  libvhost-user: exit by default on VHOST_USER_NONE
  vhost-user: simplify vhost_user_init/vhost_user_cleanup
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
commit
3b5b6e9b51
37 changed files with 2128 additions and 283 deletions
232
docs/interop/vhost-user.json
Normal file
|
@ -0,0 +1,232 @@
|
|||
# -*- Mode: Python -*-
|
||||
#
|
||||
# Copyright (C) 2018 Red Hat, Inc.
|
||||
#
|
||||
# Authors:
|
||||
# Marc-André Lureau <marcandre.lureau@redhat.com>
|
||||
#
|
||||
# This work is licensed under the terms of the GNU GPL, version 2 or
|
||||
# later. See the COPYING file in the top-level directory.
|
||||
|
||||
##
|
||||
# = vhost user backend discovery & capabilities
|
||||
##
|
||||
|
||||
##
|
||||
# @VHostUserBackendType:
|
||||
#
|
||||
# List the various vhost user backend types.
|
||||
#
|
||||
# @9p: 9p virtio console
|
||||
# @balloon: virtio balloon
|
||||
# @block: virtio block
|
||||
# @caif: virtio caif
|
||||
# @console: virtio console
|
||||
# @crypto: virtio crypto
|
||||
# @gpu: virtio gpu
|
||||
# @input: virtio input
|
||||
# @net: virtio net
|
||||
# @rng: virtio rng
|
||||
# @rpmsg: virtio remote processor messaging
|
||||
# @rproc-serial: virtio remoteproc serial link
|
||||
# @scsi: virtio scsi
|
||||
# @vsock: virtio vsock transport
|
||||
#
|
||||
# Since: 4.0
|
||||
##
|
||||
{
|
||||
'enum': 'VHostUserBackendType',
|
||||
'data': [
|
||||
'9p',
|
||||
'balloon',
|
||||
'block',
|
||||
'caif',
|
||||
'console',
|
||||
'crypto',
|
||||
'gpu',
|
||||
'input',
|
||||
'net',
|
||||
'rng',
|
||||
'rpmsg',
|
||||
'rproc-serial',
|
||||
'scsi',
|
||||
'vsock'
|
||||
]
|
||||
}
|
||||
|
||||
##
|
||||
# @VHostUserBackendInputFeature:
|
||||
#
|
||||
# List of vhost user "input" features.
|
||||
#
|
||||
# @evdev-path: The --evdev-path command line option is supported.
|
||||
# @no-grab: The --no-grab command line option is supported.
|
||||
#
|
||||
# Since: 4.0
|
||||
##
|
||||
{
|
||||
'enum': 'VHostUserBackendInputFeature',
|
||||
'data': [ 'evdev-path', 'no-grab' ]
|
||||
}
|
||||
|
||||
##
|
||||
# @VHostUserBackendCapabilitiesInput:
|
||||
#
|
||||
# Capabilities reported by vhost user "input" backends
|
||||
#
|
||||
# @features: list of supported features.
|
||||
#
|
||||
# Since: 4.0
|
||||
##
|
||||
{
|
||||
'struct': 'VHostUserBackendCapabilitiesInput',
|
||||
'data': {
|
||||
'features': [ 'VHostUserBackendInputFeature' ]
|
||||
}
|
||||
}
|
||||
|
||||
##
|
||||
# @VHostUserBackendGPUFeature:
|
||||
#
|
||||
# List of vhost user "gpu" features.
|
||||
#
|
||||
# @render-node: The --render-node command line option is supported.
|
||||
# @virgl: The --virgl command line option is supported.
|
||||
#
|
||||
# Since: 4.0
|
||||
##
|
||||
{
|
||||
'enum': 'VHostUserBackendGPUFeature',
|
||||
'data': [ 'render-node', 'virgl' ]
|
||||
}
|
||||
|
||||
##
|
||||
# @VHostUserBackendCapabilitiesGPU:
|
||||
#
|
||||
# Capabilities reported by vhost user "gpu" backends.
|
||||
#
|
||||
# @features: list of supported features.
|
||||
#
|
||||
# Since: 4.0
|
||||
##
|
||||
{
|
||||
'struct': 'VHostUserBackendCapabilitiesGPU',
|
||||
'data': {
|
||||
'features': [ 'VHostUserBackendGPUFeature' ]
|
||||
}
|
||||
}
|
||||
|
||||
##
|
||||
# @VHostUserBackendCapabilities:
|
||||
#
|
||||
# Capabilities reported by vhost user backends.
|
||||
#
|
||||
# @type: The vhost user backend type.
|
||||
#
|
||||
# Since: 4.0
|
||||
##
|
||||
{
|
||||
'union': 'VHostUserBackendCapabilities',
|
||||
'base': { 'type': 'VHostUserBackendType' },
|
||||
'discriminator': 'type',
|
||||
'data': {
|
||||
'input': 'VHostUserBackendCapabilitiesInput',
|
||||
'gpu': 'VHostUserBackendCapabilitiesGPU'
|
||||
}
|
||||
}
|
||||
|
||||
##
|
||||
# @VhostUserBackend:
|
||||
#
|
||||
# Describes a vhost user backend to management software.
|
||||
#
|
||||
# It is possible for multiple @VhostUserBackend elements to match the
|
||||
# search criteria of management software. Applications thus need rules
|
||||
# to pick one of the many matches, and users need the ability to
|
||||
# override distro defaults.
|
||||
#
|
||||
# It is recommended to create vhost user backend JSON files (each
|
||||
# containing a single @VhostUserBackend root element) with a
|
||||
# double-digit prefix, for example "50-qemu-gpu.json",
|
||||
# "50-crosvm-gpu.json", etc, so they can be sorted in predictable
|
||||
# order. The backend JSON files should be searched for in three
|
||||
# directories:
|
||||
#
|
||||
# - /usr/share/qemu/vhost-user -- populated by distro-provided
|
||||
# packages (XDG_DATA_DIRS covers
|
||||
# /usr/share by default),
|
||||
#
|
||||
# - /etc/qemu/vhost-user -- exclusively for sysadmins' local additions,
|
||||
#
|
||||
# - $XDG_CONFIG_HOME/qemu/vhost-user -- exclusively for per-user local
|
||||
# additions (XDG_CONFIG_HOME
|
||||
# defaults to $HOME/.config).
|
||||
#
|
||||
# Top-down, the list of directories goes from general to specific.
|
||||
#
|
||||
# Management software should build a list of files from all three
|
||||
# locations, then sort the list by filename (i.e., basename
|
||||
# component). Management software should choose the first JSON file on
|
||||
# the sorted list that matches the search criteria. If a more specific
|
||||
# directory has a file with the same name as one in a less specific directory,
|
||||
# then the file in the more specific directory takes effect. If the
|
||||
# more specific file is zero length, it hides the less specific one.
|
||||
#
|
||||
# For example, if a distro ships
|
||||
#
|
||||
# - /usr/share/qemu/vhost-user/50-qemu-gpu.json
|
||||
#
|
||||
# - /usr/share/qemu/vhost-user/50-crosvm-gpu.json
|
||||
#
|
||||
# then the sysadmin can prevent the default QEMU GPU being used at all with
|
||||
#
|
||||
# $ touch /etc/qemu/vhost-user/50-qemu-gpu.json
|
||||
#
|
||||
# The sysadmin can replace/alter the distro default QEMU GPU with
|
||||
#
|
||||
# $ vim /etc/qemu/vhost-user/50-qemu-gpu.json
|
||||
#
|
||||
# or they can provide a parallel QEMU GPU with higher priority
|
||||
#
|
||||
# $ vim /etc/qemu/vhost-user/10-qemu-gpu.json
|
||||
#
|
||||
# or they can provide a parallel QEMU GPU with lower priority
|
||||
#
|
||||
# $ vim /etc/qemu/vhost-user/99-qemu-gpu.json
|
||||
#
|
||||
# @type: The vhost user backend type.
|
||||
#
|
||||
# @description: Provides a human-readable description of the backend.
|
||||
# Management software may or may not display @description.
|
||||
#
|
||||
# @binary: Absolute path to the backend binary.
|
||||
#
|
||||
# @tags: An optional list of auxiliary strings associated with the
|
||||
# backend for which @description is not appropriate, due to the
|
||||
# latter's possible exposure to the end-user. @tags serves
|
||||
# development and debugging purposes only, and management
|
||||
# software shall explicitly ignore it.
|
||||
#
|
||||
# Since: 4.0
|
||||
#
|
||||
# Example:
|
||||
#
|
||||
# {
|
||||
# "description": "QEMU vhost-user-gpu",
|
||||
# "type": "gpu",
|
||||
# "binary": "/usr/libexec/qemu/vhost-user-gpu",
|
||||
# "tags": [
|
||||
# "CONFIG_OPENGL_DMABUF=y"
|
||||
# ]
|
||||
# }
|
||||
#
|
||||
##
|
||||
{
|
||||
'struct' : 'VhostUserBackend',
|
||||
'data' : {
|
||||
'description': 'str',
|
||||
'type': 'VHostUserBackendType',
|
||||
'binary': 'str',
|
||||
'*tags': [ 'str' ]
|
||||
}
|
||||
}
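The lookup rules in the @VhostUserBackend comment (merge three directories, sort by basename, let a more specific file shadow a less specific one, let a zero-length file hide it) can be sketched in C as follows. This is only an illustration under stated assumptions: the `Candidate` type, the `specificity` ranking and `effective_list()` are hypothetical names, byte-wise `strcmp()` ordering is assumed for "sort by filename", and reading directories or parsing JSON is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One JSON file found in one of the three directories (hypothetical type). */
typedef struct Candidate {
    const char *basename; /* e.g. "50-qemu-gpu.json" */
    int specificity;      /* assumed: 0 = /usr/share, 1 = /etc, 2 = $XDG_CONFIG_HOME */
    long size;            /* file size in bytes; 0 means "hide this name" */
} Candidate;

/* Order by basename; for equal names put the more specific directory first. */
static int candidate_cmp(const void *a, const void *b)
{
    const Candidate *x = a, *y = b;
    int c = strcmp(x->basename, y->basename);
    return c != 0 ? c : y->specificity - x->specificity;
}

/* Sorts c[] in place and stores the files that take effect in out[].
 * Returns how many entries were stored. */
static size_t effective_list(Candidate *c, size_t n, const Candidate **out)
{
    size_t m = 0;
    qsort(c, n, sizeof(*c), candidate_cmp);
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && strcmp(c[i].basename, c[i - 1].basename) == 0) {
            continue; /* shadowed by the same name in a more specific dir */
        }
        if (c[i].size == 0) {
            continue; /* zero-length file hides this name entirely */
        }
        out[m++] = &c[i];
    }
    return m;
}
```

Management software would then walk the resulting list in order and take the first entry whose parsed content matches its search criteria.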
|
|
@ -17,8 +17,13 @@ The protocol defines 2 sides of the communication, master and slave. Master is
|
|||
the application that shares its virtqueues, in our case QEMU. Slave is the
|
||||
consumer of the virtqueues.
|
||||
|
||||
In the current implementation QEMU is the Master, and the Slave is intended to
|
||||
be a software Ethernet switch running in user space, such as Snabbswitch.
|
||||
In the current implementation QEMU is the Master, and the Slave is the
|
||||
external process consuming the virtio queues, for example a software
|
||||
Ethernet switch running in user space, such as Snabbswitch, or a block
|
||||
device backend processing read & write to a virtual disk. In order to
|
||||
facilitate interoperability between various backend implementations,
|
||||
it is recommended to follow the "Backend program conventions"
|
||||
described in this document.
|
||||
|
||||
Master and slave can be either a client (i.e. connecting) or server (listening)
|
||||
in the socket communication.
|
||||
|
@ -142,6 +147,17 @@ Depending on the request type, payload can be:
|
|||
Offset: a 64-bit offset of this area from the start of the
|
||||
supplied file descriptor
|
||||
|
||||
* Inflight description
|
||||
-----------------------------------------------------
|
||||
| mmap size | mmap offset | num queues | queue size |
|
||||
-----------------------------------------------------
|
||||
|
||||
mmap size: a 64-bit size of area to track inflight I/O
|
||||
mmap offset: a 64-bit offset of this area from the start
|
||||
of the supplied file descriptor
|
||||
num queues: a 16-bit number of virtqueues
|
||||
queue size: a 16-bit size of virtqueues
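As an illustration, the inflight description can be pictured as a packed C struct. The sketch below is derived only from the field widths listed above; the type name `InflightDesc`, the field names and the helper `inflight_desc_fits()` are assumptions made for this example, and `__attribute__((packed))` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout only: widths follow the payload description,
 * names are assumptions. */
typedef struct InflightDesc {
    uint64_t mmap_size;   /* size of the area used to track inflight I/O */
    uint64_t mmap_offset; /* offset of that area from the start of the fd */
    uint16_t num_queues;  /* number of virtqueues */
    uint16_t queue_size;  /* size of the virtqueues */
} __attribute__((packed)) InflightDesc;

/* Hypothetical sanity check a receiver might run before calling mmap():
 * the described area must be non-empty and lie inside the file. */
static bool inflight_desc_fits(const InflightDesc *d, uint64_t file_size)
{
    if (d->mmap_size == 0 || d->num_queues == 0 || d->queue_size == 0) {
        return false;
    }
    if (d->mmap_offset > file_size) {
        return false;
    }
    return d->mmap_size <= file_size - d->mmap_offset;
}
```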
|
||||
|
||||
In QEMU the vhost-user message is implemented with the following struct:
|
||||
|
||||
typedef struct VhostUserMsg {
|
||||
|
@ -157,6 +173,7 @@ typedef struct VhostUserMsg {
|
|||
struct vhost_iotlb_msg iotlb;
|
||||
VhostUserConfig config;
|
||||
VhostUserVringArea area;
|
||||
VhostUserInflight inflight;
|
||||
};
|
||||
} QEMU_PACKED VhostUserMsg;
|
||||
|
||||
|
@ -175,6 +192,7 @@ the ones that do:
|
|||
* VHOST_USER_GET_PROTOCOL_FEATURES
|
||||
* VHOST_USER_GET_VRING_BASE
|
||||
* VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
|
||||
* VHOST_USER_GET_INFLIGHT_FD (if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)
|
||||
|
||||
[ Also see the section on REPLY_ACK protocol extension. ]
|
||||
|
||||
|
@ -188,6 +206,7 @@ in the ancillary data:
|
|||
* VHOST_USER_SET_VRING_CALL
|
||||
* VHOST_USER_SET_VRING_ERR
|
||||
* VHOST_USER_SET_SLAVE_REQ_FD
|
||||
* VHOST_USER_SET_INFLIGHT_FD (if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)
|
||||
|
||||
If Master is unable to send the full message or receives a wrong reply it will
|
||||
close the connection. An optional reconnection mechanism can be implemented.
|
||||
|
@ -382,6 +401,256 @@ If VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD protocol feature is negotiated,
|
|||
slave can send file descriptors (at most 8 descriptors in each message)
|
||||
to master via ancillary data using this fd communication channel.
|
||||
|
||||
Inflight I/O tracking
|
||||
---------------------
|
||||
|
||||
To support reconnecting after restart or crash, slave may need to resubmit
|
||||
inflight I/Os. If virtqueue is processed in order, we can easily achieve
|
||||
that by getting the inflight descriptors from descriptor table (split virtqueue)
|
||||
or descriptor ring (packed virtqueue). However, it can't work when we process
|
||||
descriptors out-of-order because some entries which store the information of
|
||||
inflight descriptors in available ring (split virtqueue) or descriptor
|
||||
ring (packed virtqueue) might be overwritten by new entries. To solve this
|
||||
problem, slave needs to allocate an extra buffer to store this information of inflight
|
||||
descriptors and share it with master for persistence. VHOST_USER_GET_INFLIGHT_FD and
|
||||
VHOST_USER_SET_INFLIGHT_FD are used to transfer this buffer between master
|
||||
and slave. And the format of this buffer is described below:
|
||||
|
||||
-------------------------------------------------------
|
||||
| queue0 region | queue1 region | ... | queueN region |
|
||||
-------------------------------------------------------
|
||||
|
||||
N is the number of available virtqueues. Slave could get it from num queues
|
||||
field of VhostUserInflight.
|
||||
|
||||
For split virtqueue, queue region can be implemented as:
|
||||
|
||||
typedef struct DescStateSplit {
|
||||
/* Indicate whether this descriptor is inflight or not.
|
||||
* Only available for head-descriptor. */
|
||||
uint8_t inflight;
|
||||
|
||||
/* Padding */
|
||||
uint8_t padding[5];
|
||||
|
||||
/* Maintain a list for the last batch of used descriptors.
|
||||
* Only available when batching is used for submitting */
|
||||
uint16_t next;
|
||||
|
||||
/* Used to preserve the order of fetching available descriptors.
|
||||
* Only available for head-descriptor. */
|
||||
uint64_t counter;
|
||||
} DescStateSplit;
|
||||
|
||||
typedef struct QueueRegionSplit {
|
||||
/* The feature flags of this region. Now it's initialized to 0. */
|
||||
uint64_t features;
|
||||
|
||||
/* The version of this region. It's 1 currently.
|
||||
* Zero value indicates an uninitialized buffer */
|
||||
uint16_t version;
|
||||
|
||||
/* The size of DescStateSplit array. It's equal to the virtqueue
|
||||
* size. Slave could get it from queue size field of VhostUserInflight. */
|
||||
uint16_t desc_num;
|
||||
|
||||
/* The head of the list that tracks the last batch of used descriptors. */
|
||||
uint16_t last_batch_head;
|
||||
|
||||
/* Store the idx value of used ring */
|
||||
uint16_t used_idx;
|
||||
|
||||
/* Used to track the state of each descriptor in descriptor table */
|
||||
DescStateSplit desc[0];
|
||||
} QueueRegionSplit;
|
||||
|
||||
To track inflight I/O, the queue region should be processed as follows:
|
||||
|
||||
When receiving available buffers from the driver:
|
||||
|
||||
1. Get the next available head-descriptor index from available ring, i
|
||||
|
||||
2. Set desc[i].counter to the value of global counter
|
||||
|
||||
3. Increase global counter by 1
|
||||
|
||||
4. Set desc[i].inflight to 1
|
||||
|
||||
When supplying used buffers to the driver:
|
||||
|
||||
1. Get corresponding used head-descriptor index, i
|
||||
|
||||
2. Set desc[i].next to last_batch_head
|
||||
|
||||
3. Set last_batch_head to i
|
||||
|
||||
4. Steps 1,2,3 may be performed repeatedly if batching is possible
|
||||
|
||||
5. Increase the idx value of used ring by the size of the batch
|
||||
|
||||
6. Set the inflight field of each DescStateSplit entry in the batch to 0
|
||||
|
||||
7. Set used_idx to the idx value of used ring
|
||||
|
||||
When reconnecting:
|
||||
|
||||
1. If the value of used_idx does not match the idx value of used ring (means
|
||||
the inflight field of DescStateSplit entries in last batch may be incorrect),
|
||||
|
||||
(a) Subtract the value of used_idx from the idx value of used ring to get
|
||||
last batch size of DescStateSplit entries
|
||||
|
||||
(b) Set the inflight field of each DescStateSplit entry to 0 in last batch
|
||||
list which starts from last_batch_head
|
||||
|
||||
(c) Set used_idx to the idx value of used ring
|
||||
|
||||
2. Resubmit inflight DescStateSplit entries in order of their counter value
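The split-virtqueue steps above can be sketched in C roughly as follows. This is a minimal model, not the QEMU or libvhost-user code: the queue size `QSZ` is fixed so that `desc[]` can be a plain array, the used ring's idx is passed in as an ordinary integer instead of being read from guest memory, and the function names (`track_avail`, `link_used`, `commit_batch`, `recover`) are made up for this example. The caller is assumed to advance the used ring (step 5) between `link_used()` and `commit_batch()`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QSZ 8 /* assumed virtqueue size for this sketch */

typedef struct DescStateSplit {
    uint8_t inflight;
    uint8_t padding[5];
    uint16_t next;
    uint64_t counter;
} DescStateSplit;

typedef struct QueueRegionSplit {
    uint64_t features;
    uint16_t version;
    uint16_t desc_num;
    uint16_t last_batch_head;
    uint16_t used_idx;
    DescStateSplit desc[QSZ]; /* desc[0] in the text; fixed size here */
} QueueRegionSplit;

static uint64_t global_counter; /* the "global counter" of the text */

static void region_init(QueueRegionSplit *r)
{
    memset(r, 0, sizeof(*r));
    r->version = 1;
    r->desc_num = QSZ;
}

/* Receiving an available buffer with head index i (steps 2-4). */
static void track_avail(QueueRegionSplit *r, uint16_t i)
{
    assert(i < r->desc_num);
    r->desc[i].counter = global_counter++;
    r->desc[i].inflight = 1;
}

/* Supplying a used buffer, steps 2-3: chain head i into the batch list. */
static void link_used(QueueRegionSplit *r, uint16_t i)
{
    r->desc[i].next = r->last_batch_head;
    r->last_batch_head = i;
}

/* Supplying used buffers, steps 6-7, run after the used ring was advanced. */
static void commit_batch(QueueRegionSplit *r, const uint16_t *heads,
                         uint16_t n, uint16_t ring_used_idx)
{
    for (uint16_t k = 0; k < n; k++) {
        r->desc[heads[k]].inflight = 0;
    }
    r->used_idx = ring_used_idx;
}

/* Reconnecting: repair the last batch if needed, then list the inflight
 * heads in out[] ordered by counter. Returns how many were found. */
static uint16_t recover(QueueRegionSplit *r, uint16_t ring_used_idx,
                        uint16_t *out)
{
    uint16_t count = 0;
    if (r->used_idx != ring_used_idx) {
        uint16_t n = (uint16_t)(ring_used_idx - r->used_idx);
        uint16_t i = r->last_batch_head;
        for (uint16_t k = 0; k < n; k++) {
            r->desc[i].inflight = 0;
            i = r->desc[i].next;
        }
        r->used_idx = ring_used_idx;
    }
    for (uint16_t i = 0; i < r->desc_num; i++) {
        uint16_t j;
        if (!r->desc[i].inflight) {
            continue;
        }
        /* insertion sort by counter keeps the original fetch order */
        for (j = count++; j > 0 &&
             r->desc[out[j - 1]].counter > r->desc[i].counter; j--) {
            out[j] = out[j - 1];
        }
        out[j] = i;
    }
    return count;
}
```

The sketch leaves out how the global counter is restored after a restart; a real backend would have to re-derive it, for instance from the largest counter still stored in the region.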
|
||||
|
||||
For packed virtqueue, queue region can be implemented as:
|
||||
|
||||
typedef struct DescStatePacked {
|
||||
/* Indicate whether this descriptor is inflight or not.
|
||||
* Only available for head-descriptor. */
|
||||
uint8_t inflight;
|
||||
|
||||
/* Padding */
|
||||
uint8_t padding;
|
||||
|
||||
/* Link to the next free entry */
|
||||
uint16_t next;
|
||||
|
||||
/* Link to the last entry of descriptor list.
|
||||
* Only available for head-descriptor. */
|
||||
uint16_t last;
|
||||
|
||||
/* The length of descriptor list.
|
||||
* Only available for head-descriptor. */
|
||||
uint16_t num;
|
||||
|
||||
/* Used to preserve the order of fetching available descriptors.
|
||||
* Only available for head-descriptor. */
|
||||
uint64_t counter;
|
||||
|
||||
/* The buffer id */
|
||||
uint16_t id;
|
||||
|
||||
/* The descriptor flags */
|
||||
uint16_t flags;
|
||||
|
||||
/* The buffer length */
|
||||
uint32_t len;
|
||||
|
||||
/* The buffer address */
|
||||
uint64_t addr;
|
||||
} DescStatePacked;
|
||||
|
||||
typedef struct QueueRegionPacked {
|
||||
/* The feature flags of this region. Now it's initialized to 0. */
|
||||
uint64_t features;
|
||||
|
||||
/* The version of this region. It's 1 currently.
|
||||
* Zero value indicates an uninitialized buffer */
|
||||
uint16_t version;
|
||||
|
||||
/* The size of DescStatePacked array. It's equal to the virtqueue
|
||||
* size. Slave could get it from queue size field of VhostUserInflight. */
|
||||
uint16_t desc_num;
|
||||
|
||||
/* The head of free DescStatePacked entry list */
|
||||
uint16_t free_head;
|
||||
|
||||
/* The old head of free DescStatePacked entry list */
|
||||
uint16_t old_free_head;
|
||||
|
||||
/* The used index of descriptor ring */
|
||||
uint16_t used_idx;
|
||||
|
||||
/* The old used index of descriptor ring */
|
||||
uint16_t old_used_idx;
|
||||
|
||||
/* Device ring wrap counter */
|
||||
uint8_t used_wrap_counter;
|
||||
|
||||
/* The old device ring wrap counter */
|
||||
uint8_t old_used_wrap_counter;
|
||||
|
||||
/* Padding */
|
||||
uint8_t padding[7];
|
||||
|
||||
/* Used to track the state of each descriptor fetched from descriptor ring */
|
||||
DescStatePacked desc[0];
|
||||
} QueueRegionPacked;
|
||||
|
||||
To track inflight I/O, the queue region should be processed as follows:
|
||||
|
||||
When receiving available buffers from the driver:
|
||||
|
||||
1. Get the next available descriptor entry from descriptor ring, d
|
||||
|
||||
2. If d is head descriptor,
|
||||
|
||||
(a) Set desc[old_free_head].num to 0
|
||||
|
||||
(b) Set desc[old_free_head].counter to the value of global counter
|
||||
|
||||
(c) Increase global counter by 1
|
||||
|
||||
(d) Set desc[old_free_head].inflight to 1
|
||||
|
||||
3. If d is last descriptor, set desc[old_free_head].last to free_head
|
||||
|
||||
4. Increase desc[old_free_head].num by 1
|
||||
|
||||
5. Set desc[free_head].addr, desc[free_head].len, desc[free_head].flags,
|
||||
desc[free_head].id to d.addr, d.len, d.flags, d.id
|
||||
|
||||
6. Set free_head to desc[free_head].next
|
||||
|
||||
7. If d is last descriptor, set old_free_head to free_head
|
||||
|
||||
When supplying used buffers to the driver:
|
||||
|
||||
1. Get corresponding used head-descriptor entry from descriptor ring, d
|
||||
|
||||
2. Get corresponding DescStatePacked entry, e
|
||||
|
||||
3. Set desc[e.last].next to free_head
|
||||
|
||||
4. Set free_head to the index of e
|
||||
|
||||
5. Steps 1,2,3,4 may be performed repeatedly if batching is possible
|
||||
|
||||
6. Increase used_idx by the size of the batch and update used_wrap_counter if needed
|
||||
|
||||
7. Update d.flags
|
||||
|
||||
8. Set the inflight field of each head DescStatePacked entry in the batch to 0
|
||||
|
||||
9. Set old_free_head, old_used_idx, old_used_wrap_counter to free_head, used_idx,
|
||||
used_wrap_counter
|
||||
|
||||
When reconnecting:
|
||||
|
||||
1. If used_idx does not match old_used_idx (means the inflight field of DescStatePacked
|
||||
entries in last batch may be incorrect),
|
||||
|
||||
(a) Get the next descriptor ring entry through old_used_idx, d
|
||||
|
||||
(b) Use old_used_wrap_counter to calculate the available flags
|
||||
|
||||
(c) If d.flags is not equal to the calculated flags value (means slave has
|
||||
submitted the buffer to guest driver before crash, so it has to commit the
|
||||
in-progress update), set old_free_head, old_used_idx, old_used_wrap_counter
|
||||
to free_head, used_idx, used_wrap_counter
|
||||
|
||||
2. Set free_head, used_idx, used_wrap_counter to old_free_head, old_used_idx,
|
||||
old_used_wrap_counter (roll back any in-progress update)
|
||||
|
||||
3. Set the inflight field of each DescStatePacked entry in free list to 0
|
||||
|
||||
4. Resubmit inflight DescStatePacked entries in order of their counter value
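The packed-virtqueue bookkeeping above can be modelled in C along these lines. The sketch covers receiving descriptors, releasing and committing used buffers, and the rollback part of reconnection (steps 2 and 3); step 1 of reconnection is omitted because it needs access to the real descriptor ring flags. Several things are assumptions made for this example: the fixed size `QSZ`, the `RingDesc` stand-in type, all function names, the initial free list (`desc[i].next = i + 1`, wrap counters set to 1), and the reading that "the size of the batch" counts descriptors (`num`) when advancing `used_idx`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define QSZ 4 /* assumed virtqueue size for this sketch */

typedef struct DescStatePacked {
    uint8_t inflight;
    uint8_t padding;
    uint16_t next, last, num;
    uint64_t counter;
    uint16_t id, flags;
    uint32_t len;
    uint64_t addr;
} DescStatePacked;

typedef struct QueueRegionPacked {
    uint64_t features;
    uint16_t version, desc_num;
    uint16_t free_head, old_free_head;
    uint16_t used_idx, old_used_idx;
    uint8_t used_wrap_counter, old_used_wrap_counter;
    uint8_t padding[7];
    DescStatePacked desc[QSZ]; /* desc[0] in the text; fixed size here */
} QueueRegionPacked;

/* Stand-in for one descriptor ring entry (hypothetical type). */
typedef struct RingDesc { uint64_t addr; uint32_t len; uint16_t id, flags; } RingDesc;

static uint64_t global_counter;

static void region_init(QueueRegionPacked *r)
{
    memset(r, 0, sizeof(*r));
    r->version = 1;
    r->desc_num = QSZ;
    r->used_wrap_counter = r->old_used_wrap_counter = 1; /* assumed start */
    for (uint16_t i = 0; i < QSZ; i++) {
        r->desc[i].next = (uint16_t)(i + 1); /* assumed initial free list */
    }
}

/* Receiving one descriptor d of a chain (steps 2-7). */
static void track_avail(QueueRegionPacked *r, const RingDesc *d,
                        bool is_head, bool is_last)
{
    DescStatePacked *head = &r->desc[r->old_free_head];
    DescStatePacked *cur = &r->desc[r->free_head];
    if (is_head) {
        head->num = 0;
        head->counter = global_counter++;
        head->inflight = 1;
    }
    if (is_last) {
        head->last = r->free_head;
    }
    head->num++;
    cur->addr = d->addr; cur->len = d->len;
    cur->flags = d->flags; cur->id = d->id;
    r->free_head = cur->next;
    if (is_last) {
        r->old_free_head = r->free_head;
    }
}

/* Supplying a used buffer whose head entry is e (steps 3, 4 and 6). */
static void release_used(QueueRegionPacked *r, uint16_t e)
{
    r->desc[r->desc[e].last].next = r->free_head;
    r->free_head = e;
    r->used_idx += r->desc[e].num;
    if (r->used_idx >= r->desc_num) {
        r->used_idx -= r->desc_num;
        r->used_wrap_counter ^= 1;
    }
}

/* Steps 8-9: clear inflight for the batch and save the new state. */
static void commit_batch(QueueRegionPacked *r, const uint16_t *heads, uint16_t n)
{
    for (uint16_t k = 0; k < n; k++) {
        r->desc[heads[k]].inflight = 0;
    }
    r->old_free_head = r->free_head;
    r->old_used_idx = r->used_idx;
    r->old_used_wrap_counter = r->used_wrap_counter;
}

/* Reconnecting, steps 2-3: roll back, then clear inflight on free entries. */
static void rollback(QueueRegionPacked *r)
{
    uint16_t i = r->old_free_head;
    r->free_head = r->old_free_head;
    r->used_idx = r->old_used_idx;
    r->used_wrap_counter = r->old_used_wrap_counter;
    for (uint16_t k = 0; k < r->desc_num && i < r->desc_num; k++) {
        r->desc[i].inflight = 0;
        i = r->desc[i].next;
    }
}
```

After `rollback()`, the entries whose inflight field is still 1 are the ones to resubmit, ordered by their counter value.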
|
||||
|
||||
Protocol features
|
||||
-----------------
|
||||
|
||||
|
@ -397,6 +666,7 @@ Protocol features
|
|||
#define VHOST_USER_PROTOCOL_F_CONFIG 9
|
||||
#define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD 10
|
||||
#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
|
||||
#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
|
||||
|
||||
Master message types
|
||||
--------------------
|
||||
|
@ -761,6 +1031,26 @@ Master message types
|
|||
was previously sent.
|
||||
The value returned is an error indication; 0 is success.
|
||||
|
||||
* VHOST_USER_GET_INFLIGHT_FD
|
||||
Id: 31
|
||||
Equivalent ioctl: N/A
|
||||
Master payload: inflight description
|
||||
|
||||
When VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD protocol feature has been
|
||||
successfully negotiated, this message is submitted by master to get
|
||||
a shared buffer from slave. The shared buffer will be used to track
|
||||
inflight I/O by slave. QEMU should retrieve a new one when the VM is reset.
|
||||
|
||||
* VHOST_USER_SET_INFLIGHT_FD
|
||||
Id: 32
|
||||
Equivalent ioctl: N/A
|
||||
Master payload: inflight description
|
||||
|
||||
When VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD protocol feature has been
|
||||
successfully negotiated, this message is submitted by master to send
|
||||
the shared inflight buffer back to slave so that slave could get
|
||||
inflight I/O after a crash or restart.
|
||||
|
||||
Slave message types
|
||||
-------------------
|
||||
|
||||
|
@ -835,3 +1125,95 @@ resilient for selective requests.
|
|||
For the message types that already solicit a reply from the client, the
|
||||
presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
|
||||
no behavioural change. (See the 'Communication' section for details.)
|
||||
|
||||
Backend program conventions
|
||||
---------------------------
|
||||
|
||||
vhost-user backends can provide various devices & services and may
|
||||
need to be configured manually depending on the use case. However, it
|
||||
is a good idea to follow the conventions listed here when
|
||||
possible. Users, QEMU or libvirt, can then rely on some common
|
||||
behaviour to avoid heterogeneous configuration and management of the
|
||||
backend programs and facilitate interoperability.
|
||||
|
||||
Each backend installed on a host system should come with at least one
|
||||
JSON file that conforms to the vhost-user.json schema. Each file
|
||||
informs the management applications about the backend type, and binary
|
||||
location. In addition, it defines rules for management apps for
|
||||
picking the highest priority backend when multiple match the search
|
||||
criteria (see @VhostUserBackend documentation in the schema file).
|
||||
|
||||
If the backend is not capable of enabling a requested feature on the
|
||||
host (such as 3D acceleration with virgl), or the initialization
|
||||
failed, the backend should fail to start early and exit with a status
|
||||
!= 0. It may also print a message to stderr for further details.
|
||||
|
||||
The backend program must not daemonize itself, but it may be
|
||||
daemonized by the management layer. It may also have restricted
|
||||
access to the system.
|
||||
|
||||
File descriptors 0, 1 and 2 will exist, and have regular
|
||||
stdin/stdout/stderr usage (they may have been redirected to /dev/null
|
||||
by the management layer, or to a log handler).
|
||||
|
||||
The backend program must end (as quickly and cleanly as possible) when
|
||||
the SIGTERM signal is received. Eventually, it may receive SIGKILL from
|
||||
the management layer after a few seconds.
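One common way to honour the SIGTERM convention is to set a flag from the signal handler and let the main loop notice it and shut down. The sketch below assumes a POSIX system; `stop_requested`, `on_sigterm()` and `install_sigterm_handler()` are names invented for this example, not part of any backend.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set from the handler, polled by the (not shown) main loop. */
static volatile sig_atomic_t stop_requested;

static void on_sigterm(int signo)
{
    (void)signo;
    stop_requested = 1; /* only async-signal-safe work in a handler */
}

/* Returns 0 on success, -1 on failure (as sigaction() does). */
static int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}
```

A backend would call `install_sigterm_handler()` during start-up and leave its event loop as soon as `stop_requested` becomes non-zero, releasing resources before exiting.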
|
||||
|
||||
The following command line options have an expected behaviour. They
|
||||
are mandatory, unless explicitly said differently:
|
||||
|
||||
* --socket-path=PATH
|
||||
|
||||
This option specifies the location of the vhost-user Unix domain socket.
|
||||
It is incompatible with --fd.
|
||||
|
||||
* --fd=FDNUM
|
||||
|
||||
When this argument is given, the backend program is started with the
|
||||
vhost-user socket as file descriptor FDNUM. It is incompatible with
|
||||
--socket-path.
|
||||
|
||||
* --print-capabilities
|
||||
|
||||
Output to stdout the backend capabilities in JSON format, and then
|
||||
exit successfully. Other options and arguments should be ignored, and
|
||||
the backend program should not perform its normal function. The
|
||||
capabilities can be reported dynamically depending on the host
|
||||
capabilities.
|
||||
|
||||
The JSON output is described in the vhost-user.json schema, by
|
||||
@VHostUserBackendCapabilities. Example:
|
||||
{
|
||||
"type": "foo",
|
||||
"features": [
|
||||
"feature-a",
|
||||
"feature-b"
|
||||
]
|
||||
}
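A backend could handle the three common options along the following lines. This is a hand-rolled sketch under several assumptions: only the `--option=VALUE` spelling shown above is accepted, `BackendOpts` and `parse_opts()` are hypothetical names, and "mandatory" plus "incompatible" is read as "exactly one of --socket-path and --fd must be given" unless --print-capabilities is present.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define OPT_SOCKET "--socket-path="
#define OPT_FD     "--fd="

typedef struct BackendOpts {
    const char *socket_path; /* NULL when --socket-path is absent */
    int fd;                  /* -1 when --fd is absent */
    bool print_caps;         /* --print-capabilities was given */
} BackendOpts;

/* Returns 0 when the command line is usable, -1 otherwise. */
static int parse_opts(int argc, char **argv, BackendOpts *o)
{
    bool bad = false;
    o->socket_path = NULL;
    o->fd = -1;
    o->print_caps = false;

    for (int i = 1; i < argc; i++) {
        const char *a = argv[i];
        if (strcmp(a, "--print-capabilities") == 0) {
            o->print_caps = true;
        } else if (strncmp(a, OPT_SOCKET, sizeof(OPT_SOCKET) - 1) == 0) {
            o->socket_path = a + sizeof(OPT_SOCKET) - 1;
        } else if (strncmp(a, OPT_FD, sizeof(OPT_FD) - 1) == 0) {
            const char *num = a + sizeof(OPT_FD) - 1;
            char *end;
            long v = strtol(num, &end, 10);
            if (end == num || *end != '\0' || v < 0 || v > INT_MAX) {
                bad = true;
            } else {
                o->fd = (int)v;
            }
        } else {
            bad = true; /* unknown option or stray argument */
        }
    }
    if (o->print_caps) {
        return 0; /* everything else is ignored in this mode */
    }
    if (bad || (o->socket_path != NULL) == (o->fd >= 0)) {
        return -1; /* need exactly one of --socket-path and --fd */
    }
    return 0;
}
```

When `print_caps` is set, the program would print its capabilities JSON to stdout and exit with status 0 without touching the socket.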
|
||||
|
||||
vhost-user-input
|
||||
----------------
|
||||
|
||||
Command line options:
|
||||
|
||||
* --evdev-path=PATH (optional)
|
||||
|
||||
Specify the Linux input device.
|
||||
|
||||
* --no-grab (optional)
|
||||
|
||||
Do not request exclusive access to the input device.
|
||||
|
||||
vhost-user-gpu
|
||||
--------------
|
||||
|
||||
Command line options:
|
||||
|
||||
* --render-node=PATH (optional)
|
||||
|
||||
Specify the GPU DRM render node.
|
||||
|
||||
* --virgl (optional)
|
||||
|
||||
Enable virgl rendering support.
|
||||
|
|