Move virtio_reset() to a later part of the file to remove the forward
declaration of virtio_set_features_nocheck() and to prepare the
situation that we need more operations to perform during reset.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250421-reset-v2-2-e4c1ead88ea1@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio-net expects set_features() will be called when the feature set
used by the guest changes to update the number of virtqueues but it is
not called during reset, which will clear all features, leaving the
queues added for VIRTIO_NET_F_MQ or VIRTIO_NET_F_RSS. Not only these
extra queues are visible to the guest, they will cause segmentation
fault during migration.
Call set_features() during reset to remove those queues for virtio-net
as we call set_status(). It will also prevent similar bugs for
virtio-net and other devices in the future.
Fixes: f9d6dbf0bf ("virtio-net: remove virtio queues if the guest doesn't support multiqueue")
Buglink: https://issues.redhat.com/browse/RHEL-73842
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250421-reset-v2-1-e4c1ead88ea1@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Live migration should be terminated if the vhost-user backend crashes
before the migration completes.
Specifically, since the vhost device will be stopped when VM is stopped
before the end of the live migration, in current implementation if the
backend crashes, vhost-user device set_status() won't return failure,
live migration won't perceive the disconnection between QEMU and the
backend.
When the VM is migrated to the destination, the inflight IO will be
resubmitted, and if the IO was completed out of order before, it will
cause IO error.
To fix this issue:
1. Add the return value to set_status() for VirtioDeviceClass.
a. For the vhost-user device, return failure when the backend crashes.
b. For other virtio devices, always return 0.
2. Return failure if vhost_dev_stop() failed for vhost-user device.
If QEMU loses connection with the vhost-user backend, virtio set_status()
can return failure to the upper layer, migration_completion() can handle
the error, terminate the live migration, and restore the VM, so that
inflight IO can be completed normally.
Signed-off-by: Haoqian He <haoqian.he@smartx.com>
Message-Id: <20250416024729.3289157-4-haoqian.he@smartx.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch captures the error of vhost_virtqueue_stop() in vhost_dev_stop()
and returns the error upward.
Specifically, if QEMU is disconnected from the vhost backend, some actions
in vhost_dev_stop() will fail, such as sending vhost-user messages to the
backend (GET_VRING_BASE, SET_VRING_ENABLE) and vhost_reset_status.
Considering that both set_vring_enable and vhost_reset_status require setting
the specific virtio feature bit, we can capture vhost_virtqueue_stop()'s
error to indicate that QEMU has lost connection with the backend.
This patch is the pre patch for 'vhost-user: return failure if backend crashes
when live migration', which makes the live migration aware of the loss of
connection with the vhost-user backend and aborts the live migration.
Signed-off-by: Haoqian He <haoqian.he@smartx.com>
Message-Id: <20250416024729.3289157-3-haoqian.he@smartx.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch adds the new VM state change cb type `VMChangeStateHandlerWithRet`,
which has return value for `VMChangeStateEntry`.
Thus, we can register a new VM state change cb with return value for device.
Note that `VMChangeStateHandler` and `VMChangeStateHandlerWithRet` are mutually
exclusive and cannot be provided at the same time.
This patch is the pre patch for 'vhost-user: return failure if backend crashes
when live migration', which makes the live migration aware of the loss of
connection with the vhost-user backend and aborts the live migration.
Virtio device will use VMChangeStateHandlerWithRet.
Signed-off-by: Haoqian He <haoqian.he@smartx.com>
Message-Id: <20250416024729.3289157-2-haoqian.he@smartx.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A virtio-net device can be added as a SR-IOV VF to another virtio-pci
device that will be the PF.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250314-sriov-v9-7-57dae8ae3ab5@daynix.com>
Tested-by: Yui Washizu <yui.washidu@gmail.com>
Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Allow user to attach SR-IOV VF to a virtio-pci PF.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250314-sriov-v9-6-57dae8ae3ab5@daynix.com>
Tested-by: Yui Washizu <yui.washidu@gmail.com>
Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In commit 98ed8ecfc9 ("exec: introduce target_words_bigendian()
helper") target_words_bigendian() was matching the definition it
was depending on (TARGET_WORDS_BIGENDIAN). Later in commit
ee3eb3a7ce ("Replace TARGET_WORDS_BIGENDIAN") the definition was
renamed as TARGET_BIG_ENDIAN but we didn't update the helper.
Do it now mechanically using:
$ sed -i -e s/target_words_bigendian/target_big_endian/g \
$(git grep -wl target_words_bigendian)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20250417210025.68322-1-philmd@linaro.org>
All callers now correctly expect a const class data.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250424194905.82506-5-philmd@linaro.org>
Mechanical change using gsed, then style manually adapted
to pass checkpatch.pl script.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250424194905.82506-4-philmd@linaro.org>
A few functions now end with a label. The next commit will clean them
up.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250407082643.2310002-3-armbru@redhat.com>
[Straightforward conflict with commit 988ad4cceb (hw/loongarch/virt:
Fix cpuslot::cpu set at last in virt_cpu_plug()) resolved]
Convert the existing includes with sed.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Convert the existing includes with sed.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Convert the existing includes with sed.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Convert the existing includes with
sed -i ,exec/memory.h,system/memory.h,g
Move the include within cpu-all.h into a !CONFIG_USER_ONLY block.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
For the s390x target, it's possible to build the QEMU binary without
CONFIG_VIRTIO_PCI and only have the virtio-mem device via the ccw
transport. In that case, QEMU currently fails to link correctly:
/usr/bin/ld: libqemu-s390x-softmmu.a.p/hw_s390x_s390-virtio-ccw.c.o: in function `s390_machine_device_pre_plug':
../hw/s390x/s390-virtio-ccw.c:579:(.text+0x1e96): undefined reference to `virtio_md_pci_pre_plug'
/usr/bin/ld: libqemu-s390x-softmmu.a.p/hw_s390x_s390-virtio-ccw.c.o: in function `s390_machine_device_plug':
../hw/s390x/s390-virtio-ccw.c:608:(.text+0x21a4): undefined reference to `virtio_md_pci_plug'
/usr/bin/ld: libqemu-s390x-softmmu.a.p/hw_s390x_s390-virtio-ccw.c.o: in function `s390_machine_device_unplug_request':
../hw/s390x/s390-virtio-ccw.c:622:(.text+0x2334): undefined reference to `virtio_md_pci_unplug_request'
/usr/bin/ld: libqemu-s390x-softmmu.a.p/hw_s390x_s390-virtio-ccw.c.o: in function `s390_machine_device_unplug':
../hw/s390x/s390-virtio-ccw.c:633:(.text+0x2436): undefined reference to `virtio_md_pci_unplug'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
We also need to include the stubs when CONFIG_VIRTIO_PCI is missing.
Fixes: aa910c20ec ("s390x: virtio-mem support")
Message-ID: <20250313063522.1348288-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The code that builds an array of AioContext pointers indexed by the
virtqueue is not specific to virtio-blk. virtio-scsi will need to do the
same thing, so extract the functions.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-11-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Rather than checking ACPI availability at compile time by
checking the CONFIG_ACPI definition from CONFIG_DEVICES,
check at runtime via acpi_builtin().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20250307223949.54040-5-philmd@linaro.org>
The pm_cap on the PCIExpressDevice object can be distilled down
to the new instance on the PCIDevice object.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250225215237.3314011-5-alex.williamson@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Switch callers directly initializing the PCI PM capability with
pci_add_capability() to use pci_pm_init().
Cc: Dmitry Fleytman <dmitry.fleytman@gmail.com>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Stefan Weil <sw@weilnetz.de>
Cc: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Klaus Jensen <its@irrelevant.dk>
Cc: Jesper Devantier <foss@defmacro.it>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250225215237.3314011-3-alex.williamson@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Features:
SR-IOV emulation for pci
virtio-mem-pci support for s390
interleave support for cxl
big endian support for vdpa svq
new QAPI events for vhost-user
Also vIOMMU reset order fixups are in.
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAme4b8sPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpHKcIAKPJsVqPdda2dJ7b7FdyRT0Q+uwezXqaGHd4
7Lzih1wsxYNkwIAyPtEb76/21qiS7BluqlUCfCB66R9xWjP5/KfvAFj4/r4AEduE
fxAgYzotNpv55zcRbcflMyvQ42WGiZZHC+o5Lp7vDXUP3pIyHrl0Ydh5WmcD+hwS
BjXvda58TirQpPJ7rUL+sSfLih17zQkkDcfv5/AgorDy1wK09RBKwMx/gq7wG8yJ
twy8eBY2CmfmFD7eTM+EKqBD2T0kwLEeLfS/F/tl5Fyg6lAiYgYtCbGLpAmWErsg
XZvfZmwqL7CNzWexGvPFnnLyqwC33WUP0k0kT88Y5wh3/h98blw=
=tej8
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: features, fixes, cleanups
Features:
SR-IOV emulation for pci
virtio-mem-pci support for s390
interleave support for cxl
big endian support for vdpa svq
new QAPI events for vhost-user
Also vIOMMU reset order fixups are in.
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAme4b8sPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpHKcIAKPJsVqPdda2dJ7b7FdyRT0Q+uwezXqaGHd4
# 7Lzih1wsxYNkwIAyPtEb76/21qiS7BluqlUCfCB66R9xWjP5/KfvAFj4/r4AEduE
# fxAgYzotNpv55zcRbcflMyvQ42WGiZZHC+o5Lp7vDXUP3pIyHrl0Ydh5WmcD+hwS
# BjXvda58TirQpPJ7rUL+sSfLih17zQkkDcfv5/AgorDy1wK09RBKwMx/gq7wG8yJ
# twy8eBY2CmfmFD7eTM+EKqBD2T0kwLEeLfS/F/tl5Fyg6lAiYgYtCbGLpAmWErsg
# XZvfZmwqL7CNzWexGvPFnnLyqwC33WUP0k0kT88Y5wh3/h98blw=
# =tej8
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 21 Feb 2025 20:21:31 HKT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (41 commits)
docs/devel/reset: Document reset expectations for DMA and IOMMU
hw/vfio/common: Add a trace point in vfio_reset_handler
hw/arm/smmuv3: Move reset to exit phase
hw/i386/intel-iommu: Migrate to 3-phase reset
hw/virtio/virtio-iommu: Migrate to 3-phase reset
vhost-user-snd: correct the calculation of config_size
net: vhost-user: add QAPI events to report connection state
hw/virtio/virtio-nsm: Respond with correct length
vdpa: Fix endian bugs in shadow virtqueue
MAINTAINERS: add more files to `vhost`
cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0()
vhost-iova-tree: Update documentation
vhost-iova-tree, svq: Implement GPA->IOVA & partial IOVA->HVA trees
vhost-iova-tree: Implement an IOVA-only tree
amd_iommu: Use correct bitmask to set capability BAR
amd_iommu: Use correct DTE field for interrupt passthrough
hw/virtio: reset virtio balloon stats on machine reset
mem/cxl_type3: support 3, 6, 12 and 16 interleave ways
hw/mem/cxl_type3: Ensure errp is set on realization failure
hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Currently the iommu may be reset before the devices
it protects. For example this happens with virtio-net.
Let's use 3-phase reset mechanism and reset the IOMMU on
exit phase after all DMA capable devices have been
reset during the 'enter' or 'hold' phase.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20250218182737.76722-2-eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use virtio_get_config_size() rather than sizeof(struct
virtio_snd_config) for the config_size in the vhost-user-snd frontend.
The frontend shall rely on device features for the size of the device
configuration space. The presence of `controls` in the config space
depends on VIRTIO_SND_F_CTLS according to the specification (v1.3):
`
5.14.4 Device Configuration Layout
...
controls
(driver-read-only) indicates a total number of all available control
elements if VIRTIO_SND_F_CTLS has been negotiated.
`
This fixes an issue introduced by commit ab0c7fb2 ("linux-headers:
update to current kvm/next") in which the optional field `controls` is
added to the virtio_snd_config structure. This breaks vhost-user-device
backends that do not implement the `controls` field.
Fixes: ab0c7fb22b ("linux-headers: update to current kvm/next")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2805
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Matias Ezequiel Vara Larsen <mvaralar@redhat.com>
Message-Id: <20250217131255.829892-1-mvaralar@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Dorinda Bassey <dbassey@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When we return a response packet from NSM, we need to indicate its
length according to the content of the response. Prior to this patch, we
returned the length of the source buffer, which may confuse guest code
that relies on the response size.
Fix it by returning the response payload size instead.
Fixes: bb154e3e0c ("device/virtio-nsm: Support for Nitro Secure Module device")
Reported-by: Vikrant Garg <vikrant1garg@gmail.com>
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20250213114541.67515-1-graf@amazon.com>
Reviewed-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Fixes: bb154e3e0c ("device/virtio-nsm: Support for Nitro Secure Module device")<br>
Reported-by: Vikrant Garg <vikrant1garg@gmail.com>
Signed-off-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Vikrant Garg <vikrant1garg@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
VDPA didn't work on a big-endian machine due to missing/incorrect
CPU<->LE data format conversions.
Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com>
Message-Id: <20250212164923.1971538-1-kshk@linux.ibm.com>
Fixes: 10857ec0ad ("vhost: Add VhostShadowVirtqueue")
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Minor update to some of the documentation / comments in
hw/virtio/vhost-iova-tree.c.
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20250217144936.3589907-4-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Creates and supports a GPA->IOVA tree and a partial IOVA->HVA tree by
splitting up guest-backed memory maps and host-only memory maps from the
full IOVA->HVA tree. That is, any guest-backed memory maps are now
stored in the GPA->IOVA tree and host-only memory maps stay in the
IOVA->HVA tree.
Also propagates the GPAs (in_addr/out_addr) of a VirtQueueElement to
vhost_svq_translate_addr() to translate GPAs to IOVAs via the GPA->IOVA
tree (when descriptors are backed by guest memory). For descriptors
backed by host-only memory, the existing partial SVQ IOVA->HVA tree is
used.
GPAs are unique in the guest's address space, ensuring unambiguous IOVA
translations. This avoids the issue where different GPAs map to the same
HVA, causing the original HVA->IOVA translation to potentially return an
IOVA associated with the wrong intended GPA.
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20250217144936.3589907-3-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Creates and supports an IOVA-only tree to support a SVQ IOVA->HVA and
GPA->IOVA tree for host-only and guest-backed memory, respectively, in
the next patch.
The IOVA allocator still allocates an IOVA range but now adds this range
to the IOVA-only tree as well as to the full IOVA->HVA tree.
In the next patch, the full IOVA->HVA tree will be split into a partial
SVQ IOVA->HVA tree and a GPA->IOVA tree. The motivation behind having an
IOVA-only tree was to have a single tree that would keep track of all
allocated IOVA ranges between the partial SVQ IOVA->HVA and GPA->IOVA
trees.
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20250217144936.3589907-2-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When a machine is first booted, all virtio balloon stats are initialized
to their default value -1 (18446744073709551615 when represented as
unsigned).
They remain that way while the firmware is loading, and early phase of
guest OS boot, until the virtio-balloon driver is activated. Thereafter
the reported stats reflect the guest OS activity.
When a machine reset is performed, however, the virtio-balloon stats are
left unchanged by QEMU, despite the guest OS no longer updating them,
nor indeed even still existing.
IOW, the mgmt app keeps getting stale stats until the guest OS starts
once more and loads the virtio-balloon driver (if ever). At that point
the app will see a discontinuity in the reported values as they sudden
jump from the stale value to the new value. This jump is indigituishable
from a valid data update.
While there is an "last-updated" field to report on the freshness of
the stats, that does not unambiguously tell the mgmt app whether the
stats are still conceptually relevant to the current running workload.
It is more conceptually useful to reset the stats to their default
values on machine reset, given that the previous guest workload the
stats reflect no longer exists. The mgmt app can now clearly identify
that there are is no stats information available from the current
executing workload.
The 'last-updated' time is also reset back to 0.
IOW, on every machine reset, the virtio stats are in the same clean
state they were when the macine first powered on.
A functional test is added to validate this behaviour with a real
world guest OS.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20250204094202.2183262-1-berrange@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Replace the boolean value enable_mlock with an enum and add a helper to
decide whether we should be calling os_mlock.
This is a stepping stone towards introducing a new mlock mode, which
will be the third possible state of this enum.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Link: https://lore.kernel.org/r/20250212143920.1269754-4-d-tatianin@yandex-team.ru
Signed-off-by: Peter Xu <peterx@redhat.com>
The general expectation is that header files should follow the same
file/path naming scheme as the corresponding source file. There are
various historical exceptions to this practice in QEMU, with one of
the most notable being the include/qapi/qmp/ directory. Most of the
headers there correspond to source files in qobject/.
This patch corrects most of that inconsistency by creating
include/qobject/ and moving the headers for qobject/ there.
This also fixes MAINTAINERS for include/qapi/qmp/dispatch.h:
scripts/get_maintainer.pl now reports "QAPI" instead of "No
maintainers found".
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com> #s390x
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20241118151235.2665921-2-armbru@redhat.com>
[Rebased]
Let's do it similar as virtio-balloon-pci. With this change, we can
use virtio-mem-pci on s390x, although plugging will still fail until
properly wired up in the machine.
No need to worry about transitional/non_transitional devices, because they
don't exist for virtio-mem.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250128185705.1609038-2-david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Most virtio-pci devices allow MSI-X. Add it to virtio-balloon-pci, but
only enable it in new machine types, so we don't break migration of
existing machine types between different qemu versions.
This copies what was done for virtio-rng-pci in:
9ea02e8f13 ("virtio-rng-pci: Allow setting nvectors, so we can use MSI-X")
bad9c5a516 ("virtio-rng-pci: fix migration compat for vectors")
62bdb88715 ("virtio-rng-pci: fix transitional migration compat for vectors")
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Message-ID: <20250115161425.246348-1-arbab@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
"Host Memory Backends" and "Memory devices" queue ("mem"):
- Fixup handling of virtio-mem unplug during system resets, as
preparation for s390x support (especially kdump in the Linux guest)
- virtio-mem support for s390x
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmdnFD4RHGRhdmlkQHJl
ZGhhdC5jb20ACgkQTd4Q9wD/g1rWBBAAp7WkYaNAjRy1PgpjNZ3z1gUJc/vk+skJ
xVgGodA8txrJOFpNrbTyfhrdLs2TV4oWDvB/zrZRRtuxvur3O1EhFd9k6EqXuydr
0FunvLvVJwRHfEZycjN4aacQMRH3CJw07OaTzexeSl5UR/6w5PRofwUK4HX7W/Ka
arqomGa3OJrs1+WgkV0Qcn4vh9HLRVv3iNC2Xo4W1wOCr1Du9zSPn9oC7zOQ0EO4
ZC//7QsdkNRjUX/yMXMkhlSXx3b/RmRg2DBrxo7BZXg27VwGu4uHxL4LRBZiB2A7
V9MqFOcVKzPMkXKTRjrgZ0vXQx1MPJ6WprEihMzMpYU6DrpA7KN/l8Ca8H24B2ln
h7+bmkDsHVVcWovE9ii/9cMRfws6uWXXg3KoA8RQ8IbX1tU02lblw2uHhXEzcoge
npqp/Z5LAiKVMetEnNnLH5thjut5PAEjuqD00cmZAMy4DNngLX2bGSdzMeVBkDMa
78ehLGRplm3t7ibUfaZaMKe6UD9tFrcD6XKsvUTXXHNbYO8ynbx58WOxSZmY98zU
n3JNQRqtXYjBVlH3Dqm47vOTZHgOzFv3raa8BmSLpcBDeTXCTcUIl20s77dGw/vT
r5YNCMN7O4YPFKUoRK9604QTgw6qlYaRTQlJD09usprGqVylb6gQtfZZuZkYDMp8
sEI77QHsePA=
=HDxr
-----END PGP SIGNATURE-----
Merge tag 'mem-2024-12-21' of https://github.com/davidhildenbrand/qemu into staging
Hi,
"Host Memory Backends" and "Memory devices" queue ("mem"):
- Fixup handling of virtio-mem unplug during system resets, as
preparation for s390x support (especially kdump in the Linux guest)
- virtio-mem support for s390x
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmdnFD4RHGRhdmlkQHJl
# ZGhhdC5jb20ACgkQTd4Q9wD/g1rWBBAAp7WkYaNAjRy1PgpjNZ3z1gUJc/vk+skJ
# xVgGodA8txrJOFpNrbTyfhrdLs2TV4oWDvB/zrZRRtuxvur3O1EhFd9k6EqXuydr
# 0FunvLvVJwRHfEZycjN4aacQMRH3CJw07OaTzexeSl5UR/6w5PRofwUK4HX7W/Ka
# arqomGa3OJrs1+WgkV0Qcn4vh9HLRVv3iNC2Xo4W1wOCr1Du9zSPn9oC7zOQ0EO4
# ZC//7QsdkNRjUX/yMXMkhlSXx3b/RmRg2DBrxo7BZXg27VwGu4uHxL4LRBZiB2A7
# V9MqFOcVKzPMkXKTRjrgZ0vXQx1MPJ6WprEihMzMpYU6DrpA7KN/l8Ca8H24B2ln
# h7+bmkDsHVVcWovE9ii/9cMRfws6uWXXg3KoA8RQ8IbX1tU02lblw2uHhXEzcoge
# npqp/Z5LAiKVMetEnNnLH5thjut5PAEjuqD00cmZAMy4DNngLX2bGSdzMeVBkDMa
# 78ehLGRplm3t7ibUfaZaMKe6UD9tFrcD6XKsvUTXXHNbYO8ynbx58WOxSZmY98zU
# n3JNQRqtXYjBVlH3Dqm47vOTZHgOzFv3raa8BmSLpcBDeTXCTcUIl20s77dGw/vT
# r5YNCMN7O4YPFKUoRK9604QTgw6qlYaRTQlJD09usprGqVylb6gQtfZZuZkYDMp8
# sEI77QHsePA=
# =HDxr
# -----END PGP SIGNATURE-----
# gpg: Signature made Sat 21 Dec 2024 14:17:18 EST
# gpg: using RSA key 1BD9CAAD735C4C3A460DFCCA4DDE10F700FF835A
# gpg: issuer "david@redhat.com"
# gpg: Good signature from "David Hildenbrand <david@redhat.com>" [unknown]
# gpg: aka "David Hildenbrand <davidhildenbrand@gmail.com>" [full]
# gpg: aka "David Hildenbrand <hildenbr@in.tum.de>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 1BD9 CAAD 735C 4C3A 460D FCCA 4DDE 10F7 00FF 835A
* tag 'mem-2024-12-21' of https://github.com/davidhildenbrand/qemu:
s390x: virtio-mem support
s390x/virtio-ccw: add support for virtio based memory devices
s390x: remember the maximum page size
s390x/pv: prepare for memory devices
s390x/s390-virtio-ccw: prepare for memory devices
s390x/s390-skeys: prepare for memory devices
s390x/s390-stattrib-kvm: prepare for memory devices and sparse memory layouts
s390x/s390-hypercall: introduce DIAG500 STORAGE_LIMIT
s390x: introduce s390_get_memory_limit()
s390x/s390-virtio-ccw: move setting the maximum guest size from sclp to machine code
s390x: rename s390-virtio-hcall* to s390-hypercall*
s390x/s390-virtio-hcall: prepare for more diag500 hypercalls
s390x/s390-virtio-hcall: remove hypercall registration mechanism
s390x/s390-virtio-ccw: don't crash on weird RAM sizes
virtio-mem: unplug memory only during system resets, not device resets
Conflicts:
- hw/s390x/s390-stattrib-kvm.c
sysemu/ -> system/ header rename conflict.
- hw/s390x/virtio-ccw-mem.c
Make Property array const and removed DEFINE_PROP_END_OF_LIST() to
conform to the latest conventions.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Let's add our virtio-mem-ccw proxy device and wire it up. We should
be supporting everything (e.g., device unplug, "dynamic-memslots") that
we already support for the virtio-pci variant.
With a Linux guest that supports virtio-mem (and has automatic memory
onlining properly configured) the following example will work:
1. Start a VM with 4G initial memory and a virtio-mem device with a maximum
capacity of 16GB:
qemu/build/qemu-system-s390x \
--enable-kvm \
-m 4G,maxmem=20G \
-nographic \
-smp 8 \
-hda Fedora-Server-KVM-40-1.14.s390x.qcow2 \
-chardev socket,id=monitor,path=/var/tmp/monitor,server,nowait \
-mon chardev=monitor,mode=readline \
-object memory-backend-ram,id=mem0,size=16G,reserve=off \
-device virtio-mem-ccw,id=vmem0,memdev=mem0,dynamic-memslots=on
2. Query the current size of virtio-mem device:
(qemu) info memory-devices
Memory device [virtio-mem]: "vmem0"
memaddr: 0x100000000
node: 0
requested-size: 0
size: 0
max-size: 17179869184
block-size: 1048576
memdev: /objects/mem0
3. Request to grow it to 8GB (hotplug 8GB):
(qemu) qom-set vmem0 requested-size 8G
(qemu) info memory-devices
Memory device [virtio-mem]: "vmem0"
memaddr: 0x100000000
node: 0
requested-size: 8589934592
size: 8589934592
max-size: 17179869184
block-size: 1048576
memdev: /objects/mem0
4. Request to grow to 16GB (hotplug another 8GB):
(qemu) qom-set vmem0 requested-size 16G
(qemu) info memory-devices
Memory device [virtio-mem]: "vmem0"
memaddr: 0x100000000
node: 0
requested-size: 17179869184
size: 17179869184
max-size: 17179869184
block-size: 1048576
memdev: /objects/mem0
5. Try to hotunplug all memory again, shrinking to 0GB:
(qemu) qom-set vmem0 requested-size 0G
(qemu) info memory-devices
Memory device [virtio-mem]: "vmem0"
memaddr: 0x100000000
node: 0
requested-size: 0
size: 0
max-size: 17179869184
block-size: 1048576
memdev: /objects/mem0
6. If it worked, unplug the device
(qemu) device_del vmem0
(qemu) info memory-devices
(qemu) object_del mem0
7. Hotplug a new device with a smaller capacity and directly size it to 1GB
(qemu) object_add memory-backend-ram,id=mem0,size=8G,reserve=off
(qemu) device_add virtio-mem-ccw,id=vmem0,memdev=mem0,\
dynamic-memslots=on,requested-size=1G
(qemu) info memory-devices
Memory device [virtio-mem]: "vmem0"
memaddr: 0x100000000
node: 0
requested-size: 1073741824
size: 1073741824
max-size: 8589934592
block-size: 1048576
memdev: /objects/mem0
Trying to use a virtio-mem device backed by hugetlb into a !hugetlb VM
correctly results in the error:
... Memory device uses a bigger page size than initial memory
Note that the virtio-mem driver in Linux will supports 1 MiB (pageblock)
granularity.
Message-ID: <20241219144115.2820241-15-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's implement support for abstract virtio based memory devices, using
the virtio-pci implementation as an orientation. Wire them up in the
machine hotplug handler, taking care of s390x page size limitations.
As we neither support virtio-mem or virtio-pmem yet, the code is
effectively unused. We'll implement support for virtio-mem based on this
next.
Note that we won't wire up the virtio-pci variant (should currently be
impossible due to lack of support for MSI-X), but we'll add a safety net
to reject plugging them in the pre-plug handler.
Message-ID: <20241219144115.2820241-14-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
We recently converted from the LegacyReset to the new reset framework
in commit c009a311e9 ("virtio-mem: Use new Resettable framework instead
of LegacyReset") to be able to use the ResetType to filter out wakeup
resets.
However, this change had an undesired implications: as we override the
Resettable interface methods in VirtIOMEMClass, the reset handler will
not only get called during system resets (i.e., qemu_devices_reset())
but also during any direct or indirect device rests (e.g.,
device_cold_reset()).
Further, we might now receive two reset callbacks during
qemu_devices_reset(), first when reset by a parent and later when reset
directly.
The memory state of virtio-mem devices is rather special: it's supposed to
be persistent/unchanged during most resets (similar to resetting a hard
disk will not destroy the data), unless actually cold-resetting the whole
system (different to a hard disk where a reboot will not destroy the data):
ripping out system RAM is something guest OSes don't particularly enjoy,
but we want to detect when rebooting to an OS that does not support
virtio-mem and wouldn't be able to detect+use the memory -- and we want
to force-defragment hotplugged memory to also shrink the usable device
memory region. So we rally want to catch system resets to do that.
On supported targets (e.g., x86), getting a cold reset on the
device/parent triggers is not that easy (but looks like PCI code
might trigger it), so this implication went unnoticed.
However, with upcoming s390x support it is problematic: during
kdump, s390x triggers a subsystem reset, ending up in
s390_machine_reset() and calling only subsystem_reset() instead of
qemu_devices_reset() -- because it's not a full system reset.
In subsystem_reset(), s390x performs a device_cold_reset() of any
TYPE_VIRTUAL_CSS_BRIDGE device, which ends up resetting all children,
including the virtio-mem device. Consequently, we wrongly detect a system
reset and unplug all device memory, resulting in hotplugged memory not
getting included in the crash dump -- undesired.
We really must not mess with hotplugged memory state during simple
device resets. To fix, create+register a new reset object that will only
get triggered during qemu_devices_reset() calls, but not during any other
resets as it is logically not the child of any other object.
Message-ID: <20241025104103.342188-1-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Juraj Marcin <jmarcin@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Headers in include/sysemu/ are not only related to system
*emulation*, they are also used by virtualization. Rename
as system/ which is clearer.
Files renamed manually then mechanical change using sed tool.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20241203172445.28576-1-philmd@linaro.org>
Now that all of the Property arrays are counted, we can remove
the terminator object from each array. Update the assertions
in device_class_set_props to match.
With struct Property being 88 bytes, this was a rather large
form of terminator. Saves 30k from qemu-system-aarch64.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218134251.4724-21-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
NSM device in AWS Nitro Enclaves supports extending with both
bytestring and string data.
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Message-ID: <20241109123208.24281-1-dorjoychy111@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
balloon_stats_get_all will iterate over guest stats upto the max
VIRTIO_BALLOON_S_NR value, calling visit_type_uint64 to populate
the QObject dict. The dict keys are obtained from the static
array balloon_stat_names which is VIRTIO_BALLOON_S_NR in size.
Unfortunately the way that array is declared results in any
unassigned stats getting a NULL name, which will then cause
visit_type_uint64 to trigger an assert in qobject_output_add_obj.
The balloon_stat_names array was fortunately fully populated with
names until recently:
commit 0d2eeef77a
Author: Bibo Mao <maobibo@loongson.cn>
Date: Mon Oct 28 10:38:09 2024 +0800
linux-headers: Update to Linux v6.12-rc5
pulled a change to include/standard-headers/linux/virtio_balloon.h
which increased VIRTIO_BALLOON_S_NR by 6, and failed to add the new
names to balloon_stat_names.
This commit fills in the missing names, and uses a static assert to
guarantee that any future changes to VIRTIO_BALLOON_S_NR will cause
a build failure until balloon_stat_names is updated.
This problem was detected by the Cockpit Project's automated
integration tests on QEMU 9.2.0-rc1.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2329448
Fixes: 0d2eeef77a ("linux-headers: Update to Linux v6.12-rc5")
Reported-by: Martin Pitt <mpitt@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20241129135507.699030-2-berrange@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>