Commit graph

119805 commits

Author SHA1 Message Date
Klaus Jensen
d96a32de3f hw/nvme: be compliant wrt. dsm processing limits
The specification states that,

> The controller shall set all three processing limit fields (i.e., the
> DMRL, DMRSL and DMSL fields) to non-zero values or shall clear all
> three processing limit fields to 0h.

So, set the DMRL and DMSL fields in addition to DMRSL.

Reviewed-by: Jesper Wendel Devantier <foss@defmacro.it>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2025-02-25 12:55:21 +01:00
Klaus Jensen
b202fb549d nvme: fix iocs status code values
The status codes related to I/O Command Sets are in the wrong group.

Reviewed-by: Jesper Wendel Devantier <foss@defmacro.it>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2025-02-25 12:55:21 +01:00
Klaus Jensen
9cf6ec0659 hw/nvme: add knob for doorbell buffer config support
Add a 'dbcs' knob to allow Doorbell Buffer Config command to be
disabled.

Reviewed-by: Jesper Wendel Devantier <foss@defmacro.it>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2025-02-25 12:55:21 +01:00
Klaus Jensen
e7047adf1e hw/nvme: make oacs dynamic
Virtualization Management needs sriov-related parameters. Only report
support for the command when that conditions are true.

Reviewed-by: Jesper Wendel Devantier <foss@defmacro.it>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2025-02-25 12:55:21 +01:00
Klaus Jensen
cd59f50ab0 hw/nvme: always initialize a subsystem
If no nvme-subsys is explicitly configured, instantiate one.

Reviewed-by: Jesper Wendel Devantier <foss@defmacro.it>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2025-02-25 12:55:21 +01:00
Stephen Bates
23a4b3ebc7 hw/nvme: Add OCP SMART / Health Information Extended Log Page
The Open Compute Project [1] includes a Datacenter NVMe
SSD Specification [2]. The most recent version of this specification
(as of November 2024) is 2.6.1. This specification layers on top of
the NVM Express specifications [3] to provide additional
functionality. A key part of of this is the 512 Byte OCP SMART / Health
Information Extended log page that is defined in Section 4.8.6 of the
specification.

We add a controller argument (ocp) that toggles on/off the SMART log
extended structure.  To accommodate different vendor specific specifications
like OCP, we add a multiplexing function (nvme_vendor_specific_log) which
will route to the different log functions based on arguments and log ids.
We only return the OCP extended SMART log when the command is 0xC0 and ocp
has been turned on in the nvme argumants.

Though we add the whole nvme SMART log extended structure, we only populate
the physical_media_units_{read,written}, log_page_version and
log_page_uuid.

This patch is based on work done by Joel but has been modified enough
that he requested a co-developed-by tag rather than a signed-off-by.

[1]: https://www.opencompute.org/
[2]: https://www.opencompute.org/documents/datacenter-nvme-ssd-specification-v2-6-1-pdf
[3]: https://nvmexpress.org/specifications/

Signed-off-by: Stephen Bates <sbates@raithlin.com>
Co-developed-by: Joel Granados <j.granados@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2025-02-25 12:27:21 +01:00
Paolo Bonzini
4cfe9edb1b rust: subprojects: add libc crate
This allows access to errno values.

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-25 10:49:23 +01:00
Zhao Liu
9ee4886700 i386: Fix the missing Rust HPET configuration option
The configuration option of Rust HPET is missing, so that PC machine
can't boot with "hpet=on" when QEMU Rust support is enabled.

Add the Rust HPET configuration option.

Fixes: d128c341a7 ("i386: enable rust hpet for pc when rust is enabled")
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250217154416.3144571-1-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-25 10:49:23 +01:00
Paolo Bonzini
1433e38cc8 hpet: do not overwrite properties on post_load
Migration relies on having the same device configuration on the source
and destination.  Therefore, there is no need to modify flags,
timer capabilities and the fw_cfg HPET block id on migration;
it was set to exactly the same values by realize.

Reviewed-by: Zhao Liu <zhao1.liu@intel.com> (hpet_post_load only)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-25 10:49:23 +01:00
Akihiko Odaki
36324c6774 qom: Use command line syntax for default values in help
object_property_help() uses the conventional command line syntax instead
of the JSON syntax. In particular,
- Key-value pairs are written in the command line syntax.
- bool description passed to the function says on/off instead of
  true/false.

However, there is one exception: default values are formatted into JSON.
While the command line and JSON syntaxes are consistent in many cases,
there are two types where they disagree:

string: The command line syntax omits quotes while JSON requires them.

bool: JSON only accepts true/false for bool but the command line syntax
      accepts on/off too, and on/off are also more popular than
      true/false. For example, the docs directory has 2045 "on"
      occurances while it has only 194 "true" occurances.
      on/off are also accepted by OnOffAuto so users do not have to
      remember the type is bool or OnOffAuto to use the values.

Omit quotes for strings and use on/off for bools when formatting
default values for better consistency.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Link: https://lore.kernel.org/r/20250207-bool-v1-1-5749d5d6df24@daynix.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-25 10:49:23 +01:00
Thomas Huth
d8b913f7c7 tests/qtest/qom-test: Test retrieval of machine class properties
There were recently some crashes that occurred when trying to
retrieve the properties of machines. Let's add a test to avoid
regression here.

Message-ID: <20250123204956.1561463-1-thuth@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2025-02-25 10:01:24 +01:00
Bibo Mao
db369c11c9 target/loongarch: Enable virtual extioi feature
Feature virtual extioi is loongArch virt machine property rather than
vCPU property in qemu side. However it is vCPU property in KVM kernel
side, here add loongArch virt machine property checking and enable virt
extioi feature when vCPU is created.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-02-25 16:05:31 +08:00
Bibo Mao
954cc5c311 target/loongarch: Add kvm steal time feature detection
Paravirt steal time feature is OnOffAuto type, feature detection is added
to check whether it is supported on KVM host.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-02-25 16:05:31 +08:00
Bibo Mao
610babce1e target/loongarch: Add vCPU property for kvm steal time feature
Property kvm-steal-time is added for kvm steal time feature, it is
specially for kvm mode.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-02-25 16:05:31 +08:00
Bibo Mao
2698cc7c99 target/loongarch: Enable paravirt ipi feature
The similiar with cpucfg register, paravirt ipi feature is set in
function kvm_arch_put_registers(). Instead the paravirt feature can
be enabled only once, it cannot be changed dynamically.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-02-25 16:05:31 +08:00
Bibo Mao
620d9bd002 target/loongarch: Add paravirt ipi feature detection
Paravirt ipi feature is OnOffAuto type, feature detection is added
to check whether it is supported by KVM host.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-02-25 16:05:31 +08:00
Bibo Mao
5b0502c564 target/loongarch: Add vCPU property for paravirt ipi feature
Property kvm-pv-ipi is added to paravirt ipi feature, it is specially
for kvm mode.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-02-25 16:05:31 +08:00
Bibo Mao
780a65bd95 target/loongarch: Move kvm specified vCPU property to kvm directory
LBT and PMU feature is supported only in kvm mode, move property
about these two features to function kvm_loongarch_cpu_post_init().

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-02-25 16:05:31 +08:00
Bibo Mao
3406b001e6 target/loongarch: Add post init function for kvm mode
Some features such as LBT and PMU are implemented in kvm mode,
With paravirt features in future, post init function is added
for kvm mode, so that property for these features will be created
in kvm post init function.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-02-25 16:05:31 +08:00
Bibo Mao
0262c8075e target/loongarch: Correct maximum physical address in KVM mode
On 3A5000 system, the physical address space width for host is 48,
however 47 bit for KVM VM. For KVM VM, size of physical address space is
the same with that of virtual user space address.

Here modify physical address space width with 47 bit in KVM mode.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-02-25 16:05:31 +08:00
Bibo Mao
7bd4eaa847 target/loongarch/gdbstub: Fix gdbstub incorrectly handling some registers
Write operation with R32 (orig_a0) and R34 (CSR_BADV) is discarded on
gdbstub implementation for LoongArch system. And return value should
be register size rather than 0, since it is used to calculate offset of
next register such as R33 (PC) in function handle_write_all_regs().

Cc: qemu-stable@nongnu.org
Fixes: ca61e75071 ("target/loongarch: Add gdb support.")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-02-25 16:05:31 +08:00
Xianglai Li
b61b9d8913 target/loongarch: fix vcpu reset command word issue
When the KVM_REG_LOONGARCH_VCPU_RESET command word
is sent to the kernel through the kvm_set_one_reg interface,
the parameter source needs to be a legal address,
otherwise the kernel will return an error and the command word
will fail to be sent.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
2025-02-25 16:04:22 +08:00
Peter Maydell
8fd2518ef2 hw: Centralize handling of -machine dumpdtb option
Currently we handle the 'dumpdtb' machine sub-option ad-hoc in every
board model that has an FDT.  It's up to the board code to make sure
it calls qemu_fdt_dumpdtb() in the right place.

This means we're inconsistent and often just ignore the user's
command line argument:
 * if the board doesn't have an FDT at all
 * if the board supports FDT, but there happens not to be one
   present (usually because of a missing -fdt option)

This isn't very helpful because it gives the user no clue why their
option was ignored.

However, in order to support the QMP/HMP dumpdtb commands we require
now that every FDT machine stores a pointer to the FDT in
MachineState::fdt.  This means we can handle -machine dumpdtb
centrally by calling the qmp_dumpdtb() function, unifying its
handling with the QMP/HMP commands.  All the board code calls to
qemu_fdt_dumpdtb() can then be removed.

For this commit we retain the existing behaviour that if there
is no FDT we silently ignore the -machine dumpdtb option.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2025-02-24 15:03:42 +00:00
Peter Maydell
db0dd33559 hw/mips/boston: Support dumpdtb monitor commands
The boston machine doesn't set MachineState::fdt to the DTB blob that
it has loaded or created, which means that the QMP/HMP dumpdtb
monitor commands don't work.

Setting MachineState::fdt is easy in the non-FIT codepath: we can
simply do so immediately before loading the DTB into guest memory.
The FIT codepath is a bit more awkward as currently the FIT loader
throws away the memory that the FDT was in after it loads it into
guest memory.  So we add a void *pfdt argument to load_fit() for it
to store the FDT pointer into.

There is some readjustment required of the pointer handling in
loader-fit.c, so that it applies 'const' only where it should (e.g.
the data pointer we get back from fdt_getprop() is const, because
it's into the middle of the input FDT data, but the pointer that
fit_load_image_alloc() should not be const, because it's freshly
allocated memory that the caller can change if it likes).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250206151214.2947842-5-peter.maydell@linaro.org
2025-02-24 15:03:42 +00:00
Peter Maydell
dfd0de7186 hw/mips/boston: Check for error return from boston_fdt_filter()
The function boston_fdt_filter() can return NULL on errors (in which
case it will print an error message).  When we call this from the
non-FIT-image codepath, we aren't checking the return value, so we
will plough on with a NULL pointer, and segfault in fdt_totalsize().
Check for errors here.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250206151214.2947842-4-peter.maydell@linaro.org
2025-02-24 15:03:42 +00:00
Peter Maydell
3c25f487bc hw/openrisc: Support monitor dumpdtb command
The openrisc machines don't set MachineState::fdt to point to their
DTB blob.  This means that although the command line '-machine
dumpdtb=file.dtb' option works, the equivalent QMP and HMP monitor
commands do not, but instead produce the error "This machine doesn't
have a FDT".

Set MachineState::fdt in openrisc_load_fdt(), when we write it to
guest memory.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250206151214.2947842-3-peter.maydell@linaro.org
2025-02-24 15:03:42 +00:00
Peter Maydell
cacf4cb251 monitor/hmp-cmds.c: Clean up hmp_dumpdtb printf
In hmp_dumpdtb(), we print a message when the command succeeds.  This
message is missing the trailing \n, so the HMP command prompt is
printed immediately after it.  We also weren't capitalizing 'DTB', or
quoting the filename in the message.  Fix these nits.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250206151214.2947842-2-peter.maydell@linaro.org
2025-02-24 15:03:42 +00:00
Matthew R. Ochs
f10104aeae hw/arm/virt: Support larger highmem MMIO regions
The MMIO region size required to support virtualized environments with
large PCI BAR regions can exceed the hardcoded limit configured in QEMU.
For example, a VM with multiple NVIDIA Grace-Hopper GPUs passed through
requires more MMIO memory than the amount provided by VIRT_HIGH_PCIE_MMIO
(currently 512GB). Instead of updating VIRT_HIGH_PCIE_MMIO, introduce a
new parameter, highmem-mmio-size, that specifies the MMIO size required
to support the VM configuration.

Example usage with 1TB MMIO region size:
	-machine virt,gic-version=3,highmem-mmio-size=1T

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Message-id: 20250221145419.1281890-1-mochs@nvidia.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-02-24 15:03:42 +00:00
Nicolin Chen
ab544de127 hw/arm/smmuv3: Fill u.f_cd_fetch.addr for SMMU_EVT_F_CD_FETCH
When we fill in the SMMUEventInfo for SMMU_EVT_F_CD_FETCH we write
the address into the f_ste_fetch member of the union, but then when
we come to read it back in smmuv3_record_event() we will (correctly)
be using the f_cd_fetch member.

This is more like a cosmetics fix since the f_cd_fetch and f_ste_fetch are
basically the same field since they are in the exact same union with exact
same type, but it's conceptually wrong. Use the correct union member.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Message-id: 20250220213832.80289-1-nicolinc@nvidia.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-02-24 13:03:00 +00:00
Roman Penyaev
51beb350bb qapi/char.json: minor doc rewording for hub device
Refine documentation for the hub device, specify the maximum.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: qemu-devel@nongnu.org
Message-ID: <20250219090607.559887-1-r.peniaev@gmail.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2025-02-24 13:48:26 +01:00
Philippe Mathieu-Daudé
4098c6b7b6 tests/functional: Have microblaze tests inherit common parent class
Have the MicroblazeMachine class being common to both
MicroblazeBigEndianMachine and MicroblazeLittleEndianMachine
classes. Move the xmaton and ballerina tests to the parent class.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250212123659.52764-11-philmd@linaro.org>
[thuth: Add missing ASSET statements to the leaf classes]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2025-02-24 10:44:54 +01:00
Stefan Hajnoczi
b69801dd6b virtio,pc,pci: features, fixes, cleanups
Features:
 
 SR-IOV emulation for pci
 virtio-mem-pci support for s390
 interleave support for cxl
 big endian support for vdpa svq
 new QAPI events for vhost-user
 
 Also vIOMMU reset order fixups are in.
 Fixes, cleanups all over the place.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAme4b8sPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpHKcIAKPJsVqPdda2dJ7b7FdyRT0Q+uwezXqaGHd4
 7Lzih1wsxYNkwIAyPtEb76/21qiS7BluqlUCfCB66R9xWjP5/KfvAFj4/r4AEduE
 fxAgYzotNpv55zcRbcflMyvQ42WGiZZHC+o5Lp7vDXUP3pIyHrl0Ydh5WmcD+hwS
 BjXvda58TirQpPJ7rUL+sSfLih17zQkkDcfv5/AgorDy1wK09RBKwMx/gq7wG8yJ
 twy8eBY2CmfmFD7eTM+EKqBD2T0kwLEeLfS/F/tl5Fyg6lAiYgYtCbGLpAmWErsg
 XZvfZmwqL7CNzWexGvPFnnLyqwC33WUP0k0kT88Y5wh3/h98blw=
 =tej8
 -----END PGP SIGNATURE-----

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

virtio,pc,pci: features, fixes, cleanups

Features:

SR-IOV emulation for pci
virtio-mem-pci support for s390
interleave support for cxl
big endian support for vdpa svq
new QAPI events for vhost-user

Also vIOMMU reset order fixups are in.
Fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

 # -----BEGIN PGP SIGNATURE-----
 #
 # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAme4b8sPHG1zdEByZWRo
 # YXQuY29tAAoJECgfDbjSjVRpHKcIAKPJsVqPdda2dJ7b7FdyRT0Q+uwezXqaGHd4
 # 7Lzih1wsxYNkwIAyPtEb76/21qiS7BluqlUCfCB66R9xWjP5/KfvAFj4/r4AEduE
 # fxAgYzotNpv55zcRbcflMyvQ42WGiZZHC+o5Lp7vDXUP3pIyHrl0Ydh5WmcD+hwS
 # BjXvda58TirQpPJ7rUL+sSfLih17zQkkDcfv5/AgorDy1wK09RBKwMx/gq7wG8yJ
 # twy8eBY2CmfmFD7eTM+EKqBD2T0kwLEeLfS/F/tl5Fyg6lAiYgYtCbGLpAmWErsg
 # XZvfZmwqL7CNzWexGvPFnnLyqwC33WUP0k0kT88Y5wh3/h98blw=
 # =tej8
 # -----END PGP SIGNATURE-----
 # gpg: Signature made Fri 21 Feb 2025 20:21:31 HKT
 # gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
 # gpg:                issuer "mst@redhat.com"
 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
 # gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
 # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
 #      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (41 commits)
  docs/devel/reset: Document reset expectations for DMA and IOMMU
  hw/vfio/common: Add a trace point in vfio_reset_handler
  hw/arm/smmuv3: Move reset to exit phase
  hw/i386/intel-iommu: Migrate to 3-phase reset
  hw/virtio/virtio-iommu: Migrate to 3-phase reset
  vhost-user-snd: correct the calculation of config_size
  net: vhost-user: add QAPI events to report connection state
  hw/virtio/virtio-nsm: Respond with correct length
  vdpa: Fix endian bugs in shadow virtqueue
  MAINTAINERS: add more files to `vhost`
  cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0()
  vhost-iova-tree: Update documentation
  vhost-iova-tree, svq: Implement GPA->IOVA & partial IOVA->HVA trees
  vhost-iova-tree: Implement an IOVA-only tree
  amd_iommu: Use correct bitmask to set capability BAR
  amd_iommu: Use correct DTE field for interrupt passthrough
  hw/virtio: reset virtio balloon stats on machine reset
  mem/cxl_type3: support 3, 6, 12 and 16 interleave ways
  hw/mem/cxl_type3: Ensure errp is set on realization failure
  hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-02-22 05:06:39 +08:00
Eric Auger
dd6d545e8f docs/devel/reset: Document reset expectations for DMA and IOMMU
To avoid any translation faults, the IOMMUs are expected to be
reset after the devices they protect. Document that we expect
DMA requests to be stopped during the 'enter' or 'hold' phase
while IOMMUs should be reset during the 'exit' phase.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20250218182737.76722-6-eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:21:25 -05:00
Eric Auger
d410e70952 hw/vfio/common: Add a trace point in vfio_reset_handler
To ease the debug of reset sequence, let's add a trace point
in vfio_reset_handler()

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20250218182737.76722-5-eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:21:25 -05:00
Eric Auger
e39e3f8b8d hw/arm/smmuv3: Move reset to exit phase
Currently the iommu may be reset before the devices
it protects. For example this happens with virtio-scsi-pci.
when system_reset is issued from qmp monitor: spurious
"virtio: zero sized buffers are not allowed" warnings can
be observed. This happens because outstanding DMA requests
are still happening while the SMMU gets reset.

This can also happen with VFIO devices. In that case
spurious DMA translation faults can be observed on host.

Make sure the SMMU is reset in the 'exit' phase after
all DMA capable devices have been reset during the 'enter'
or 'hold' phase.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

Message-Id: <20250218182737.76722-4-eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:21:25 -05:00
Eric Auger
2aaf48bcf2 hw/i386/intel-iommu: Migrate to 3-phase reset
Currently the IOMMU may be reset before the devices
it protects. For example this happens with virtio devices
but also with VFIO devices. In this latter case this
produces spurious translation faults on host.

Let's use 3-phase reset mechanism and reset the IOMMU on
exit phase after all DMA capable devices have been reset
on 'enter' or 'hold' phase.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Zhenzhong Duan <zhenzhong.duan@intel.com>

Message-Id: <20250218182737.76722-3-eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:21:25 -05:00
Eric Auger
d261b84d35 hw/virtio/virtio-iommu: Migrate to 3-phase reset
Currently the iommu may be reset before the devices
it protects. For example this happens with virtio-net.

Let's use 3-phase reset mechanism and reset the IOMMU on
exit phase after all DMA capable devices have been
reset during the 'enter' or 'hold' phase.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>

Message-Id: <20250218182737.76722-2-eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:21:25 -05:00
Matias Ezequiel Vara Larsen
e87b6efb11 vhost-user-snd: correct the calculation of config_size
Use virtio_get_config_size() rather than sizeof(struct
virtio_snd_config) for the config_size in the vhost-user-snd frontend.
The frontend shall rely on device features for the size of the device
configuration space. The presence of `controls` in the config space
depends on VIRTIO_SND_F_CTLS according to the specification (v1.3):
`
5.14.4 Device Configuration Layout
...

controls
(driver-read-only) indicates a total number of all available control
elements if VIRTIO_SND_F_CTLS has been negotiated.
`
This fixes an issue introduced by commit ab0c7fb2 ("linux-headers:
update to current kvm/next") in which the optional field `controls` is
added to the virtio_snd_config structure. This breaks vhost-user-device
backends that do not implement the `controls` field.

Fixes: ab0c7fb22b ("linux-headers: update to current kvm/next")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2805
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Matias Ezequiel Vara Larsen <mvaralar@redhat.com>
Message-Id: <20250217131255.829892-1-mvaralar@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Dorinda Bassey <dbassey@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:21:25 -05:00
Laurent Vivier
02fd9f8aee net: vhost-user: add QAPI events to report connection state
The netdev reports NETDEV_VHOST_USER_CONNECTED event when
the chardev is connected, and NETDEV_VHOST_USER_DISCONNECTED
when it is disconnected.

The NETDEV_VHOST_USER_CONNECTED event includes the chardev id.

This allows a system manager like libvirt to detect when the server
fails.

For instance with passt:

{ 'execute': 'qmp_capabilities' }
{ "return": { } }

[killing passt here]

{ "timestamp": { "seconds": 1739538634, "microseconds": 920450 },
  "event": "NETDEV_VHOST_USER_DISCONNECTED",
  "data": { "netdev-id": "netdev0" } }

[automatic reconnection with reconnect-ms]

{ "timestamp": { "seconds": 1739538638, "microseconds": 354181 },
  "event": "NETDEV_VHOST_USER_CONNECTED",
  "data": { "netdev-id": "netdev0", "chardev-id": "chr0" } }

Tested-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20250217092550.1172055-1-lvivier@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:21:25 -05:00
Alexander Graf
131fe64e63 hw/virtio/virtio-nsm: Respond with correct length
When we return a response packet from NSM, we need to indicate its
length according to the content of the response. Prior to this patch, we
returned the length of the source buffer, which may confuse guest code
that relies on the response size.

Fix it by returning the response payload size instead.

Fixes: bb154e3e0c ("device/virtio-nsm: Support for Nitro Secure Module device")
Reported-by: Vikrant Garg <vikrant1garg@gmail.com>
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20250213114541.67515-1-graf@amazon.com>
Reviewed-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Fixes: bb154e3e0c (&quot;device/virtio-nsm: Support for Nitro Secure Module device&quot;)<br>
Reported-by: Vikrant Garg <vikrant1garg@gmail.com>
Signed-off-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Vikrant Garg <vikrant1garg@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:20:52 -05:00
Konstantin Shkolnyy
50e9754149 vdpa: Fix endian bugs in shadow virtqueue
VDPA didn't work on a big-endian machine due to missing/incorrect
CPU<->LE data format conversions.

Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com>
Message-Id: <20250212164923.1971538-1-kshk@linux.ibm.com>
Fixes: 10857ec0ad ("vhost: Add VhostShadowVirtqueue")
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:18:42 -05:00
Stefano Garzarella
590de17b82 MAINTAINERS: add more files to vhost
While sending a patch for backends/cryptodev-vhost.c I noticed that
Michael wasn`t in CC so I took a look at the files listed under `vhost`
and tried to fix it increasing the coverage by adding new files.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20250211144259.117910-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:18:42 -05:00
Stefano Garzarella
83cb18ac45 cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0()
The function `vhost_dev_init()` expects the `struct vhost_dev`
(passed as a parameter) to be fully initialized. This is important
because some parts of the code check whether `vhost_dev->config_ops`
is NULL to determine if it has been set (e.g. later via
`vhost_dev_set_config_notifier`).

To ensure this initialization, it’s better to allocate the entire
`CryptoDevBackendVhost` structure (which includes `vhost_dev`) using
`g_mem0()`, following the same approach used for other vhost devices,
such as in `vhost_net_init()`.

Fixes: 042cea274c ("cryptodev: add vhost-user as a new cryptodev backend")
Cc: qemu-stable@nongnu.org
Reported-by: myluo24@m.fudan.edu.cn
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20250211135523.101203-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:18:42 -05:00
Jonah Palmer
332859dd59 vhost-iova-tree: Update documentation
Minor update to some of the documentation / comments in
hw/virtio/vhost-iova-tree.c.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20250217144936.3589907-4-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:18:42 -05:00
Jonah Palmer
05063f5584 vhost-iova-tree, svq: Implement GPA->IOVA & partial IOVA->HVA trees
Creates and supports a GPA->IOVA tree and a partial IOVA->HVA tree by
splitting up guest-backed memory maps and host-only memory maps from the
full IOVA->HVA tree. That is, any guest-backed memory maps are now
stored in the GPA->IOVA tree and host-only memory maps stay in the
IOVA->HVA tree.

Also propagates the GPAs (in_addr/out_addr) of a VirtQueueElement to
vhost_svq_translate_addr() to translate GPAs to IOVAs via the GPA->IOVA
tree (when descriptors are backed by guest memory). For descriptors
backed by host-only memory, the existing partial SVQ IOVA->HVA tree is
used.

GPAs are unique in the guest's address space, ensuring unambiguous IOVA
translations. This avoids the issue where different GPAs map to the same
HVA, causing the original HVA->IOVA translation to potentially return an
IOVA associated with the wrong intended GPA.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20250217144936.3589907-3-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:18:42 -05:00
Jonah Palmer
92cf61e708 vhost-iova-tree: Implement an IOVA-only tree
Creates and supports an IOVA-only tree to support a SVQ IOVA->HVA and
GPA->IOVA tree for host-only and guest-backed memory, respectively, in
the next patch.

The IOVA allocator still allocates an IOVA range but now adds this range
to the IOVA-only tree as well as to the full IOVA->HVA tree.

In the next patch, the full IOVA->HVA tree will be split into a partial
SVQ IOVA->HVA tree and a GPA->IOVA tree. The motivation behind having an
IOVA-only tree was to have a single tree that would keep track of all
allocated IOVA ranges between the partial SVQ IOVA->HVA and GPA->IOVA
trees.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20250217144936.3589907-2-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:18:42 -05:00
Sairaj Kodilkar
3684717b74 amd_iommu: Use correct bitmask to set capability BAR
AMD IOMMU provides the base address of control registers through
IVRS table and PCI capability. Since this base address is of 64 bit,
use 32 bits mask (instead of 16 bits) to set BAR low and high.

Fixes: d29a09ca68 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250207045354.27329-3-sarunkod@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:18:42 -05:00
Sairaj Kodilkar
63dc0b8647 amd_iommu: Use correct DTE field for interrupt passthrough
Interrupt passthrough is determine by the bits 191,190,187-184.
These bits are part of the 3rd quad word (i.e. index 2) in DTE. Hence
replace dte[3] by dte[2].

Fixes: b44159fe0 ("x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled")
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250207045354.27329-2-sarunkod@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:18:42 -05:00
Daniel P. Berrangé
1456e90653 hw/virtio: reset virtio balloon stats on machine reset
When a machine is first booted, all virtio balloon stats are initialized
to their default value -1 (18446744073709551615 when represented as
unsigned).

They remain that way while the firmware is loading, and early phase of
guest OS boot, until the virtio-balloon driver is activated. Thereafter
the reported stats reflect the guest OS activity.

When a machine reset is performed, however, the virtio-balloon stats are
left unchanged by QEMU, despite the guest OS no longer updating them,
nor indeed even still existing.

IOW, the mgmt app keeps getting stale stats until the guest OS starts
once more and loads the virtio-balloon driver (if ever). At that point
the app will see a discontinuity in the reported values as they sudden
jump from the stale value to the new value. This jump is indigituishable
from a valid data update.

While there is an "last-updated" field to report on the freshness of
the stats, that does not unambiguously tell the mgmt app whether the
stats are still conceptually relevant to the current running workload.

It is more conceptually useful to reset the stats to their default
values on machine reset, given that the previous guest workload the
stats reflect no longer exists. The mgmt app can now clearly identify
that there are is no stats information available from the current
executing workload.

The 'last-updated' time is also reset back to 0.

IOW, on every machine reset, the virtio stats are in the same clean
state they were when the macine first powered on.

A functional test is added to validate this behaviour with a real
world guest OS.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20250204094202.2183262-1-berrange@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:18:42 -05:00
Yao Xingtao
9ac2c42f43 mem/cxl_type3: support 3, 6, 12 and 16 interleave ways
Since the kernel does not check the interleave capability, a
3-way, 6-way, 12-way or 16-way region can be create normally.

Applications can access the memory of 16-way region normally because
qemu can convert hpa to dpa correctly for the power of 2 interleave
ways, after kernel implementing the check, this kind of region will
not be created any more.

For non power of 2 interleave ways, applications could not access the
memory normally and may occur some unexpected behaviors, such as
segmentation fault.

So implements this feature is needed.

Link: https://lore.kernel.org/linux-cxl/3e84b919-7631-d1db-3e1d-33000f3f3868@fujitsu.com/
Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250203161908.145406-6-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-21 07:18:42 -05:00