Add new -shim command line option, wire up for the x86 loader.
When specified load shim into the new "etc/boot/shim" fw_cfg file.
Needs OVMF changes too to be actually useful.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20240905141211.1253307-6-kraxel@redhat.com>
Add a new "etc/boot/kernel" fw_cfg file, containing the kernel without
the setup header patches. Intended use is booting in UEFI with secure
boot enabled, where the setup header patching breaks secure boot
verification.
Needs OVMF changes too to be actually useful.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20240905141211.1253307-5-kraxel@redhat.com>
Load the complete kernel (including setup) into memory. Excluding the
setup is handled later when adding the FW_CFG_KERNEL_SIZE and
FW_CFG_KERNEL_DATA entries.
This is a preparation for the next patch which adds a new fw_cfg file
containing the complete, unpatched kernel. No functional change.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20240905141211.1253307-4-kraxel@redhat.com>
If the binary loaded via -kernel is *not* a linux kernel (in which
case protocol == 0), do not patch the linux kernel header fields.
It's (a) pointless and (b) might break binaries by random patching
and (c) changes the binary hash which in turn breaks secure boot
verification.
Background: OVMF happily loads and runs not only linux kernels but
any efi binary via direct kernel boot.
Note: Breaking the secure boot verification is a problem for linux
kernels too, but fixed that is left for another day ...
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20240905141211.1253307-3-kraxel@redhat.com>
We want to remove fw_cfg_add_extra_pci_roots() which introduced
PCI bus knowledge within the generic hw/nvram/fw_cfg.c file.
Replace the calls by the pci_bus_add_fw_cfg_extra_pci_roots()
which is a 1:1 equivalent, but using correct API.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20241206181352.6836-6-philmd@linaro.org>
We were storing the pointers to buffers in a GList due to lack of
stateful crypto apis and instead doing the final hash computation at
the end after we had all the necessary buffers. Now that we have the
stateful qcrypto apis available, we can instead update the hashes
inline in the read_eif_* functions which makes the code much simpler.
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Message-ID: <20241109123039.24180-1-dorjoychy111@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Follow the other architecture targets by adding extra macros for
defining a versioned machine type as the latest. This reduces the
size of the changes when introducing new machine types at the start
of each release cycle.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20240910163041.3764176-1-berrange@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-ID: <20241126103005.3794748-2-cohuck@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Commit b12cb3819 (amd_iommu: Check APIC ID > 255 for XTSup) throws
linking error for the `kvm_enable_x2apic` when kvm is disabled
and Clang is used for compilation.
This issue comes up because Clang does not remove the function callsite
(kvm_enable_x2apic in this case) during optimization when if condition
have variable. Intel IOMMU driver solves this issue by creating separate
if condition for checking variables, which causes call site being
optimized away by virtue of `kvm_irqchip_is_split()` being defined as 0.
Implement same solution for the AMD driver.
Fixes: b12cb3819b (amd_iommu: Check APIC ID > 255 for XTSup)
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Tested-by: Phil Dennis-Jordan <phil@philjordan.eu>
Link: https://lore.kernel.org/r/20241114114509.15350-1-sarunkod@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is no need for this assertion here, as we only use vmport value
for equality/inequality checks. This was originally prompted by the
following Coverity report:
>>> CID 1559533: Integer handling issues (CONSTANT_EXPRESSION_RESULT)
>>> "pcms->vmport >= 0" is always true regardless of the values of
>>> its operands. This occurs as the logical first operand of "&&".
Signed-off-by: Kamil Szczęk <kamil@szczek.dev>
Reported-By: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/ZwF9ZexNs1h-uC0MrbkgGtMtdyLinROjVSmMNVzNftjGVWgOiuzdD1dSXEtzNH7OHbBFY6GVDYVFIDBgc3lhGqCOb7kaNZolSBkVyl3rNr4=@szczek.dev
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In x86_load_linux(), we were using a stack-allocated array as data for
fw_cfg_add_bytes(). Since the latter just takes a reference to the
pointer instead of copying the data, it can happen that the contents
have been overridden by the time the guest attempts to access them.
Instead of using the stack-allocated array, allocate some memory from
the heap, copy the contents of the array, and use it for fw_cfg.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241109053748.13183-1-slp@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The 'isapc' machine type has no PCI bus, but pc_nic_init() still
calls pci_init_nic_devices() passing it a NULL bus pointer. This
causes the clang sanitizer to complain:
$ ./build/clang/qemu-system-i386 -M isapc
../../hw/pci/pci.c:1866:39: runtime error: member access within null pointer of type 'PCIBus' (aka 'struct PCIBus')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../hw/pci/pci.c:1866:39 in
This is because pci_init_nic_devices() does
&bus->qbus
which is undefined behaviour on a NULL pointer even though we're not
actually dereferencing the pointer. (We don't actually crash as
a result, so if you aren't running a sanitizer build then there
are no user-visible effects.)
Make pc_nic_init() avoid trying to initialize PCI NICs on a non-PCI
system.
Cc: qemu-stable@nongnu.org
Fixes: 8d39f9ba14 ("hw/i386/pc: use qemu_get_nic_info() and pci_init_nic_devices()")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Link: https://lore.kernel.org/r/20241105171813.3031969-1-peter.maydell@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cache topology needs to be defined based on CPU topology levels. Thus,
define CPU topology enumeration in qapi/machine.json to make it generic
for all architectures.
To match the general topology naming style, rename CPU_TOPO_LEVEL_* to
CPU_TOPOLOGY_LEVEL_*, and rename SMT and package levels to thread and
socket.
Also, enumerate additional topology levels for non-i386 arches, and add
a CPU_TOPOLOGY_LEVEL_DEFAULT to help future smp-cache object to work
with compatibility requirement of arch-specific cache topology models.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241101083331.340178-3-zhao1.liu@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
CXL now can use Generic Port Affinity Structures.
CXL now allows control of link speed and width
vhost-user-blk now supports live resize, by means of
a new device-sync-config command
amd iommu now supports interrupt remapping
pcie devices now report extended tag field support
intel_iommu dropped support for Transient Mapping, to match VTD spec
arch agnostic ACPI infrastructure for vCPU Hotplug
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmcpNqUPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp/2oH/0qO33prhDa48J5mqT9NuJzzYwp5QHKF9Zjv
fDAplMUEmfxZIEgJchcyDWPYTGX2geT4pCFhRWioZMIR/0JyzrFgSwsk1kL88cMh
46gzhNVD6ybyPJ7O0Zq3GLy5jo7rlw/n+fFxKAuRCzcbK/fmH8gNC+RwW1IP64Na
HDczYilHUhnO7yKZFQzQNQVbK4BckrG1bu0Fcx0EMUQBf4V6x7GLOrT+3hkKYcr6
+DG5DmUmv20or/FXnu2Ye+MzR8Ebx6JVK3A3sXEE4Ns2CCzK9QLzeeyc2aU13jWN
OpZ6WcKF8HqYprIwnSsMTxhPcq0/c7TvrGrazVwna5RUBMyjjvc=
=zSX4
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: features, fixes, cleanups
CXL now can use Generic Port Affinity Structures.
CXL now allows control of link speed and width
vhost-user-blk now supports live resize, by means of
a new device-sync-config command
amd iommu now supports interrupt remapping
pcie devices now report extended tag field support
intel_iommu dropped support for Transient Mapping, to match VTD spec
arch agnostic ACPI infrastructure for vCPU Hotplug
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmcpNqUPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRp/2oH/0qO33prhDa48J5mqT9NuJzzYwp5QHKF9Zjv
# fDAplMUEmfxZIEgJchcyDWPYTGX2geT4pCFhRWioZMIR/0JyzrFgSwsk1kL88cMh
# 46gzhNVD6ybyPJ7O0Zq3GLy5jo7rlw/n+fFxKAuRCzcbK/fmH8gNC+RwW1IP64Na
# HDczYilHUhnO7yKZFQzQNQVbK4BckrG1bu0Fcx0EMUQBf4V6x7GLOrT+3hkKYcr6
# +DG5DmUmv20or/FXnu2Ye+MzR8Ebx6JVK3A3sXEE4Ns2CCzK9QLzeeyc2aU13jWN
# OpZ6WcKF8HqYprIwnSsMTxhPcq0/c7TvrGrazVwna5RUBMyjjvc=
# =zSX4
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 04 Nov 2024 21:03:33 GMT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (65 commits)
intel_iommu: Add missed reserved bit check for IEC descriptor
intel_iommu: Add missed sanity check for 256-bit invalidation queue
intel_iommu: Send IQE event when setting reserved bit in IQT_TAIL
hw/acpi: Update GED with vCPU Hotplug VMSD for migration
tests/qtest/bios-tables-test: Update DSDT golden masters for x86/{pc,q35}
hw/acpi: Update ACPI `_STA` method with QOM vCPU ACPI Hotplug states
qtest: allow ACPI DSDT Table changes
hw/acpi: Make CPUs ACPI `presence` conditional during vCPU hot-unplug
hw/pci: Add parenthesis to PCI_BUILD_BDF macro
hw/cxl: Ensure there is enough data to read the input header in cmd_get_physical_port_state()
hw/cxl: Ensure there is enough data for the header in cmd_ccls_set_lsa()
hw/cxl: Check that writes do not go beyond end of target attributes
hw/cxl: Ensuring enough data to read parameters in cmd_tunnel_management_cmd()
hw/cxl: Avoid accesses beyond the end of cel_log.
hw/cxl: Check the length of data requested fits in get_log()
hw/cxl: Check enough data in cmd_firmware_update_transfer()
hw/cxl: Check input length is large enough in cmd_events_clear_records()
hw/cxl: Check input includes at least the header in cmd_features_set_feature()
hw/cxl: Check size of input data to dynamic capacity mailbox commands
hw/cxl/cxl-mailbox-util: Fix output buffer index update when retrieving DC extents
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
IEC descriptor is 128-bit invalidation descriptor, must be padded with
128-bits of 0s in the upper bytes to create a 256-bit descriptor when
the invalidation queue is configured for 256-bit descriptors (IQA_REG.DW=1).
Fixes: 02a2cbc872 ("x86-iommu: introduce IEC notifiers")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20241104125536.1236118-4-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
According to VTD spec, a 256-bit descriptor will result in an invalid
descriptor error if submitted in an IQ that is setup to provide hardware
with 128-bit descriptors (IQA_REG.DW=0). Meanwhile, there are old inv desc
types (e.g. iotlb_inv_desc) that can be either 128bits or 256bits. If a
128-bit version of this descriptor is submitted into an IQ that is setup
to provide hardware with 256-bit descriptors will also result in an invalid
descriptor error.
The 2nd will be captured by the tail register update. So we only need to
focus on the 1st.
Because the reserved bit check between different types of invalidation desc
are common, so introduce a common function vtd_inv_desc_reserved_check()
to do all the checks and pass the differences as parameters.
With this change, need to replace error_report_once() call with error_report()
to catch different call sites. This isn't an issue as error_report_once()
here is mainly used to help debug guest error, but it only dumps once in
qemu life cycle and doesn't help much, we need error_report() instead.
Fixes: c0c1d35184 ("intel_iommu: add 256 bits qi_desc support")
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20241104125536.1236118-3-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
According to VTD spec, Figure 11-22, Invalidation Queue Tail Register,
"When Descriptor Width (DW) field in Invalidation Queue Address Register
(IQA_REG) is Set (256-bit descriptors), hardware treats bit-4 as reserved
and a value of 1 in the bit will result in invalidation queue error."
Current code missed to send IQE event to guest, fix it.
Fixes: c0c1d35184 ("intel_iommu: add 256 bits qi_desc support")
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20241104125536.1236118-2-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
VT-d spec removed Transient Mapping (TM) field from second-level page-tables
and treat the field as Reserved(0) since revision 3.2.
Changing the field as reserved(0) will break backward compatibility, so
introduce a property "stale-tm" to allow user to control the setting.
Use pc_compat_9_1 to handle the compatibility for machines before 9.2 which
allow guest to set the field. Starting from 9.2, this field is reserved(0)
by default to match spec. Of course, user can force it on command line.
This doesn't impact function of vIOMMU as there was no logic to emulate
Transient Mapping.
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Message-Id: <20241028022514.806657-1-zhenzhong.duan@intel.com>
Reviewed-by: Clément Mathieu--Drif<clement.mathieu--drif@eviden.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The XTSup mode enables x2APIC support for AMD IOMMU, which is needed
to support vcpu w/ APIC ID > 255.
Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Message-Id: <20240927172913.121477-6-santosh.shukla@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In order to support AMD IOMMU interrupt remapping emulation with PCI
pass-through devices, QEMU needs to notify VFIO when guest IOMMU driver
updates and invalidate the guest interrupt remapping table (IRT), and
communicate information so that the host IOMMU driver can update
the shadowed interrupt remapping table in the host IOMMU.
Therefore, send notification when guest IOMMU emulates the IRT
invalidation commands.
Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Message-Id: <20240927172913.121477-5-santosh.shukla@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use shared memory region for interrupt remapping which can be
aliased by all devices.
Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Message-Id: <20240927172913.121477-4-santosh.shukla@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce 'nodma' shared memory region to support PT mode
so that for each device, we only create an alias to shared memory
region when DMA-remapping is disabled.
Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Message-Id: <20240927172913.121477-3-santosh.shukla@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Rename the MMIO memory region variable 'mmio' to 'mr_mmio'
so to correctly name align with struct AMDVIState::variable type.
No functional change intended.
Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Message-Id: <20240927172913.121477-2-santosh.shukla@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
These are very similar to the recently added Generic Initiators
but instead of representing an initiator of memory traffic they
represent an edge point beyond which may lie either targets or
initiators. Here we add these ports such that they may
be targets of hmat_lb records to describe the latency and
bandwidth from host side initiators to the port. A discoverable
mechanism such as UEFI CDAT read from CXL devices and switches
is used to discover the remainder of the path, and the OS can build
up full latency and bandwidth numbers as need for work and data
placement decisions.
Acked-by: Markus Armbruster <armbru@redhat.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916174122.1843197-1-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Rather than relying on PCI internals, use the new acpi_property
to obtain the ACPI _UID values. These are still the same
as the PCI Bus numbers so no functional change.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916171017.1841767-9-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Whilst ACPI SRAT Generic Initiator Afinity Structures are able to refer to
both PCI and ACPI Device Handles, the QEMU implementation only implements
the PCI Device Handle case. For now move the code into the existing
hw/acpi/pci.c file and header. If support for ACPI Device Handles is
added in the future, perhaps this will be moved again.
Also push the struct AcpiGenericInitiator down into the c file as not
used outside pci.c.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916171017.1841767-7-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
AWS nitro enclaves[1] is an Amazon EC2[2] feature that allows creating
isolated execution environments, called enclaves, from Amazon EC2
instances which are used for processing highly sensitive data. Enclaves
have no persistent storage and no external networking. The enclave VMs
are based on the Firecracker microvm with a vhost-vsock device for
communication with the parent EC2 instance that spawned it and a Nitro
Secure Module (NSM) device for cryptographic attestation. The parent
instance VM always has CID 3 while the enclave VM gets a dynamic CID.
An EIF (Enclave Image Format)[3] file is used to boot an AWS nitro enclave
virtual machine. This commit adds support for AWS nitro enclave emulation
using a new machine type option '-M nitro-enclave'. This new machine type
is based on the 'microvm' machine type, similar to how real nitro enclave
VMs are based on Firecracker microvm. For nitro-enclave to boot from an
EIF file, the kernel and ramdisk(s) are extracted into a temporary kernel
and a temporary initrd file which are then hooked into the regular x86
boot mechanism along with the extracted cmdline. The EIF file path should
be provided using the '-kernel' QEMU option.
In QEMU, the vsock emulation for nitro enclave is added using vhost-user-
vsock as opposed to vhost-vsock. vhost-vsock doesn't support sibling VM
communication which is needed for nitro enclaves. So for the vsock
communication to CID 3 to work, another process that does the vsock
emulation in userspace must be run, for example, vhost-device-vsock[4]
from rust-vmm, with necessary vsock communication support in another
guest VM with CID 3. Using vhost-user-vsock also enables the possibility
to implement some proxying support in the vhost-user-vsock daemon that
will forward all the packets to the host machine instead of CID 3 so
that users of nitro-enclave can run the necessary applications in their
host machine instead of running another whole VM with CID 3. The following
mandatory nitro-enclave machine option has been added related to the
vhost-user-vsock device.
- 'vsock': The chardev id from the '-chardev' option for the
vhost-user-vsock device.
AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which
has been added using the virtio-nsm device added in a previous commit.
In Nitro Enclaves, all the PCRs start in a known zero state and the first
16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2 and PCR8
contain the SHA384 hashes related to the EIF file used to boot the VM
for validation. The following optional nitro-enclave machine options
have been added related to the NSM device.
- 'id': Enclave identifier, reflected in the module-id of the NSM
device. If not provided, a default id will be set.
- 'parent-role': Parent instance IAM role ARN, reflected in PCR3
of the NSM device.
- 'parent-id': Parent instance identifier, reflected in PCR4 of the
NSM device.
[1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
[2] https://aws.amazon.com/ec2/
[3] https://github.com/aws/aws-nitro-enclaves-image-format
[4] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-6-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The x86 architecture uses little endianness. Directly use
the little-endian LD/ST API.
Mechanical change using:
$ end=le; \
for acc in uw w l q tul; do \
sed -i -e "s/ld${acc}_p(/ld${acc}_${end}_p(/" \
-e "s/st${acc}_p(/st${acc}_${end}_p(/" \
$(git grep -wlE '(ld|st)t?u?[wlq]_p' hw/i386/); \
done
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20241004163042.85922-9-philmd@linaro.org>
* kvm: support for nested FRED
* tests/unit: fix warning when compiling test-nested-aio-poll with LTO
* kvm: refactoring of VM creation
* target/i386: expose IBPB-BRTYPE and SBPB CPUID bits to the guest
* hw/char: clean up serial
* remove virtfs-proxy-helper
* target/i386/kvm: Report which action failed in kvm_arch_put/get_registers
* qom: improvements to object_resolve_path*()
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmb++MsUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPVnwf/cdvfxvDm22tEdlh8vHlV17HtVdcC
Hw334M/3PDvbTmGzPBg26lzo4nFS6SLrZ8ETCeqvuJrtKzqVk9bI8ssZW5KA4ijM
nkxguRPHO8E6U33ZSucc+Hn56+bAx4I2X80dLKXJ87OsbMffIeJ6aHGSEI1+fKVh
pK7q53+Y3lQWuRBGhDIyKNuzqU4g+irpQwXOhux63bV3ADadmsqzExP6Gmtl8OKM
DylPu1oK7EPZumlSiJa7Gy1xBqL4Rc4wGPNYx2RVRjp+i7W2/Y1uehm3wSBw+SXC
a6b7SvLoYfWYS14/qCF4cBL3sJH/0f/4g8ZAhDDxi2i5kBr0/5oioDyE/A==
=/zo4
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* pc: Add a description for the i8042 property
* kvm: support for nested FRED
* tests/unit: fix warning when compiling test-nested-aio-poll with LTO
* kvm: refactoring of VM creation
* target/i386: expose IBPB-BRTYPE and SBPB CPUID bits to the guest
* hw/char: clean up serial
* remove virtfs-proxy-helper
* target/i386/kvm: Report which action failed in kvm_arch_put/get_registers
* qom: improvements to object_resolve_path*()
# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmb++MsUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroPVnwf/cdvfxvDm22tEdlh8vHlV17HtVdcC
# Hw334M/3PDvbTmGzPBg26lzo4nFS6SLrZ8ETCeqvuJrtKzqVk9bI8ssZW5KA4ijM
# nkxguRPHO8E6U33ZSucc+Hn56+bAx4I2X80dLKXJ87OsbMffIeJ6aHGSEI1+fKVh
# pK7q53+Y3lQWuRBGhDIyKNuzqU4g+irpQwXOhux63bV3ADadmsqzExP6Gmtl8OKM
# DylPu1oK7EPZumlSiJa7Gy1xBqL4Rc4wGPNYx2RVRjp+i7W2/Y1uehm3wSBw+SXC
# a6b7SvLoYfWYS14/qCF4cBL3sJH/0f/4g8ZAhDDxi2i5kBr0/5oioDyE/A==
# =/zo4
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 03 Oct 2024 21:04:27 BST
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (23 commits)
qom: update object_resolve_path*() documentation
qom: set *ambiguous on all paths
qom: rename object_resolve_path_type() "ambiguousp"
target/i386/kvm: Report which action failed in kvm_arch_put/get_registers
kvm: Allow kvm_arch_get/put_registers to accept Error**
accel/kvm: refactor dirty ring setup
minikconf: print error entirely on stderr
9p: remove 'proxy' filesystem backend driver
hw/char: Extract serial-mm
hw/char/serial.h: Extract serial-isa.h
hw: Remove unused inclusion of hw/char/serial.h
target/i386: Expose IBPB-BRTYPE and SBPB CPUID bits to the guest
kvm: refactor core virtual machine creation into its own function
kvm/i386: replace identity_base variable with a constant
kvm/i386: refactor kvm_arch_init and split it into smaller functions
kvm: replace fprintf with error_report()/printf() in kvm_init()
kvm/i386: fix return values of is_host_cpu_intel()
kvm/i386: make kvm_filter_msr() and related definitions private to kvm module
hw/i386/pc: Add a description for the i8042 property
tests/unit: remove block layer code from test-nested-aio-poll
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# hw/arm/Kconfig
# hw/arm/pxa2xx.c
Expose handle_bufioreq in xen_register_ioreq().
This is to allow machines to enable or disable buffered ioreqs.
No functional change since all callers still set it to
HVM_IOREQSRV_BUFIOREQ_ATOMIC.
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
The includes where updated based on compile errors. Now, the inclusion of the
header roughly matches Kconfig dependencies:
# grep -r -e "select SERIAL_ISA"
hw/ppc/Kconfig: select SERIAL_ISA
hw/isa/Kconfig: select SERIAL_ISA
hw/sparc64/Kconfig: select SERIAL_ISA
hw/i386/Kconfig: select SERIAL_ISA
hw/i386/Kconfig: select SERIAL_ISA # for serial_hds_isa_init()
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Link: https://lore.kernel.org/r/20240905073832.16222-3-shentey@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some devices need to distinguish cold start reset from waking up from a
suspended state. This patch adds new value to the enum, and updates the
i386 wakeup method to use this new reset type.
Message-ID: <20240904103722.946194-3-jmarcin@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Currently, both qemu_devices_reset() and MachineClass::reset() use
ShutdownCause for the reason of the reset. However, the Resettable
interface uses ResetState, so ShutdownCause needs to be translated to
ResetType somewhere. Translating it qemu_devices_reset() makes adding
new reset types harder, as they cannot always be matched to a single
ShutdownCause here, and devices may need to check the ResetType to
determine what to reset and if to reset at all.
This patch moves this translation up in the call stack to
qemu_system_reset() and updates all MachineClass children to use the
ResetType instead.
Message-ID: <20240904103722.946194-2-jmarcin@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Use device_class_set_legacy_reset() instead of opencoding an
assignment to DeviceClass::reset. This change was produced
with:
spatch --macro-file scripts/cocci-macro-file.h \
--sp-file scripts/coccinelle/device-reset.cocci \
--keep-comments --smpl-spacing --in-place --dir hw
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240830145812.1967042-8-peter.maydell@linaro.org
This patch implements the periodic and the swsmi ICH9 chipset timers. They are
especially useful when prototyping UEFI firmware (e.g. with EDK2's OVMF)
using QEMU.
For backwards compatibility, the compat properties "x-smi-swsmi-timer",
and "x-smi-periodic-timer" are introduced.
Additionally, writes to the SMI_STS register are enabled for the
corresponding two bits using a write mask to make future work easier.
Signed-off-by: Dominic Prinz <git@dprinz.de>
Message-Id: <1d90ea69e01ab71a0f2ced116801dc78e04f4448.1725991505.git.git@dprinz.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When qemu runs without kvm acceleration the ACPI executions take a great
amount of time. If they take more than the default time (30sec), the
ACPI calls fail and the system might not behave correctly.
Now the _PRT table is computed on the fly. We can drastically reduce the
execution of the _PRT method if we return a pre-computed table.
Without this patch:
[ 51.343484] ACPI Error: Aborting method \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
[ 51.527032] ACPI Error: Method execution failed \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/uteval-68)
[ 51.530049] virtio-pci 0000:00:02.0: can't derive routing for PCI INT A
[ 51.530797] virtio-pci 0000:00:02.0: PCI INT A: no GSI
[ 81.922901] ACPI Error: Aborting method \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
[ 82.103534] ACPI Error: Method execution failed \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/uteval-68)
[ 82.106088] virtio-pci 0000:00:04.0: can't derive routing for PCI INT A
[ 82.106761] virtio-pci 0000:00:04.0: PCI INT A: no GSI
[ 112.192568] ACPI Error: Aborting method \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
[ 112.486687] ACPI Error: Method execution failed \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/uteval-68)
[ 112.489554] virtio-pci 0000:00:05.0: can't derive routing for PCI INT A
[ 112.490027] virtio-pci 0000:00:05.0: PCI INT A: no GSI
[ 142.559448] ACPI Error: Aborting method \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
[ 142.718596] ACPI Error: Method execution failed \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/uteval-68)
[ 142.722889] virtio-pci 0000:00:06.0: can't derive routing for PCI INT A
[ 142.724578] virtio-pci 0000:00:06.0: PCI INT A: no GSI
With this patch:
[ 22.938076] ACPI: \_SB_.LNKB: Enabled at IRQ 10
[ 24.214002] ACPI: \_SB_.LNKD: Enabled at IRQ 11
[ 25.465170] ACPI: \_SB_.LNKA: Enabled at IRQ 10
[ 27.944920] ACPI: \_SB_.LNKC: Enabled at IRQ 11
ACPI disassembly:
Scope (PCI0)
{
Method (_PRT, 0, NotSerialized) // _PRT: PCI Routing Table
{
Return (Package (0x80)
{
Package (0x04)
{
0xFFFF,
Zero,
LNKD,
Zero
},
Package (0x04)
{
0xFFFF,
One,
LNKA,
Zero
},
Package (0x04)
{
0xFFFF,
0x02,
LNKB,
Zero
},
Package (0x04)
{
0xFFFF,
0x03,
LNKC,
Zero
},
Package (0x04)
{
0x0001FFFF,
Zero,
LNKS,
Zero
},
Context: https://lore.kernel.org/virtualization/20240417145544.38d7b482@imammedo.users.ipa.redhat.com/T/#t
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240814115736.1580337-3-ribalda@chromium.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In vtd_process_inv_desc(), VTD_INV_DESC_PC and VTD_INV_DESC_PIOTLB are
bypassed without scalable mode check. These two types are not valid
in legacy mode and we should report error.
Fixes: 4a4f219e8a ("intel_iommu: add scalable-mode option to make scalable mode work")
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Clément Mathieu--Drif<clement.mathieu--drif@eviden.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Message-Id: <20240814071321.2621384-3-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
According to spec, invalidation descriptor type is 7bits which is
concatenation of bits[11:9] and bits[3:0] of invalidation descriptor.
Currently we only pick bits[3:0] as the invalidation type and treat
bits[11:9] as reserved zero. This is not a problem for now as bits[11:9]
is zero for all current invalidation types. But it will break if newer
type occupies bits[11:9].
Fix it by taking bits[11:9] into type and make reserved bits check accurate.
Suggested-by: Clément Mathieu--Drif<clement.mathieu--drif@eviden.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Clément Mathieu--Drif<clement.mathieu--drif@eviden.com>
Message-Id: <20240814071321.2621384-2-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add a Xen PVH x86 machine based on the abstract PVH Machine.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Cosmetic: add comments in x86_load_linux() pointing to the kernel documentation
so that users can better understand the code.
CC: qemu-trivial@nongnu.org
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Since commit 4ccd5fe22f ('pc: add option
to disable PS/2 mouse/keyboard'), the vmport will not be created unless
the i8042 PS/2 controller is enabled. To avoid confusion, let's fail if
vmport was explicitly requested, but the i8042 controller is disabled.
This also changes the behavior of vmport=auto to take i8042 controller
availability into account.
Signed-off-by: Kamil Szczęk <kamil@szczek.dev>
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Message-ID: <0MS3y5E-hHqODIhiuFxmCnIrXd612JIGq31UuMsz4KGCKZ_wWuF-PHGKTRSGS0nWaPEddOdF4YOczHdgorulECPo792OhWov7O9BBF6UMX4=@szczek.dev>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The code which translates vmport=auto to on/off is currently separate
for each PC machine variant, while being functionally equivalent.
This moves the translation into a shared initialization function, while
also tightening the enum assertion.
Signed-off-by: Kamil Szczęk <kamil@szczek.dev>
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <v8pz1uwgIYWkidgZK-o8H-qJvnSyl0641XVmNO43Qls307AA3QRPuad_py6xGe0JAxB6yDEe76oZ8tau_n-2Y6sJBCKzCujNbEUUFhd-ahI=@szczek.dev>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>