"Host Memory Backends" and "Memory devices" queue ("mem"):
- Fixup handling of virtio-mem unplug during system resets, as
preparation for s390x support (especially kdump in the Linux guest)
- virtio-mem support for s390x
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmdnFD4RHGRhdmlkQHJl
ZGhhdC5jb20ACgkQTd4Q9wD/g1rWBBAAp7WkYaNAjRy1PgpjNZ3z1gUJc/vk+skJ
xVgGodA8txrJOFpNrbTyfhrdLs2TV4oWDvB/zrZRRtuxvur3O1EhFd9k6EqXuydr
0FunvLvVJwRHfEZycjN4aacQMRH3CJw07OaTzexeSl5UR/6w5PRofwUK4HX7W/Ka
arqomGa3OJrs1+WgkV0Qcn4vh9HLRVv3iNC2Xo4W1wOCr1Du9zSPn9oC7zOQ0EO4
ZC//7QsdkNRjUX/yMXMkhlSXx3b/RmRg2DBrxo7BZXg27VwGu4uHxL4LRBZiB2A7
V9MqFOcVKzPMkXKTRjrgZ0vXQx1MPJ6WprEihMzMpYU6DrpA7KN/l8Ca8H24B2ln
h7+bmkDsHVVcWovE9ii/9cMRfws6uWXXg3KoA8RQ8IbX1tU02lblw2uHhXEzcoge
npqp/Z5LAiKVMetEnNnLH5thjut5PAEjuqD00cmZAMy4DNngLX2bGSdzMeVBkDMa
78ehLGRplm3t7ibUfaZaMKe6UD9tFrcD6XKsvUTXXHNbYO8ynbx58WOxSZmY98zU
n3JNQRqtXYjBVlH3Dqm47vOTZHgOzFv3raa8BmSLpcBDeTXCTcUIl20s77dGw/vT
r5YNCMN7O4YPFKUoRK9604QTgw6qlYaRTQlJD09usprGqVylb6gQtfZZuZkYDMp8
sEI77QHsePA=
=HDxr
-----END PGP SIGNATURE-----
Merge tag 'mem-2024-12-21' of https://github.com/davidhildenbrand/qemu into staging
Hi,
"Host Memory Backends" and "Memory devices" queue ("mem"):
- Fixup handling of virtio-mem unplug during system resets, as
preparation for s390x support (especially kdump in the Linux guest)
- virtio-mem support for s390x
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmdnFD4RHGRhdmlkQHJl
# ZGhhdC5jb20ACgkQTd4Q9wD/g1rWBBAAp7WkYaNAjRy1PgpjNZ3z1gUJc/vk+skJ
# xVgGodA8txrJOFpNrbTyfhrdLs2TV4oWDvB/zrZRRtuxvur3O1EhFd9k6EqXuydr
# 0FunvLvVJwRHfEZycjN4aacQMRH3CJw07OaTzexeSl5UR/6w5PRofwUK4HX7W/Ka
# arqomGa3OJrs1+WgkV0Qcn4vh9HLRVv3iNC2Xo4W1wOCr1Du9zSPn9oC7zOQ0EO4
# ZC//7QsdkNRjUX/yMXMkhlSXx3b/RmRg2DBrxo7BZXg27VwGu4uHxL4LRBZiB2A7
# V9MqFOcVKzPMkXKTRjrgZ0vXQx1MPJ6WprEihMzMpYU6DrpA7KN/l8Ca8H24B2ln
# h7+bmkDsHVVcWovE9ii/9cMRfws6uWXXg3KoA8RQ8IbX1tU02lblw2uHhXEzcoge
# npqp/Z5LAiKVMetEnNnLH5thjut5PAEjuqD00cmZAMy4DNngLX2bGSdzMeVBkDMa
# 78ehLGRplm3t7ibUfaZaMKe6UD9tFrcD6XKsvUTXXHNbYO8ynbx58WOxSZmY98zU
# n3JNQRqtXYjBVlH3Dqm47vOTZHgOzFv3raa8BmSLpcBDeTXCTcUIl20s77dGw/vT
# r5YNCMN7O4YPFKUoRK9604QTgw6qlYaRTQlJD09usprGqVylb6gQtfZZuZkYDMp8
# sEI77QHsePA=
# =HDxr
# -----END PGP SIGNATURE-----
# gpg: Signature made Sat 21 Dec 2024 14:17:18 EST
# gpg: using RSA key 1BD9CAAD735C4C3A460DFCCA4DDE10F700FF835A
# gpg: issuer "david@redhat.com"
# gpg: Good signature from "David Hildenbrand <david@redhat.com>" [unknown]
# gpg: aka "David Hildenbrand <davidhildenbrand@gmail.com>" [full]
# gpg: aka "David Hildenbrand <hildenbr@in.tum.de>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 1BD9 CAAD 735C 4C3A 460D FCCA 4DDE 10F7 00FF 835A
* tag 'mem-2024-12-21' of https://github.com/davidhildenbrand/qemu:
s390x: virtio-mem support
s390x/virtio-ccw: add support for virtio based memory devices
s390x: remember the maximum page size
s390x/pv: prepare for memory devices
s390x/s390-virtio-ccw: prepare for memory devices
s390x/s390-skeys: prepare for memory devices
s390x/s390-stattrib-kvm: prepare for memory devices and sparse memory layouts
s390x/s390-hypercall: introduce DIAG500 STORAGE_LIMIT
s390x: introduce s390_get_memory_limit()
s390x/s390-virtio-ccw: move setting the maximum guest size from sclp to machine code
s390x: rename s390-virtio-hcall* to s390-hypercall*
s390x/s390-virtio-hcall: prepare for more diag500 hypercalls
s390x/s390-virtio-hcall: remove hypercall registration mechanism
s390x/s390-virtio-ccw: don't crash on weird RAM sizes
virtio-mem: unplug memory only during system resets, not device resets
Conflicts:
- hw/s390x/s390-stattrib-kvm.c
sysemu/ -> system/ header rename conflict.
- hw/s390x/virtio-ccw-mem.c
Make Property array const and removed DEFINE_PROP_END_OF_LIST() to
conform to the latest conventions.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Let's add our virtio-mem-ccw proxy device and wire it up. We should
be supporting everything (e.g., device unplug, "dynamic-memslots") that
we already support for the virtio-pci variant.
With a Linux guest that supports virtio-mem (and has automatic memory
onlining properly configured) the following example will work:
1. Start a VM with 4G initial memory and a virtio-mem device with a maximum
capacity of 16GB:
qemu/build/qemu-system-s390x \
--enable-kvm \
-m 4G,maxmem=20G \
-nographic \
-smp 8 \
-hda Fedora-Server-KVM-40-1.14.s390x.qcow2 \
-chardev socket,id=monitor,path=/var/tmp/monitor,server,nowait \
-mon chardev=monitor,mode=readline \
-object memory-backend-ram,id=mem0,size=16G,reserve=off \
-device virtio-mem-ccw,id=vmem0,memdev=mem0,dynamic-memslots=on
2. Query the current size of virtio-mem device:
(qemu) info memory-devices
Memory device [virtio-mem]: "vmem0"
memaddr: 0x100000000
node: 0
requested-size: 0
size: 0
max-size: 17179869184
block-size: 1048576
memdev: /objects/mem0
3. Request to grow it to 8GB (hotplug 8GB):
(qemu) qom-set vmem0 requested-size 8G
(qemu) info memory-devices
Memory device [virtio-mem]: "vmem0"
memaddr: 0x100000000
node: 0
requested-size: 8589934592
size: 8589934592
max-size: 17179869184
block-size: 1048576
memdev: /objects/mem0
4. Request to grow to 16GB (hotplug another 8GB):
(qemu) qom-set vmem0 requested-size 16G
(qemu) info memory-devices
Memory device [virtio-mem]: "vmem0"
memaddr: 0x100000000
node: 0
requested-size: 17179869184
size: 17179869184
max-size: 17179869184
block-size: 1048576
memdev: /objects/mem0
5. Try to hotunplug all memory again, shrinking to 0GB:
(qemu) qom-set vmem0 requested-size 0G
(qemu) info memory-devices
Memory device [virtio-mem]: "vmem0"
memaddr: 0x100000000
node: 0
requested-size: 0
size: 0
max-size: 17179869184
block-size: 1048576
memdev: /objects/mem0
6. If it worked, unplug the device
(qemu) device_del vmem0
(qemu) info memory-devices
(qemu) object_del mem0
7. Hotplug a new device with a smaller capacity and directly size it to 1GB
(qemu) object_add memory-backend-ram,id=mem0,size=8G,reserve=off
(qemu) device_add virtio-mem-ccw,id=vmem0,memdev=mem0,\
dynamic-memslots=on,requested-size=1G
(qemu) info memory-devices
Memory device [virtio-mem]: "vmem0"
memaddr: 0x100000000
node: 0
requested-size: 1073741824
size: 1073741824
max-size: 8589934592
block-size: 1048576
memdev: /objects/mem0
Trying to use a virtio-mem device backed by hugetlb into a !hugetlb VM
correctly results in the error:
... Memory device uses a bigger page size than initial memory
Note that the virtio-mem driver in Linux will supports 1 MiB (pageblock)
granularity.
Message-ID: <20241219144115.2820241-15-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's implement support for abstract virtio based memory devices, using
the virtio-pci implementation as an orientation. Wire them up in the
machine hotplug handler, taking care of s390x page size limitations.
As we neither support virtio-mem or virtio-pmem yet, the code is
effectively unused. We'll implement support for virtio-mem based on this
next.
Note that we won't wire up the virtio-pci variant (should currently be
impossible due to lack of support for MSI-X), but we'll add a safety net
to reject plugging them in the pre-plug handler.
Message-ID: <20241219144115.2820241-14-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's remember the value (successfully) set via s390_set_max_pagesize().
This will be helpful to reject hotplugged memory devices that would exceed
this initially set page size.
Handle it just like how we handle s390_get_memory_limit(), storing it in
the machine, and moving the handling to machine code.
Message-ID: <20241219144115.2820241-13-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's prepare our address space for memory devices if enabled via
"maxmem" and if we have CONFIG_MEM_DEVICE enabled at all. Note that
CONFIG_MEM_DEVICE will be selected automatically once we add support
for devices.
Just like on other architectures, the region container for memory devices
is placed directly above our initial memory. For now, we only align the
start address of the region up to 1 GiB, but we won't add any additional
space to the region for internal alignment purposes; this can be done in
the future if really required.
The RAM size returned via SCLP is not modified, as this only
covers initial RAM (and standby memory we don't implement) and not memory
devices; clarify that in the docs of read_SCP_info(). Existing OSes without
support for memory devices will keep working as is, even when memory
devices would be attached the VM.
Guest OSs which support memory devices, such as virtio-mem, will
consult diag500(), to find out the maximum possible pfn. Guest OSes that
don't support memory devices, don't have to be changed and will continue
relying on information provided by SCLP.
There are no remaining maxram_size users in s390x code, and the remaining
ram_size users only care about initial RAM:
* hw/s390x/ipl.c
* hw/s390x/s390-hypercall.c
* hw/s390x/sclp.c
* target/s390x/kvm/pv.c
Message-ID: <20241219144115.2820241-11-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
With memory devices, we will have storage keys for memory that
exceeds the initial ram size.
The TODO already states that current handling is subopimal,
but we won't worry about improving that (TCG-only) thing for now.
Message-ID: <20241219144115.2820241-10-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
With memory devices, we will have storage attributes for memory that
exceeds the initial ram size. Further, we can easily have memory holes,
for which there (currently) are no storage attributes.
In particular, with memory holes, KVM_S390_SET_CMMA_BITS will fail to set
some storage attributes.
So let's do it like we handle storage keys migration, relying on
guest_phys_blocks_append(). However, in contrast to storage key
migration, we will handle it on the migration destination.
This is a preparation for virtio-mem support. Note that ever since the
"early migration" feature was added (x-early-migration), the state
of device blocks (plugged/unplugged) is migrated early such that
guest_phys_blocks_append() will properly consider all currently plugged
memory blocks and skip any unplugged ones.
In the future, we should try getting rid of the large temporary buffer
and also not send any attributes for any memory holes, just so they
get ignored on the destination.
Message-ID: <20241219144115.2820241-9-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
A guest OS that supports memory hotplug / memory devices must during
boot be aware of the maximum possible physical memory address that it might
have to handle at a later stage during its runtime.
For example, the maximum possible memory address might be required to
prepare the kernel virtual address space accordingly (e.g., select page
table hierarchy depth).
On s390x there is currently no such mechanism that is compatible with
paravirtualized memory devices, because the whole SCLP interface was
designed around the idea of "storage increments" and "standby memory".
Paravirtualized memory devices we want to support, such as virtio-mem, have
no intersection with any of that, but could co-exist with them in the
future if ever needed.
In particular, a guest OS must never detect and use device memory
without the help of a proper device driver. Device memory must not be
exposed in any firmware-provided memory map (SCLP or diag260 on s390x).
For this reason, these memory devices will be places in memory *above*
the "maximum storage increment" exposed via SCLP.
Let's provide a new diag500 subcode to query the memory limit determined in
s390_memory_init().
Message-ID: <20241219144115.2820241-8-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's add s390_get_memory_limit(), to query what has been successfully
set via s390_set_memory_limit(). Allow setting the limit only once.
We'll remember the limit in the machine state. Move
s390_set_memory_limit() to machine code, merging it into
set_memory_limit(), because this really is a machine property.
Message-ID: <20241219144115.2820241-7-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Nowadays, it feels more natural to have that code located in
s390_memory_init(), where we also have direct access to the machine
object.
While at it, use the actual RAM size, not the maximum RAM size which
cannot currently be reached without support for any memory devices.
Consequently update s390_pv_vm_try_disable_async() to rely on the RAM size
as well, to avoid temporary issues while we further rework that
handling.
set_memory_limit() is temporary, we'll merge it with
s390_set_memory_limit() next.
Message-ID: <20241219144115.2820241-6-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's make it clearer that we are talking about general
QEMU/KVM-specific hypercalls.
Message-ID: <20241219144115.2820241-5-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's generalize, abstracting the virtio bits. diag500 is now a generic
hypercall to handle QEMU/KVM specific things. Explicitly specify all
already defined subcodes, including legacy ones (so we know what we can
use for new hypercalls).
Move the PGM_SPECIFICATION injection into the renamed function
handle_diag_500(), so we can turn it into a void function.
We'll rename the files separately, so git properly detects the rename.
Message-ID: <20241219144115.2820241-4-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Nowadays, we only have a single machine type in QEMU, everything is based
on virtio-ccw and the traditional virtio machine does no longer exist. No
need to dynamically register diag500 handlers. Move the two existing
handlers into s390-virtio-hcall.c.
Message-ID: <20241219144115.2820241-3-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
KVM is not happy when starting a VM with weird RAM sizes:
# qemu-system-s390x --enable-kvm --nographic -m 1234K
qemu-system-s390x: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION
failed, slot=0, start=0x0, size=0x244000: Invalid argument
kvm_set_phys_mem: error registering slot: Invalid argument
Aborted (core dumped)
Let's handle that in a better way by rejecting such weird RAM sizes
right from the start:
# qemu-system-s390x --enable-kvm --nographic -m 1234K
qemu-system-s390x: ram size must be multiples of 1 MiB
Message-ID: <20241219144115.2820241-2-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
We recently converted from the LegacyReset to the new reset framework
in commit c009a311e9 ("virtio-mem: Use new Resettable framework instead
of LegacyReset") to be able to use the ResetType to filter out wakeup
resets.
However, this change had an undesired implications: as we override the
Resettable interface methods in VirtIOMEMClass, the reset handler will
not only get called during system resets (i.e., qemu_devices_reset())
but also during any direct or indirect device rests (e.g.,
device_cold_reset()).
Further, we might now receive two reset callbacks during
qemu_devices_reset(), first when reset by a parent and later when reset
directly.
The memory state of virtio-mem devices is rather special: it's supposed to
be persistent/unchanged during most resets (similar to resetting a hard
disk will not destroy the data), unless actually cold-resetting the whole
system (different to a hard disk where a reboot will not destroy the data):
ripping out system RAM is something guest OSes don't particularly enjoy,
but we want to detect when rebooting to an OS that does not support
virtio-mem and wouldn't be able to detect+use the memory -- and we want
to force-defragment hotplugged memory to also shrink the usable device
memory region. So we rally want to catch system resets to do that.
On supported targets (e.g., x86), getting a cold reset on the
device/parent triggers is not that easy (but looks like PCI code
might trigger it), so this implication went unnoticed.
However, with upcoming s390x support it is problematic: during
kdump, s390x triggers a subsystem reset, ending up in
s390_machine_reset() and calling only subsystem_reset() instead of
qemu_devices_reset() -- because it's not a full system reset.
In subsystem_reset(), s390x performs a device_cold_reset() of any
TYPE_VIRTUAL_CSS_BRIDGE device, which ends up resetting all children,
including the virtio-mem device. Consequently, we wrongly detect a system
reset and unplug all device memory, resulting in hotplugged memory not
getting included in the crash dump -- undesired.
We really must not mess with hotplugged memory state during simple
device resets. To fix, create+register a new reset object that will only
get triggered during qemu_devices_reset() calls, but not during any other
resets as it is logically not the child of any other object.
Message-ID: <20241025104103.342188-1-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Juraj Marcin <jmarcin@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
* Correct the validness check of iova
* Fix APLIC in_clrip and clripnum write emulation
* Support riscv-iommu-sys device
* Add Tenstorrent Ascalon CPU
* Add AIA userspace irqchip_split support
* Add Microblaze V generic board
* Upgrade ACPI SPCR table to support SPCR table revision 4 format
* Remove tswap64() calls from HTIF
* Support 64-bit address of initrd
* Introduce svukte ISA extension
* Support ssstateen extension
* Support for RV64 Xiangshan Nanhu CPU
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEaukCtqfKh31tZZKWr3yVEwxTgBMFAmdkzjgACgkQr3yVEwxT
gBOcyA//e0XhAQciQglCZZCfINdOyI8qSh+P2K0qtrXZ4VERHEMp7UoD5CQr2cZv
h8ij1EkatXCwukVELx0rNckxG33bEFgG1oESnQSrwGE0Iu4csNW24nK5WlUS0/r+
A5oD2wtzEF+cbhTKrVSDBN/PvlnWTKGEoJRkuXWfz5d4uR9eyQhfED0S2j36lNEC
X1x/OZoKM89XuXtOFe9g55Z5UNzAatcdTISozL0FydiPh7QeVjTLHh28/tt559MX
7v5aJFlQuZ78z1mIHkZmPSorSrJ0zqhkP6NWe1ae06oMgzwRQQhYLppDILV4ZgUF
3mSDRoXmBycQXiYNPcHep3LdXfvxr+PpWHSevx8gH1jwm93On7Y/H7Uol6TDXzfC
mrFjalfV5tzrD90ZvB+s5bCMF1q5Z8Dlj0pYF9aN9P1ILoWy3dndFAPJB6uKKDP7
Qd4qOQ3dVyHAX9jLmVkB6QvAV/vTDrYTsAxaF/EaoLOy0IoKhjTvgda3XzE1MFKA
gVafLluADIfSEdqa2QR2ExL8d1SZVoiObCp5TMLRer0HIpg/vQZwjfdbo4BgQKL3
7Q6wBxcZUNqrFgspXjm5WFIrdk2rfS/79OmvpNM6SZaK6BnklntdJHJHtAWujGsm
EM310AUFpHMp2h6Nqnemb3qr5l4d20KSt8DhoPAUq1IE59Kb8XY=
=0iQW
-----END PGP SIGNATURE-----
Merge tag 'pull-riscv-to-apply-20241220' of https://github.com/alistair23/qemu into staging
RISC-V PR for 10.0
* Correct the validness check of iova
* Fix APLIC in_clrip and clripnum write emulation
* Support riscv-iommu-sys device
* Add Tenstorrent Ascalon CPU
* Add AIA userspace irqchip_split support
* Add Microblaze V generic board
* Upgrade ACPI SPCR table to support SPCR table revision 4 format
* Remove tswap64() calls from HTIF
* Support 64-bit address of initrd
* Introduce svukte ISA extension
* Support ssstateen extension
* Support for RV64 Xiangshan Nanhu CPU
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEaukCtqfKh31tZZKWr3yVEwxTgBMFAmdkzjgACgkQr3yVEwxT
# gBOcyA//e0XhAQciQglCZZCfINdOyI8qSh+P2K0qtrXZ4VERHEMp7UoD5CQr2cZv
# h8ij1EkatXCwukVELx0rNckxG33bEFgG1oESnQSrwGE0Iu4csNW24nK5WlUS0/r+
# A5oD2wtzEF+cbhTKrVSDBN/PvlnWTKGEoJRkuXWfz5d4uR9eyQhfED0S2j36lNEC
# X1x/OZoKM89XuXtOFe9g55Z5UNzAatcdTISozL0FydiPh7QeVjTLHh28/tt559MX
# 7v5aJFlQuZ78z1mIHkZmPSorSrJ0zqhkP6NWe1ae06oMgzwRQQhYLppDILV4ZgUF
# 3mSDRoXmBycQXiYNPcHep3LdXfvxr+PpWHSevx8gH1jwm93On7Y/H7Uol6TDXzfC
# mrFjalfV5tzrD90ZvB+s5bCMF1q5Z8Dlj0pYF9aN9P1ILoWy3dndFAPJB6uKKDP7
# Qd4qOQ3dVyHAX9jLmVkB6QvAV/vTDrYTsAxaF/EaoLOy0IoKhjTvgda3XzE1MFKA
# gVafLluADIfSEdqa2QR2ExL8d1SZVoiObCp5TMLRer0HIpg/vQZwjfdbo4BgQKL3
# 7Q6wBxcZUNqrFgspXjm5WFIrdk2rfS/79OmvpNM6SZaK6BnklntdJHJHtAWujGsm
# EM310AUFpHMp2h6Nqnemb3qr5l4d20KSt8DhoPAUq1IE59Kb8XY=
# =0iQW
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 19 Dec 2024 20:54:00 EST
# gpg: using RSA key 6AE902B6A7CA877D6D659296AF7C95130C538013
# gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6AE9 02B6 A7CA 877D 6D65 9296 AF7C 9513 0C53 8013
* tag 'pull-riscv-to-apply-20241220' of https://github.com/alistair23/qemu: (39 commits)
target/riscv: add support for RV64 Xiangshan Nanhu CPU
target/riscv: add ssstateen
target/riscv/tcg: hide warn for named feats when disabling via priv_ver
target/riscv: Include missing headers in 'internals.h'
target/riscv: Include missing headers in 'vector_internals.h'
target/riscv: Check svukte is not enabled in RV32
target/riscv: Expose svukte ISA extension
target/riscv: Check memory access to meet svukte rule
target/riscv: Support hstatus[HUKTE] bit when svukte extension is enabled
target/riscv: Support senvcfg[UKTE] bit when svukte extension is enabled
target/riscv: Add svukte extension capability variable
hw/riscv: Add the checking if DTB overlaps to kernel or initrd
hw/riscv: Add a new struct RISCVBootInfo
hw/riscv: Support to load DTB after 3GB memory on 64-bit system.
hw/char/riscv_htif: Clarify MemoryRegionOps expect 32-bit accesses
hw/char/riscv_htif: Explicit little-endian implementation
MAINTAINERS: Cover RISC-V HTIF interface
tests/qtest/bios-tables-test: Update virt SPCR golden reference for RISC-V
hw/acpi: Upgrade ACPI SPCR table to support SPCR table revision 4 format
qtest: allow SPCR acpi table changes
...
Conflicts:
target/riscv/cpu.c
Merge conflict with DEFINE_PROP_END_OF_LIST() removal. No Property
array terminator is needed anymore.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* rust: second part of QOM rework
* rust: callbacks wrapper
* rust: pl011 bugfixes
* kvm: cleanup errors in kvm_convert_memory()
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmdkaEkUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroN0/wgAgIJg8BrlRKfmiz14NZfph8/jarSj
TOWYVxL2v4q98KBuL5pta2ucObgzwqyqSyc02S2DGSOIMQCIiBB5MaCk1iMjx+BO
pmVU8gNlD8faO8SSmnnr+jDQt+G+bQ/nRgQJOAReF8oVw3O2aC/FaVKpitMzWtvv
PLnJWdrqqpGq14OzX8iNCzSujxppAuyjrhT4lNlekzDoDfdTez72r+rXkvg4GzZL
QC3xLYg/LrT8Rs+zgOhm/AaIyS4bOyMlkU9Du1rQ6Tyne45ey2FCwKVzBKrJdGcw
sVbzEclxseLenoTbZqYK6JTzLdDoThVUbY2JwoCGUaIm+74P4NjEsUsTVg==
=TuQM
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* qdev: second part of Property cleanups
* rust: second part of QOM rework
* rust: callbacks wrapper
* rust: pl011 bugfixes
* kvm: cleanup errors in kvm_convert_memory()
# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmdkaEkUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroN0/wgAgIJg8BrlRKfmiz14NZfph8/jarSj
# TOWYVxL2v4q98KBuL5pta2ucObgzwqyqSyc02S2DGSOIMQCIiBB5MaCk1iMjx+BO
# pmVU8gNlD8faO8SSmnnr+jDQt+G+bQ/nRgQJOAReF8oVw3O2aC/FaVKpitMzWtvv
# PLnJWdrqqpGq14OzX8iNCzSujxppAuyjrhT4lNlekzDoDfdTez72r+rXkvg4GzZL
# QC3xLYg/LrT8Rs+zgOhm/AaIyS4bOyMlkU9Du1rQ6Tyne45ey2FCwKVzBKrJdGcw
# sVbzEclxseLenoTbZqYK6JTzLdDoThVUbY2JwoCGUaIm+74P4NjEsUsTVg==
# =TuQM
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 19 Dec 2024 13:39:05 EST
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (42 commits)
rust: pl011: simplify handling of the FIFO enabled bit in LCR
rust: pl011: fix migration stream
rust: pl011: extend registers to 32 bits
rust: pl011: fix break errors and definition of Data struct
rust: pl011: always use reset() method on registers
rust: pl011: match break logic of C version
rust: pl011: fix declaration of LineControl bits
target/i386: Reset TSCs of parked vCPUs too on VM reset
kvm: consistently return 0/-errno from kvm_convert_memory
rust: qemu-api: add a module to wrap functions and zero-sized closures
rust: qom: add initial subset of methods on Object
rust: qom: add casting functionality
rust: tests: allow writing more than one test
bql: add a "mock" BQL for Rust unit tests
rust: re-export C types from qemu-api submodules
rust: rename qemu-api modules to follow C code a bit more
rust: qom: add possibility of overriding unparent
rust: qom: put class_init together from multiple ClassInitImpl<>
Constify all opaque Property pointers
hw/core/qdev-properties: Constify Property argument to PropertyInfo.print
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
"exec/confidential-guest-support.h" is specific to system
emulation, so move it under the system/ namespace.
Mechanical change doing:
$ sed -i \
-e 's,exec/confidential-guest-support.h,sysemu/confidential-guest-support.h,' \
$(git grep -l exec/confidential-guest-support.h)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20241218155913.72288-2-philmd@linaro.org>
Headers in include/sysemu/ are not only related to system
*emulation*, they are also used by virtualization. Rename
as system/ which is clearer.
Files renamed manually then mechanical change using sed tool.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20241203172445.28576-1-philmd@linaro.org>
"system/confidential-guest-support.h" is not needed,
remove it. Reorder #ifdef'ry to reduce declarations
exposed on user emulation.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20241218155913.72288-3-philmd@linaro.org>
Always explicitly create QEMU system containers upfront.
Root containers will be created when trying to fetch the root object the
1st time. They are:
/objects
/chardevs
/backend
Machine sub-containers will be created only until machine is being
initialized. They are:
/machine/unattached
/machine/peripheral
/machine/peripheral-anon
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241121192202.4155849-8-peterx@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
QEMU will start to not rely on implicit creations of containers soon. Make
PPC drc devices follow by explicitly create the container whenever a drc
device is realized, dropping container_get() calls.
No functional change intended.
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: Harsh Prateek Bora <harshpb@linux.ibm.com>
Cc: qemu-ppc@nongnu.org
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241121192202.4155849-7-peterx@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
container_get() is going to become strict on not allowing to return a
non-container.
Switch the e500 user to use object_resolve_path_component() explicitly.
Cc: Bharat Bhushan <r65777@freescale.com>
Cc: qemu-ppc@nongnu.org
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-ID: <20241121192202.4155849-6-peterx@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Provide a macro for the container type across QEMU source tree, rather than
hard code it every time.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-ID: <20241121192202.4155849-2-peterx@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
DTB is placed to the end of memory, so we will check if the start
address of DTB overlaps to the address of kernel/initrd.
Signed-off-by: Jim Shu <jim.shu@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20241120153935.24706-4-jim.shu@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Add a new struct RISCVBootInfo to sync boot information between multiple
boot functions.
Signed-off-by: Jim Shu <jim.shu@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20241120153935.24706-3-jim.shu@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Larger initrd image will overlap the DTB at 3GB address. Since 64-bit
system doesn't have 32-bit addressable issue, we just load DTB to the end
of dram in 64-bit system.
Signed-off-by: Jim Shu <jim.shu@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20241120153935.24706-2-jim.shu@sifive.com>
[ Changes by AF
- Store fdt_load_addr_hi32 in the reset vector
]
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Looking at htif_mm_ops[] read/write handlers, we notice they
expect 32-bit values to accumulate into to the 'fromhost' and
'tohost' 64-bit variables. Explicit by setting the .impl
min/max fields.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20241129154304.34946-4-philmd@linaro.org>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Since our RISC-V system emulation is only built for little
endian, the HTIF device aims to interface with little endian
memory accesses, thus we can explicit htif_mm_ops:endianness
being DEVICE_LITTLE_ENDIAN.
In that case tswap64() is equivalent to le64_to_cpu(), as in
"convert this 64-bit little-endian value into host cpu order".
Replace to simplify.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20241129154304.34946-3-philmd@linaro.org>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Update the SPCR table to accommodate the SPCR Table revision 4 [1].
The SPCR table has been modified to adhere to the revision 4 format [2].
[1]: https://learn.microsoft.com/en-us/windows-hardware/drivers/serports/serial-port-console-redirection-table
[2]: https://github.com/acpica/acpica/pull/931
Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20241028015744.624943-3-jeeheng.sia@starfivetech.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Add a basic board with interrupt controller (intc), timer, serial
(uartlite), small memory called LMB@0 (128kB) and DDR@0x80000000
(configured via command line eg. -m 2g).
This is basic configuration which matches HW generated out of AMD Vivado
(design tools). But initial configuration is going beyond what it is
configured by default because validation should be done on other
configurations too. That's why wire also additional uart16500, axi
ethernet(with axi dma).
GPIOs, i2c and qspi is also listed for completeness.
IRQ map is: (addr)
0 - timer (0x41c00000)
1 - uartlite (0x40600000)
2 - i2c (0x40800000)
3 - qspi (0x44a00000)
4 - uart16550 (0x44a10000)
5 - emaclite (0x40e00000)
6 - timer2 (0x41c10000)
7 - axi emac (0x40c00000)
8 - axi dma (0x41e00000)
9 - axi dma
10 - gpio (0x40000000)
11 - gpio2 (0x40010000)
12 - gpio3 (0x40020000)
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@amd.com>
Signed-off-by: Michal Simek <michal.simek@amd.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241125134739.18189-1-sai.pavan.boddu@amd.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
The last step to enable KVM AIA aplic-imsic with irqchip in split mode
is to deal with how MSIs are going to be sent. In our current design we
don't allow an APLIC controller to send MSIs unless it's on m-mode. And
we also do not allow Supervisor MSI address configuration via the
'smsiaddrcfg' and 'smsiaddrcfgh' registers unless it's also a m-mode
APLIC controller.
Add a new RISCVACPLICState attribute called 'kvm_msicfgaddr'. This
attribute represents the base configuration address for MSIs, in our
case the base addr of the IMSIC controller. This attribute is being set
only when running irqchip_split() mode with aia=aplic-imsic.
During riscv_aplic_msi_send() we'll check if the attribute was set to
skip the check for a m-mode APLIC controller and to change the resulting
MSI addr by adding kvm_msicfgaddr right before address_space_stl_le().
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241119191706.718860-7-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
The current logic to determine if we don't need an emulated APLIC
controller, i.e. KVM will provide for us, is to determine if we're
running KVM, with in-kernel irqchip support, and running
aia=aplic-imsic. This is modelled by riscv_is_kvm_aia_aplic_imsic() and
virt_use_kvm_aia_aplic_imsic().
This won't suffice to support irqchip_split() mode: it will match
exactly the same conditions as the one above, but setting the irqchip to
'split' mode will now require us to emulate an APLIC s-mode controller,
like we're doing with 'aia=aplic'.
Create a new riscv_use_emulated_aplic() helper that will encapsulate
this logic. Replace the uses of "riscv_is_kvm_aia_aplic_imsic()" with
this helper every time we're taking a decision on emulate an APLIC
controller or not. Do the same in virt.c with virt_use_emulated_aplic().
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241119191706.718860-6-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Similar to the riscv_is_kvm_aia_aplic_imsic() helper from riscv_aplic.c,
the existing virt_use_kvm_aia() is testing for KVM aia=aplic-imsic with
in-kernel irqchip enabled. It is not checking for a generic AIA support.
Rename the helper to virt_use_kvm_aia_aplic_imsic() to reflect what the
helper is doing, and use the existing riscv_is_kvm_aia_aplic_imsic() to
obscure details such as the presence of the in-kernel irqchip.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241119191706.718860-4-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
In create_fdt_sockets() we have the following pattern:
if (kvm_enabled() && virt_use_kvm_aia(s)) {
(... do stuff ...)
} else {
(... do other stuff ...)
}
if (kvm_enabled() && virt_use_kvm_aia(s)) {
(... do more stuff ...)
} else {
(... do more other stuff)
}
Do everything in a single if/else clause to reduce the usage of
virt_use_kvm_aia() helper and to make the code a bit less repetitive.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241119191706.718860-3-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
The helper is_kvm_aia() is checking not only for AIA, but for
aplic-imsic (i.e. "aia=aplic-imsic" in 'virt' RISC-V machine) with an
in-kernel chip present.
Rename it to be a bit clear what the helper is doing since we'll add
more AIA helpers in the next patches.
Make the helper public because the 'virt' machine will use it as well.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241119191706.718860-2-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Add a riscv_iommu_reset() helper in the base emulation code that
implements the expected reset behavior as defined by the riscv-iommu
spec.
Devices can then use this helper in their own reset callbacks.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241106133407.604587-7-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
MSIx support is added in the RISC-V IOMMU platform device by including
the required MSIx facilities to alow software to properly setup the MSIx
subsystem.
We took inspiration of what is being done in the riscv-iommu-pci device,
mainly msix_init() and msix_notify(), while keeping in mind that
riscv-iommu-sys isn't a true PCI device and we don't need to copy/paste
all the contents of these MSIx functions.
Two extra MSI MemoryRegions were added: 'msix-table' and 'msix-pba'.
They are used to manage r/w of the MSI table and Pending Bit Array (PBA)
respectively. Both are subregions of the main IOMMU memory region,
iommu->regs_mr, initialized during riscv_iommu_realize(), and each one
has their own handlers for MSIx reads and writes.
This is the expected memory map when using this device in the 'virt'
machine:
0000000003010000-0000000003010fff (prio 0, i/o): riscv-iommu-regs
0000000003010300-000000000301034f (prio 0, i/o): msix-table
0000000003010400-0000000003010407 (prio 0, i/o): msix-pba
We're now able to set IGS to RISCV_IOMMU_CAP_IGS_BOTH, and userspace is
free to decide which interrupt model to use.
Enabling MSIx support for this device in the 'virt' machine requires
adding 'msi-parent' in the iommu-sys DT.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241106133407.604587-6-dbarboza@ventanamicro.com>
[ Changes by AF:
- Used PRIx64 in trace
]
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Add a new machine option called 'iommu-sys' that enables a
riscv-iommu-sys platform device for the 'virt' machine. The option is
default 'off'.
The device will use IRQs 36 to 39.
We will not support both riscv-iommu-sys and riscv-iommu-pci devices in
the same board in this first implementation. If a riscv-iommu-pci device
is added in the command line we will disable the riscv-iommu-sys device.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241106133407.604587-5-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
This device models the RISC-V IOMMU as a sysbus device. The same design
decisions taken in the riscv-iommu-pci device were kept, namely the
existence of 4 vectors are available for each interrupt cause.
The WSIs are emitted using the input of the s->notify() callback as a
index to an IRQ list. The IRQ list starts at 'base_irq' and goes until
base_irq + 3. This means that boards must have 4 contiguous IRQ lines
available, starting from 'base_irq'.
Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241106133407.604587-4-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Interrupt Generation Support (IGS) is a capability that is tied to the
interrupt deliver mechanism, not with the core IOMMU emulation. We
should allow device implementations to set IGS as they wish.
A new helper is added to make it easier for device impls to set IGS. Use
it in our existing IOMMU device (riscv-iommu-pci) to set
RISCV_IOMMU_CAPS_IGS_MSI.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241106133407.604587-3-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Move all the static initializion of the device to an init() function,
leaving only the dynamic initialization to be done during realize.
With this change s->cap is initialized with RISCV_IOMMU_CAP_DBG during
init(), and realize() will increment s->cap with the extra caps.
This will allow callers to add IOMMU capabilities before the
realization.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241106133407.604587-2-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
In the section "4.7 Precise effects on interrupt-pending bits"
of the RISC-V AIA specification defines that:
"If the source mode is Level1 or Level0 and the interrupt domain
is configured in MSI delivery mode (domaincfg.DM = 1):
The pending bit is cleared whenever the rectified input value is
low, when the interrupt is forwarded by MSI, or by a relevant
write to an in_clrip register or to clripnum."
Update the riscv_aplic_set_pending() to match the spec.
Fixes: bf31cf06eb ("hw/intc/riscv_aplic: Fix setipnum_le write emulation for APLIC MSI-mode")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241029085349.30412-1-yongxuan.wang@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
From RISCV IOMMU spec section 2.1.3:
When SXL is 1, the following rules apply:
- If the first-stage is not Bare, then a page fault corresponding to the
original access type occurs if the IOVA has bits beyond bit 31 set to 1.
- If the second-stage is not Bare, then a guest page fault corresponding
to the original access type occurs if the incoming GPA has bits beyond bit
33 set to 1.
From RISCV IOMMU spec section 2.3 step 17:
Use the process specified in Section "Two-Stage Address Translation" of
the RISC-V Privileged specification to determine the GPA accessed by the
transaction.
From RISCV IOMMU spec section 2.3 step 19:
Use the second-stage address translation process specified in Section
"Two-Stage Address Translation" of the RISC-V Privileged specification
to translate the GPA A to determine the SPA accessed by the transaction.
This commit adds the iova check with the following rules:
- For Sv32, Sv32x4, Sv39x4, Sv48x4 and Sv57x4, the iova must be zero
extended.
- For Sv39, Sv48 and Sv57, the iova must be signed extended with most
significant bit.
Signed-off-by: Jason Chien <jason.chien@sifive.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20241114065617.25133-1-jason.chien@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Via sed "s/ Property [*]/ const Property */".
The opaque pointers passed to ObjectProperty callbacks are
the last instances of non-const Property pointers in the tree.
For the most part, these callbacks only use object_field_prop_ptr,
which now takes a const pointer itself.
This logically should have accompanied d36f165d95 which
allowed const Property to be registered.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218134251.4724-25-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This logically should have accompanied d36f165d95 which
allowed const Property to be registered.
There is exactly one instance of this method: print_pci_devfn.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218134251.4724-24-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This logically should have accompanied d36f165d95 which
allowed const Property to be registered.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218134251.4724-23-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that all of the Property arrays are counted, we can remove
the terminator object from each array. Update the assertions
in device_class_set_props to match.
With struct Property being 88 bytes, this was a rather large
form of terminator. Saves 30k from qemu-system-aarch64.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218134251.4724-21-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>