Merge tag 'pull-target-arm-20231102' of https://git.linaro.org/people/pmaydell/qemu-arm into staging

target-arm queue:
 * linux-user/elfload: Add missing arm64 hwcap values
 * stellaris-gamepad: Convert to qdev
 * docs/specs: Convert various txt docs to rST
 * MAINTAINERS: Make sure that gicv3_internal.h is covered, too
 * hw/arm/pxa2xx_gpio: Pass CPU using QOM link property
 * hw/watchdog/wdt_imx2: Trace MMIO access and timer activity
 * hw/misc/imx7_snvs: Trace MMIO access
 * hw/misc/imx6_ccm: Convert DPRINTF to trace events
 * hw/i2c/pm_smbus: Convert DPRINTF to trace events
 * target/arm: Enable FEAT_MOPS insns in user-mode emulation
 * linux-user: Report AArch64 hwcap2 fields above bit 31
 * target/arm: Make FEAT_MOPS SET* insns handle Xs == XZR correctly
 * target/arm: Fix SVE STR increment
 * hw/char/stm32f2xx_usart: implement TX interrupts
 * target/arm: Correctly propagate stage 1 BTI guarded bit in a two-stage walk
 * xlnx-versal-virt: Add AMD/Xilinx TRNG device

* tag 'pull-target-arm-20231102' of https://git.linaro.org/people/pmaydell/qemu-arm: (33 commits)
  tests/qtest: Introduce tests for AMD/Xilinx Versal TRNG device
  hw/arm: xlnx-versal-virt: Add AMD/Xilinx TRNG device
  hw/misc: Introduce AMD/Xilix Versal TRNG device
  target/arm: Correctly propagate stage 1 BTI guarded bit in a two-stage walk
  hw/char/stm32f2xx_usart: Add more definitions for CR1 register
  hw/char/stm32f2xx_usart: Update IRQ when DR is written
  hw/char/stm32f2xx_usart: Extract common IRQ update code to update_irq()
  target/arm: Fix SVE STR increment
  target/arm: Make FEAT_MOPS SET* insns handle Xs == XZR correctly
  linux-user: Report AArch64 hwcap2 fields above bit 31
  target/arm: Enable FEAT_MOPS insns in user-mode emulation
  hw/i2c/pm_smbus: Convert DPRINTF to trace events
  hw/misc/imx6_ccm: Convert DPRINTF to trace events
  hw/misc/imx7_snvs: Trace MMIO access
  hw/watchdog/wdt_imx2: Trace timer activity
  hw/watchdog/wdt_imx2: Trace MMIO access
  hw/arm/pxa2xx_gpio: Pass CPU using QOM link property
  MAINTAINERS: Make sure that gicv3_internal.h is covered, too
  docs/specs/vmgenid: Convert to rST
  docs/specs/vmcoreinfo: Convert to rST
  ...

Conflicts:
  hw/input/stellaris_input.c
  The qdev conversion in this pull request ("stellaris-gamepad: Convert
  to qdev") eliminates the vmstate_register() call that was converted to
  vmstate_register_any() in the conflicting migration pull request.
  vmstate_register_any() is no longer necessary now that this device has
  been converted to qdev, so take this pull request's version of
  stellaris_gamepad.c over the previous pull request's
  stellaris_input.c (the file was renamed).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi 2023-11-03 10:04:12 +08:00
commit d762bf9793
56 changed files with 2302 additions and 776 deletions


@ -2,9 +2,10 @@
EDU device
==========
Copyright (c) 2014-2015 Jiri Slaby
..
Copyright (c) 2014-2015 Jiri Slaby
This document is licensed under the GPLv2 (or later).
This document is licensed under the GPLv2 (or later).
This is an educational device for writing (kernel) drivers. Its original
intention was to support the Linux kernel lectures taught at the Masaryk
@ -15,10 +16,11 @@ The devices behaves very similar to the PCI bridge present in the COMBO6 cards
developed under the Liberouter wings. Both PCI device ID and PCI space is
inherited from that device.
Command line switches:
-device edu[,dma_mask=mask]
Command line switches
---------------------
dma_mask makes the virtual device work with DMA addresses with the given
``-device edu[,dma_mask=mask]``
``dma_mask`` makes the virtual device work with DMA addresses with the given
mask. For educational purposes, the device supports only 28 bits (256 MiB)
by default. Students shall set dma_mask for the device in the OS driver
properly.
@ -26,7 +28,8 @@ Command line switches:
PCI specs
---------
PCI ID: 1234:11e8
PCI ID:
``1234:11e8``
PCI Region 0:
I/O memory, 1 MB in size. Users are supposed to communicate with the card
@ -35,24 +38,29 @@ PCI Region 0:
MMIO area spec
--------------
Only size == 4 accesses are allowed for addresses < 0x80. size == 4 or
size == 8 for the rest.
Only ``size == 4`` accesses are allowed for addresses ``< 0x80``.
``size == 4`` or ``size == 8`` for the rest.
0x00 (RO) : identification (0xRRrr00edu)
RR -- major version
rr -- minor version
0x00 (RO) : identification
Value is in the form ``0xRRrr00edu`` where:
- ``RR`` -- major version
- ``rr`` -- minor version
0x04 (RW) : card liveness check
It is a simple value inversion (~ C operator).
It is a simple value inversion (``~`` C operator).
0x08 (RW) : factorial computation
The stored value is taken and factorial of it is put back here.
This happens only after factorial bit in the status register (0x20
below) is cleared.
0x20 (RW) : status register, bitwise OR
0x01 -- computing factorial (RO)
0x80 -- raise interrupt after finishing factorial computation
0x20 (RW) : status register
Bitwise OR of:
0x01
computing factorial (RO)
0x80
raise interrupt after finishing factorial computation
0x24 (RO) : interrupt status register
It contains values which raised the interrupt (see interrupt raise
@ -76,13 +84,19 @@ size == 8 for the rest.
0x90 (RW) : DMA transfer count
The size of the area to perform the DMA on.
0x98 (RW) : DMA command register, bitwise OR
0x01 -- start transfer
0x02 -- direction (0: from RAM to EDU, 1: from EDU to RAM)
0x04 -- raise interrupt 0x100 after finishing the DMA
0x98 (RW) : DMA command register
Bitwise OR of:
0x01
start transfer
0x02
direction (0: from RAM to EDU, 1: from EDU to RAM)
0x04
raise interrupt 0x100 after finishing the DMA
IRQ controller
--------------
An IRQ is generated when written to the interrupt raise register. The value
appears in interrupt status register when the interrupt is raised and has to
be written to the interrupt acknowledge register to lower it.
@ -94,22 +108,28 @@ routine.
DMA controller
--------------
One has to specify, source, destination, size, and start the transfer. One
4096 bytes long buffer at offset 0x40000 is available in the EDU device. I.e.
one can perform DMA to/from this space when programmed properly.
Example of transferring a 100 byte block to and from the buffer using a given
PCI address 'addr':
addr -> DMA source address
0x40000 -> DMA destination address
100 -> DMA transfer count
1 -> DMA command register
while (DMA command register & 1)
;
PCI address ``addr``:
0x40000 -> DMA source address
addr+100 -> DMA destination address
100 -> DMA transfer count
3 -> DMA command register
while (DMA command register & 1)
;
::
addr -> DMA source address
0x40000 -> DMA destination address
100 -> DMA transfer count
1 -> DMA command register
while (DMA command register & 1)
;
::
0x40000 -> DMA source address
addr+100 -> DMA destination address
100 -> DMA transfer count
3 -> DMA command register
while (DMA command register & 1)
;
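The two DMA examples above can be sketched in C as follows. This is only an illustration under stated assumptions, not code from the EDU device or any driver: the 0x90 and 0x98 offsets and the command bits come from the spec text in this diff, while the 0x80/0x88 source and destination offsets lie outside the hunks shown here and are assumed. The `edu_dma_request` type and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets in PCI region 0. 0x90 and 0x98 are quoted in the spec
 * text above; 0x80 and 0x88 are ASSUMED, as they are not in the hunks shown. */
enum {
    EDU_REG_DMA_SRC = 0x80, /* assumption */
    EDU_REG_DMA_DST = 0x88, /* assumption */
    EDU_REG_DMA_CNT = 0x90,
    EDU_REG_DMA_CMD = 0x98,
};

/* DMA command register bits, as listed in the spec. */
enum {
    EDU_DMA_CMD_START  = 0x01, /* start transfer */
    EDU_DMA_CMD_TO_RAM = 0x02, /* 0: RAM -> EDU, 1: EDU -> RAM */
    EDU_DMA_CMD_IRQ    = 0x04, /* raise interrupt 0x100 when done */
};

#define EDU_DMA_BUF_OFFSET 0x40000u /* on-device buffer */
#define EDU_DMA_BUF_SIZE   4096u

/* Hypothetical container for the four values written to the device. */
struct edu_dma_request {
    uint64_t src, dst, count, cmd;
};

/* First example: copy len bytes from PCI address addr into the buffer. */
static inline struct edu_dma_request edu_dma_to_device(uint64_t addr, uint64_t len)
{
    assert(len <= EDU_DMA_BUF_SIZE);
    struct edu_dma_request r = { addr, EDU_DMA_BUF_OFFSET, len,
                                 EDU_DMA_CMD_START };
    return r;
}

/* Second example: copy len bytes from the buffer back to PCI address addr. */
static inline struct edu_dma_request edu_dma_from_device(uint64_t addr, uint64_t len)
{
    assert(len <= EDU_DMA_BUF_SIZE);
    struct edu_dma_request r = { EDU_DMA_BUF_OFFSET, addr, len,
                                 EDU_DMA_CMD_START | EDU_DMA_CMD_TO_RAM };
    return r;
}

static inline void edu_mmio_write64(volatile uint8_t *bar0, uint32_t off, uint64_t val)
{
    *(volatile uint64_t *)(bar0 + off) = val;
}

/* Program a transfer through a mapped BAR0 and poll until the start bit
 * clears. Only meaningful against a real (or emulated) device mapping. */
static inline void edu_dma_run(volatile uint8_t *bar0, const struct edu_dma_request *r)
{
    edu_mmio_write64(bar0, EDU_REG_DMA_SRC, r->src);
    edu_mmio_write64(bar0, EDU_REG_DMA_DST, r->dst);
    edu_mmio_write64(bar0, EDU_REG_DMA_CNT, r->count);
    edu_mmio_write64(bar0, EDU_REG_DMA_CMD, r->cmd);
    while (*(volatile uint64_t *)(bar0 + EDU_REG_DMA_CMD) & EDU_DMA_CMD_START)
        ;
}
```

The request builders are kept separate from `edu_dma_run()` so that the register values can be checked without a device. The 64-bit accesses rely on the spec's statement that `size == 8` is allowed for addresses at or above `0x80`.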


@ -24,3 +24,11 @@ guest hardware that is specific to QEMU.
acpi_erst
sev-guest-firmware
fw_cfg
vmw_pvscsi-spec
edu
ivshmem-spec
pvpanic
standard-vga
virt-ctlr
vmcoreinfo
vmgenid


@ -1,4 +1,6 @@
= Device Specification for Inter-VM shared memory device =
======================================================
Device Specification for Inter-VM shared memory device
======================================================
The Inter-VM shared memory device (ivshmem) is designed to share a
memory region between multiple QEMU processes running different guests
@ -12,42 +14,17 @@ can obtain one from an ivshmem server.
In the latter case, the device can additionally interrupt its peers, and
get interrupted by its peers.
For information on configuring the ivshmem device on the QEMU
command line, see :doc:`../system/devices/ivshmem`.
== Configuring the ivshmem PCI device ==
There are two basic configurations:
- Just shared memory:
-device ivshmem-plain,memdev=HMB,...
This uses host memory backend HMB. It should have option "share"
set.
- Shared memory plus interrupts:
-device ivshmem-doorbell,chardev=CHR,vectors=N,...
An ivshmem server must already be running on the host. The device
connects to the server's UNIX domain socket via character device
CHR.
Each peer gets assigned a unique ID by the server. IDs must be
between 0 and 65535.
Interrupts are message-signaled (MSI-X). vectors=N configures the
number of vectors to use.
For more details on ivshmem device properties, see the QEMU Emulator
user documentation.
== The ivshmem PCI device's guest interface ==
The ivshmem PCI device's guest interface
========================================
The device has vendor ID 1af4, device ID 1110, revision 1. Before
QEMU 2.6.0, it had revision 0.
=== PCI BARs ===
PCI BARs
--------
The ivshmem PCI device has two or three BARs:
@ -59,8 +36,7 @@ There are two ways to use this device:
- If you only need the shared memory part, BAR2 suffices. This way,
you have access to the shared memory in the guest and can use it as
you see fit. Memnic, for example, uses ivshmem this way from guest
user space (see http://dpdk.org/browse/memnic).
you see fit.
- If you additionally need the capability for peers to interrupt each
other, you need BAR0 and BAR1. You will most likely want to write a
@ -77,10 +53,13 @@ accessing BAR2.
Revision 0 of the device is not capable to tell guest software whether
it is configured for interrupts.
=== PCI device registers ===
PCI device registers
--------------------
BAR 0 contains the following registers:
::
Offset Size Access On reset Function
0 4 read/write 0 Interrupt Mask
bit 0: peer interrupt (rev 0)
@ -145,18 +124,20 @@ With multiple MSI-X vectors, different vectors can be used to indicate
different events have occurred. The semantics of interrupt vectors
are left to the application.
== Interrupt infrastructure ==
Interrupt infrastructure
========================
When configured for interrupts, the peers share eventfd objects in
addition to shared memory. The shared resources are managed by an
ivshmem server.
=== The ivshmem server ===
The ivshmem server
------------------
The server listens on a UNIX domain socket.
For each new client that connects to the server, the server
- picks an ID,
- creates eventfd file descriptors for the interrupt vectors,
- sends the ID and the file descriptor for the shared memory to the
@ -189,7 +170,8 @@ vectors.
A standalone client is in contrib/ivshmem-client/. It can be useful
for debugging.
=== The ivshmem Client-Server Protocol ===
The ivshmem Client-Server Protocol
----------------------------------
An ivshmem device configured for interrupts connects to an ivshmem
server. This section details the protocol between the two.
@ -245,7 +227,8 @@ Known bugs:
* The protocol is poorly designed.
=== The ivshmem Client-Client Protocol ===
The ivshmem Client-Client Protocol
----------------------------------
An ivshmem device configured for interrupts receives eventfd file
descriptors for interrupting peers and getting interrupted by peers


@ -50,7 +50,7 @@ maintained as part of the virtio specification.
by QEMU.
1af4:1110
ivshmem device (shared memory, ``docs/specs/ivshmem-spec.txt``)
ivshmem device (:doc:`ivshmem-spec`)
All other device IDs are reserved.


@ -21,18 +21,21 @@ recognize. On write, the bits not recognized by the device are ignored.
Software should set only bits both itself and the device recognize.
Bit Definition
--------------
bit 0: a guest panic has happened and should be processed by the host
bit 1: a guest panic has happened and will be handled by the guest;
the host should record it or report it, but should not affect
the execution of the guest.
~~~~~~~~~~~~~~
bit 0
a guest panic has happened and should be processed by the host
bit 1
a guest panic has happened and will be handled by the guest;
the host should record it or report it, but should not affect
the execution of the guest.
PCI Interface
-------------
The PCI interface is similar to the ISA interface except that it uses an MMIO
address space provided by its BAR0, 1 byte long. Any machine with a PCI bus
can enable a pvpanic device by adding '-device pvpanic-pci' to the command
can enable a pvpanic device by adding ``-device pvpanic-pci`` to the command
line.
ACPI Interface
@ -40,15 +43,25 @@ ACPI Interface
pvpanic device is defined with ACPI ID "QEMU0001". Custom methods:
RDPT: To determine whether guest panic notification is supported.
Arguments: None
Return: Returns a byte, with the same semantics as the I/O port
interface.
RDPT
~~~~
WRPT: To send a guest panic event
Arguments: Arg0 is a byte to be written, with the same semantics as
the I/O interface.
Return: None
To determine whether guest panic notification is supported.
Arguments
None
Return
Returns a byte, with the same semantics as the I/O port interface.
WRPT
~~~~
To send a guest panic event.
Arguments
Arg0 is a byte to be written, with the same semantics as the I/O interface.
Return
None
The ACPI device will automatically refer to the right port in case it
is modified.
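The bit definitions and the "set only bits both sides recognize" rule can be sketched in C as below. This is a minimal illustration, not the guest driver: the macro and function names are hypothetical, and the byte read from the device is passed in as a plain argument rather than read from a port or BAR.

```c
#include <assert.h>
#include <stdint.h>

/* Bit definitions from the spec; the macro names are illustrative. */
#define PVPANIC_BIT_PANICKED     (1u << 0) /* panic, to be processed by the host */
#define PVPANIC_BIT_CRASH_LOADED (1u << 1) /* panic, handled by the guest itself */

/* Events this hypothetical guest knows how to report. */
#define PVPANIC_GUEST_KNOWN (PVPANIC_BIT_PANICKED | PVPANIC_BIT_CRASH_LOADED)

/*
 * Given the byte read from the device (the bits the device recognizes) and
 * the event the guest wants to signal, return the byte to write back.
 * Only bits recognized by both sides are kept; 0 means nothing to write.
 */
static inline uint8_t pvpanic_event_byte(uint8_t device_bits, uint8_t wanted)
{
    assert((wanted & ~PVPANIC_GUEST_KNOWN) == 0);
    return (uint8_t)(device_bits & wanted & PVPANIC_GUEST_KNOWN);
}
```

A guest would typically read the supported bits once at probe time and call a helper like this from its panic path.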


@ -0,0 +1,94 @@
QEMU Standard VGA
=================
Exists in two variants, for isa and pci.
command line switches:
``-vga std``
picks isa for -M isapc, otherwise pci
``-device VGA``
pci variant
``-device isa-vga``
isa variant
``-device secondary-vga``
legacy-free pci variant
PCI spec
--------
Applies to the pci variant only for obvious reasons.
PCI ID
``1234:1111``
PCI Region 0
Framebuffer memory, 16 MB in size (by default).
Size is tunable via vga_mem_mb property.
PCI Region 1
Reserved (so we have the option to make the framebuffer bar 64bit).
PCI Region 2
MMIO bar, 4096 bytes in size (QEMU 1.3+)
PCI ROM Region
Holds the vgabios (QEMU 0.14+).
The legacy-free variant has no ROM and has ``PCI_CLASS_DISPLAY_OTHER``
instead of ``PCI_CLASS_DISPLAY_VGA``.
IO ports used
-------------
Doesn't apply to the legacy-free pci variant, use the MMIO bar instead.
``03c0 - 03df``
standard vga ports
``01ce``
bochs vbe interface index port
``01cf``
bochs vbe interface data port (x86 only)
``01d0``
bochs vbe interface data port
Memory regions used
-------------------
``0xe0000000``
Framebuffer memory, isa variant only.
The pci variant used to mirror the framebuffer bar here, QEMU 0.14+
stops doing that (except when in ``-M pc-$old`` compat mode).
MMIO area spec
--------------
Likewise applies to the pci variant only for obvious reasons.
``0000 - 03ff``
edid data blob.
``0400 - 041f``
vga ioports (``0x3c0`` to ``0x3df``), remapped 1:1. Word access
is supported, bytes are written in little endian order (aka index
port first), so indexed registers can be updated with a single
mmio write (and thus only one vmexit).
``0500 - 0515``
bochs dispi interface registers, mapped flat without index/data ports.
Use ``(index << 1)`` as offset for (16bit) register access.
``0600 - 0607``
QEMU extended registers. QEMU 2.2+ only.
The pci revision is 2 (or greater) when these registers are present.
The registers are 32bit.
``0600``
QEMU extended register region size, in bytes.
``0604``
framebuffer endianness register.
- ``0xbebebebe`` indicates big endian.
- ``0x1e1e1e1e`` indicates little endian.
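The MMIO layout above can be turned into a few offset helpers. The following C sketch is an illustration only: all names are hypothetical, and the limit of 11 dispi registers is inferred from the ``0500 - 0515`` range (22 bytes of 16-bit registers) rather than stated in the spec.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets inside the MMIO bar (PCI region 2), from the spec above. */
#define VGA_MMIO_IOPORTS     0x0400u /* vga ioports 0x3c0..0x3df, 1:1 */
#define VGA_MMIO_DISPI       0x0500u /* bochs dispi registers, flat */
#define VGA_MMIO_QEXT_SIZE   0x0600u /* extended register region size */
#define VGA_MMIO_QEXT_ENDIAN 0x0604u /* framebuffer endianness register */

enum vga_fb_endian {
    VGA_FB_ENDIAN_UNKNOWN,
    VGA_FB_ENDIAN_LITTLE,
    VGA_FB_ENDIAN_BIG,
};

/* Map a legacy vga I/O port to its offset in the MMIO bar. */
static inline uint32_t vga_mmio_ioport_offset(uint16_t port)
{
    assert(port >= 0x3c0 && port <= 0x3df);
    return VGA_MMIO_IOPORTS + (uint32_t)(port - 0x3c0);
}

/* Map a bochs dispi register index to its 16-bit register offset. */
static inline uint32_t vga_mmio_dispi_offset(unsigned index)
{
    assert(index <= 10); /* inferred from the 0500 - 0515 range */
    return VGA_MMIO_DISPI + (index << 1);
}

/* Decode the value read from the framebuffer endianness register. */
static inline enum vga_fb_endian vga_decode_endian(uint32_t value)
{
    switch (value) {
    case 0xbebebebeu:
        return VGA_FB_ENDIAN_BIG;
    case 0x1e1e1e1eu:
        return VGA_FB_ENDIAN_LITTLE;
    default:
        return VGA_FB_ENDIAN_UNKNOWN;
    }
}
```

Both magic values are byte-wise palindromes, so the decode gives the same answer whatever byte order the register is read in.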


@ -1,81 +0,0 @@
QEMU Standard VGA
=================
Exists in two variants, for isa and pci.
command line switches:
-vga std [ picks isa for -M isapc, otherwise pci ]
-device VGA [ pci variant ]
-device isa-vga [ isa variant ]
-device secondary-vga [ legacy-free pci variant ]
PCI spec
--------
Applies to the pci variant only for obvious reasons.
PCI ID: 1234:1111
PCI Region 0:
Framebuffer memory, 16 MB in size (by default).
Size is tunable via vga_mem_mb property.
PCI Region 1:
Reserved (so we have the option to make the framebuffer bar 64bit).
PCI Region 2:
MMIO bar, 4096 bytes in size (qemu 1.3+)
PCI ROM Region:
Holds the vgabios (qemu 0.14+).
The legacy-free variant has no ROM and has PCI_CLASS_DISPLAY_OTHER
instead of PCI_CLASS_DISPLAY_VGA.
IO ports used
-------------
Doesn't apply to the legacy-free pci variant, use the MMIO bar instead.
03c0 - 03df : standard vga ports
01ce : bochs vbe interface index port
01cf : bochs vbe interface data port (x86 only)
01d0 : bochs vbe interface data port
Memory regions used
-------------------
0xe0000000 : Framebuffer memory, isa variant only.
The pci variant used to mirror the framebuffer bar here, qemu 0.14+
stops doing that (except when in -M pc-$old compat mode).
MMIO area spec
--------------
Likewise applies to the pci variant only for obvious reasons.
0000 - 03ff : edid data blob.
0400 - 041f : vga ioports (0x3c0 -> 0x3df), remapped 1:1.
word access is supported, bytes are written
in little endia order (aka index port first),
so indexed registers can be updated with a
single mmio write (and thus only one vmexit).
0500 - 0515 : bochs dispi interface registers, mapped flat
without index/data ports. Use (index << 1)
as offset for (16bit) register access.
0600 - 0607 : qemu extended registers. qemu 2.2+ only.
The pci revision is 2 (or greater) when
these registers are present. The registers
are 32bit.
0600 : qemu extended register region size, in bytes.
0604 : framebuffer endianness register.
- 0xbebebebe indicates big endian.
- 0x1e1e1e1e indicates little endian.


@ -1,9 +1,9 @@
Virtual System Controller
=========================
This device is a simple interface defined for the pure virtual machine with no
hardware reference implementation to allow the guest kernel to send command
to the host hypervisor.
The ``virt-ctrl`` device is a simple interface defined for the pure
virtual machine with no hardware reference implementation to allow the
guest kernel to send command to the host hypervisor.
The specification can evolve, the current state is defined as below.
@ -11,14 +11,12 @@ This is a MMIO mapped device using 256 bytes.
Two 32bit registers are defined:
1- the features register (read-only, address 0x00)
the features register (read-only, address 0x00)
This register allows the device to report features supported by the
controller.
The only feature supported for the moment is power control (0x01).
2- the command register (write-only, address 0x04)
the command register (write-only, address 0x04)
This register allows the kernel to send the commands to the hypervisor.
The implemented commands are part of the power control feature and
are reset (1), halt (2) and panic (3).
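A guest-side view of the two registers might look like the C sketch below. It is a rough illustration under assumptions: the names are hypothetical, the register block is modelled as an array of two 32-bit words (features at 0x00, command at 0x04), and byte order is ignored because the text above does not specify it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Word indices into the register block: 0x00 -> 0, 0x04 -> 1. */
enum {
    VIRT_CTRL_IDX_FEATURES = 0, /* read-only, address 0x00 */
    VIRT_CTRL_IDX_CMD      = 1, /* write-only, address 0x04 */
};

#define VIRT_CTRL_FEAT_POWER_CTRL 0x01u

/* Commands of the power control feature. */
enum virt_ctrl_cmd {
    VIRT_CTRL_CMD_RESET = 1,
    VIRT_CTRL_CMD_HALT  = 2,
    VIRT_CTRL_CMD_PANIC = 3,
};

/* True if the feature word advertises support for cmd. */
static inline bool virt_ctrl_cmd_supported(uint32_t features, uint32_t cmd)
{
    if (!(features & VIRT_CTRL_FEAT_POWER_CTRL)) {
        return false;
    }
    return cmd >= VIRT_CTRL_CMD_RESET && cmd <= VIRT_CTRL_CMD_PANIC;
}

/* Send cmd if the controller supports it; returns whether it was written. */
static inline bool virt_ctrl_send(volatile uint32_t *regs, enum virt_ctrl_cmd cmd)
{
    assert(regs != 0);
    if (!virt_ctrl_cmd_supported(regs[VIRT_CTRL_IDX_FEATURES], (uint32_t)cmd)) {
        return false;
    }
    regs[VIRT_CTRL_IDX_CMD] = (uint32_t)cmd;
    return true;
}
```

Checking the features register before writing keeps the sketch usable if the specification later grows features beyond power control.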

docs/specs/vmcoreinfo.rst Normal file

@ -0,0 +1,54 @@
=================
VMCoreInfo device
=================
The ``-device vmcoreinfo`` will create a ``fw_cfg`` entry for a guest to
store dump details.
``etc/vmcoreinfo``
==================
A guest may use this ``fw_cfg`` entry to add information details to QEMU
dumps.
The entry of 16 bytes has the following layout, in little-endian::
#define VMCOREINFO_FORMAT_NONE 0x0
#define VMCOREINFO_FORMAT_ELF 0x1
struct FWCfgVMCoreInfo {
uint16_t host_format; /* formats host supports */
uint16_t guest_format; /* format guest supplies */
uint32_t size; /* size of vmcoreinfo region */
uint64_t paddr; /* physical address of vmcoreinfo region */
};
Only full writes (of 16 bytes) are considered valid for further
processing of entry values.
A write of 0 in ``guest_format`` will disable further processing of
vmcoreinfo entry values & content.
You may write a ``guest_format`` that is not supported by the host, in
which case the entry data can be ignored by QEMU (but you may still
access it through a debugger, via ``vmcoreinfo_realize::vmcoreinfo_state``).
Format & content
================
As of QEMU 2.11, only ``VMCOREINFO_FORMAT_ELF`` is supported.
The entry gives location and size of an ELF note that is appended in
qemu dumps.
The note format/class must be of the target bitness and the size must
be less than 1Mb.
If the ELF note name is ``VMCOREINFO``, it is expected to be the Linux
vmcoreinfo note (see `the kernel documentation for its format
<https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-kernel-vmcoreinfo>`_).
In this case, qemu dump code will read the content
as a key=value text file, looking for ``NUMBER(phys_base)`` key
value. The value is expected to be more accurate than architecture
guess of the value. This is useful for KASLR-enabled guest with
ancient tools not handling the ``VMCOREINFO`` note.
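Because the entry is defined as little-endian and must be written in one 16-byte write, a guest on a big-endian host cannot simply copy the struct. The C sketch below serializes the fields byte by byte; it is an illustration only, the helper names are hypothetical, and the actual ``fw_cfg`` write is left out.

```c
#include <assert.h>
#include <stdint.h>

#define VMCOREINFO_FORMAT_NONE 0x0
#define VMCOREINFO_FORMAT_ELF  0x1

struct FWCfgVMCoreInfo {
    uint16_t host_format;  /* formats host supports */
    uint16_t guest_format; /* format guest supplies */
    uint32_t size;         /* size of vmcoreinfo region */
    uint64_t paddr;        /* physical address of vmcoreinfo region */
};

#define VMCOREINFO_ENTRY_SIZE 16

/* Store the low nbytes of v at p, least significant byte first. */
static inline void vmcoreinfo_put_le(uint8_t *p, uint64_t v, unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Serialize info into the 16-byte little-endian layout of the entry. */
static inline void vmcoreinfo_encode(const struct FWCfgVMCoreInfo *info,
                                     uint8_t out[VMCOREINFO_ENTRY_SIZE])
{
    assert(info != 0 && out != 0);
    vmcoreinfo_put_le(out + 0, info->host_format, 2);
    vmcoreinfo_put_le(out + 2, info->guest_format, 2);
    vmcoreinfo_put_le(out + 4, info->size, 4);
    vmcoreinfo_put_le(out + 8, info->paddr, 8);
}
```

The resulting buffer would then be written to the entry in a single operation, since partial writes are not considered valid.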


@ -1,53 +0,0 @@
=================
VMCoreInfo device
=================
The `-device vmcoreinfo` will create a fw_cfg entry for a guest to
store dump details.
etc/vmcoreinfo
**************
A guest may use this fw_cfg entry to add information details to qemu
dumps.
The entry of 16 bytes has the following layout, in little-endian::
#define VMCOREINFO_FORMAT_NONE 0x0
#define VMCOREINFO_FORMAT_ELF 0x1
struct FWCfgVMCoreInfo {
uint16_t host_format; /* formats host supports */
uint16_t guest_format; /* format guest supplies */
uint32_t size; /* size of vmcoreinfo region */
uint64_t paddr; /* physical address of vmcoreinfo region */
};
Only full write (of 16 bytes) are considered valid for further
processing of entry values.
A write of 0 in guest_format will disable further processing of
vmcoreinfo entry values & content.
You may write a guest_format that is not supported by the host, in
which case the entry data can be ignored by qemu (but you may still
access it through a debugger, via vmcoreinfo_realize::vmcoreinfo_state).
Format & content
****************
As of qemu 2.11, only VMCOREINFO_FORMAT_ELF is supported.
The entry gives location and size of an ELF note that is appended in
qemu dumps.
The note format/class must be of the target bitness and the size must
be less than 1Mb.
If the ELF note name is "VMCOREINFO", it is expected to be the Linux
vmcoreinfo note (see Documentation/ABI/testing/sysfs-kernel-vmcoreinfo
in Linux source). In this case, qemu dump code will read the content
as a key=value text file, looking for "NUMBER(phys_base)" key
value. The value is expected to be more accurate than architecture
guess of the value. This is useful for KASLR-enabled guest with
ancient tools not handling the VMCOREINFO note.

docs/specs/vmgenid.rst Normal file

@ -0,0 +1,246 @@
Virtual Machine Generation ID Device
====================================
..
Copyright (C) 2016 Red Hat, Inc.
Copyright (C) 2017 Skyport Systems, Inc.
This work is licensed under the terms of the GNU GPL, version 2 or later.
See the COPYING file in the top-level directory.
The VM generation ID (``vmgenid``) device is an emulated device which
exposes a 128-bit, cryptographically random, integer value identifier,
referred to as a Globally Unique Identifier, or GUID.
This allows management applications (e.g. libvirt) to notify the guest
operating system when the virtual machine is executed with a different
configuration (e.g. snapshot execution or creation from a template). The
guest operating system notices the change, and is then able to react as
appropriate by marking its copies of distributed databases as dirty,
re-initializing its random number generator etc.
Requirements
------------
These requirements are extracted from the "How to implement virtual machine
generation ID support in a virtualization platform" section of
`the Microsoft Virtual Machine Generation ID specification
<http://go.microsoft.com/fwlink/?LinkId=260709>`_ dated August 1, 2012.
- **R1a** The generation ID shall live in an 8-byte aligned buffer.
- **R1b** The buffer holding the generation ID shall be in guest RAM,
ROM, or device MMIO range.
- **R1c** The buffer holding the generation ID shall be kept separate from
areas used by the operating system.
- **R1d** The buffer shall not be covered by an AddressRangeMemory or
AddressRangeACPI entry in the E820 or UEFI memory map.
- **R1e** The generation ID shall not live in a page frame that could be
mapped with caching disabled. (In other words, regardless of whether the
generation ID lives in RAM, ROM or MMIO, it shall only be mapped as
cacheable.)
- **R2** to **R5** [These AML requirements are isolated well enough in the
Microsoft specification for us to simply refer to them here.]
- **R6** The hypervisor shall expose a _HID (hardware identifier) object
in the VMGenId device's scope that is unique to the hypervisor vendor.
QEMU Implementation
-------------------
The above-mentioned specification does not dictate which ACPI descriptor table
will contain the VM Generation ID device. Other implementations (Hyper-V and
Xen) put it in the main descriptor table (Differentiated System Description
Table or DSDT). For ease of debugging and implementation, we have decided to
put it in its own Secondary System Description Table, or SSDT.
The following is a dump of the contents from a running system::
# iasl -p ./SSDT -d /sys/firmware/acpi/tables/SSDT
Intel ACPI Component Architecture
ASL+ Optimizing Compiler version 20150717-64
Copyright (c) 2000 - 2015 Intel Corporation
Reading ACPI table from file /sys/firmware/acpi/tables/SSDT - Length
00000198 (0x0000C6)
ACPI: SSDT 0x0000000000000000 0000C6 (v01 BOCHS VMGENID 00000001 BXPC 00000001)
Acpi table [SSDT] successfully installed and loaded
Pass 1 parse of [SSDT]
Pass 2 parse of [SSDT]
Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions)
Parsing completed
Disassembly completed
ASL Output: ./SSDT.dsl - 1631 bytes
# cat SSDT.dsl
/*
* Intel ACPI Component Architecture
* AML/ASL+ Disassembler version 20150717-64
* Copyright (c) 2000 - 2015 Intel Corporation
*
* Disassembling to symbolic ASL+ operators
*
* Disassembly of /sys/firmware/acpi/tables/SSDT, Sun Feb 5 00:19:37 2017
*
* Original Table Header:
* Signature "SSDT"
* Length 0x000000CA (202)
* Revision 0x01
* Checksum 0x4B
* OEM ID "BOCHS "
* OEM Table ID "VMGENID"
* OEM Revision 0x00000001 (1)
* Compiler ID "BXPC"
* Compiler Version 0x00000001 (1)
*/
DefinitionBlock ("/sys/firmware/acpi/tables/SSDT.aml", "SSDT", 1, "BOCHS ", "VMGENID", 0x00000001)
{
Name (VGIA, 0x07FFF000)
Scope (\_SB)
{
Device (VGEN)
{
Name (_HID, "QEMUVGID") // _HID: Hardware ID
Name (_CID, "VM_Gen_Counter") // _CID: Compatible ID
Name (_DDN, "VM_Gen_Counter") // _DDN: DOS Device Name
Method (_STA, 0, NotSerialized) // _STA: Status
{
Local0 = 0x0F
If ((VGIA == Zero))
{
Local0 = Zero
}
Return (Local0)
}
Method (ADDR, 0, NotSerialized)
{
Local0 = Package (0x02) {}
Index (Local0, Zero) = (VGIA + 0x28)
Index (Local0, One) = Zero
Return (Local0)
}
}
}
Method (\_GPE._E05, 0, NotSerialized) // _Exx: Edge-Triggered GPE
{
Notify (\_SB.VGEN, 0x80) // Status Change
}
}
Design Details:
---------------
Requirements R1a through R1e dictate that the memory holding the
VM Generation ID must be allocated and owned by the guest firmware,
in this case BIOS or UEFI. However, to be useful, QEMU must be able to
change the contents of the memory at runtime, specifically when starting a
backed-up or snapshotted image. In order to do this, QEMU must know the
address that has been allocated.
The mechanism chosen for this memory sharing is writable fw_cfg blobs.
These are data objects that are visible to both QEMU and guests, and are
addressable as sequential files.
More information about fw_cfg can be found in :doc:`fw_cfg`.
Two fw_cfg blobs are used in this case:
``/etc/vmgenid_guid``
- contains the actual VM Generation ID GUID
- read-only to the guest
``/etc/vmgenid_addr``
- contains the address of the downloaded vmgenid blob
- writable by the guest
QEMU sends the following commands to the guest at startup:
1. Allocate memory for vmgenid_guid fw_cfg blob.
2. Write the address of vmgenid_guid into the SSDT (VGIA ACPI variable as
shown above in the iasl dump). Note that this change is not propagated
back to QEMU.
3. Write the address of vmgenid_guid back to QEMU's copy of vmgenid_addr
via the fw_cfg DMA interface.
After step 3, QEMU is able to update the contents of vmgenid_guid at will.
Since BIOS or UEFI does not necessarily run when we wish to change the GUID,
the value of VGIA is persisted via the VMState mechanism.
As spelled out in the specification, any change to the GUID executes an
ACPI notification. The exact handler to use is not specified, so the vmgenid
device uses the first unused one: ``\_GPE._E05``.
Endian-ness Considerations:
---------------------------
Although not specified in Microsoft's document, it is assumed that the
device is expected to use little-endian format.
All GUID passed in via command line or monitor are treated as big-endian.
GUID values displayed via monitor are shown in big-endian format.
GUID Storage Format:
--------------------
In order to implement an OVMF "SDT Header Probe Suppressor", the contents of
the vmgenid_guid fw_cfg blob are not simply a 128-bit GUID. There is also
significant padding in order to align and fill a memory page, as shown in the
following diagram::
+----------------------------------+
| SSDT with OEM Table ID = VMGENID |
+----------------------------------+
| ... | TOP OF PAGE
| VGIA dword object ---------------|-----> +---------------------------+
| ... | | fw-allocated array for |
| _STA method referring to VGIA | | "etc/vmgenid_guid" |
| ... | +---------------------------+
| ADDR method referring to VGIA | | 0: OVMF SDT Header probe |
| ... | | suppressor |
+----------------------------------+ | 36: padding for 8-byte |
| alignment |
| 40: GUID |
| 56: padding to page size |
+---------------------------+
END OF PAGE
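The relationship between ``VGIA`` and the GUID location in the diagram can be written down directly. The C sketch below is illustrative only: the constant and function names are hypothetical, and the offsets are the ones in the diagram (GUID at byte 40, which matches the ``VGIA + 0x28`` computed by the ``ADDR`` method in the iasl dump).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VMGENID_GUID_OFFSET 40u /* 36-byte probe suppressor + 4 bytes padding */
#define VMGENID_GUID_SIZE   16u /* 128-bit GUID, ends at byte 56 */

/* Mirrors the _STA method: a zero VGIA means the device is not usable. */
static inline bool vmgenid_present(uint32_t vgia)
{
    return vgia != 0;
}

/* Mirrors the ADDR method: guest physical address of the GUID itself. */
static inline uint64_t vmgenid_guid_addr(uint32_t vgia)
{
    assert(vmgenid_present(vgia));
    return (uint64_t)vgia + VMGENID_GUID_OFFSET;
}
```

With the ``VGIA`` value from the dump (``0x07FFF000``), the GUID lands at ``0x07FFF028``, which is 8-byte aligned as requirement R1a demands.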
Device Usage:
-------------
The device has one property, which may only be set using the command line:
``guid``
sets the value of the GUID. A special value ``auto`` instructs
QEMU to generate a new random GUID.
For example::
QEMU -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
QEMU -device vmgenid,guid=auto
The property may be queried via QMP/HMP::
(QEMU) query-vm-generation-id
{"return": {"guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"}}
Setting of this parameter is intentionally left out from the QMP/HMP
interfaces. There are no known use cases for changing the GUID once QEMU is
running, and adding this capability would greatly increase the complexity.


@ -1,245 +0,0 @@
VIRTUAL MACHINE GENERATION ID
=============================
Copyright (C) 2016 Red Hat, Inc.
Copyright (C) 2017 Skyport Systems, Inc.
This work is licensed under the terms of the GNU GPL, version 2 or later.
See the COPYING file in the top-level directory.
===
The VM generation ID (vmgenid) device is an emulated device which
exposes a 128-bit, cryptographically random, integer value identifier,
referred to as a Globally Unique Identifier, or GUID.
This allows management applications (e.g. libvirt) to notify the guest
operating system when the virtual machine is executed with a different
configuration (e.g. snapshot execution or creation from a template). The
guest operating system notices the change, and is then able to react as
appropriate by marking its copies of distributed databases as dirty,
re-initializing its random number generator etc.
Requirements
------------
These requirements are extracted from the "How to implement virtual machine
generation ID support in a virtualization platform" section of the
specification, dated August 1, 2012.
The document may be found on the web at:
http://go.microsoft.com/fwlink/?LinkId=260709
R1a. The generation ID shall live in an 8-byte aligned buffer.
R1b. The buffer holding the generation ID shall be in guest RAM, ROM, or device
MMIO range.
R1c. The buffer holding the generation ID shall be kept separate from areas
used by the operating system.
R1d. The buffer shall not be covered by an AddressRangeMemory or
AddressRangeACPI entry in the E820 or UEFI memory map.
R1e. The generation ID shall not live in a page frame that could be mapped with
caching disabled. (In other words, regardless of whether the generation ID
lives in RAM, ROM or MMIO, it shall only be mapped as cacheable.)
R2 to R5. [These AML requirements are isolated well enough in the Microsoft
specification for us to simply refer to them here.]
R6. The hypervisor shall expose a _HID (hardware identifier) object in the
VMGenId device's scope that is unique to the hypervisor vendor.
QEMU Implementation
-------------------
The above-mentioned specification does not dictate which ACPI descriptor table
will contain the VM Generation ID device. Other implementations (Hyper-V and
Xen) put it in the main descriptor table (Differentiated System Description
Table or DSDT). For ease of debugging and implementation, we have decided to
put it in its own Secondary System Description Table, or SSDT.
The following is a dump of the contents from a running system:
# iasl -p ./SSDT -d /sys/firmware/acpi/tables/SSDT
Intel ACPI Component Architecture
ASL+ Optimizing Compiler version 20150717-64
Copyright (c) 2000 - 2015 Intel Corporation
Reading ACPI table from file /sys/firmware/acpi/tables/SSDT - Length
00000198 (0x0000C6)
ACPI: SSDT 0x0000000000000000 0000C6 (v01 BOCHS VMGENID 00000001 BXPC
00000001)
Acpi table [SSDT] successfully installed and loaded
Pass 1 parse of [SSDT]
Pass 2 parse of [SSDT]
Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions)
Parsing completed
Disassembly completed
ASL Output: ./SSDT.dsl - 1631 bytes
# cat SSDT.dsl
/*
* Intel ACPI Component Architecture
* AML/ASL+ Disassembler version 20150717-64
* Copyright (c) 2000 - 2015 Intel Corporation
*
* Disassembling to symbolic ASL+ operators
*
* Disassembly of /sys/firmware/acpi/tables/SSDT, Sun Feb 5 00:19:37 2017
*
* Original Table Header:
* Signature "SSDT"
* Length 0x000000CA (202)
* Revision 0x01
* Checksum 0x4B
* OEM ID "BOCHS "
* OEM Table ID "VMGENID"
* OEM Revision 0x00000001 (1)
* Compiler ID "BXPC"
* Compiler Version 0x00000001 (1)
*/
DefinitionBlock ("/sys/firmware/acpi/tables/SSDT.aml", "SSDT", 1, "BOCHS ",
"VMGENID", 0x00000001)
{
Name (VGIA, 0x07FFF000)
Scope (\_SB)
{
Device (VGEN)
{
Name (_HID, "QEMUVGID") // _HID: Hardware ID
Name (_CID, "VM_Gen_Counter") // _CID: Compatible ID
Name (_DDN, "VM_Gen_Counter") // _DDN: DOS Device Name
Method (_STA, 0, NotSerialized) // _STA: Status
{
Local0 = 0x0F
If ((VGIA == Zero))
{
Local0 = Zero
}
Return (Local0)
}
Method (ADDR, 0, NotSerialized)
{
Local0 = Package (0x02) {}
Index (Local0, Zero) = (VGIA + 0x28)
Index (Local0, One) = Zero
Return (Local0)
}
}
}
Method (\_GPE._E05, 0, NotSerialized) // _Exx: Edge-Triggered GPE
{
Notify (\_SB.VGEN, 0x80) // Status Change
}
}
Design Details:
---------------
Requirements R1a through R1e dictate that the memory holding the
VM Generation ID must be allocated and owned by the guest firmware,
in this case BIOS or UEFI. However, to be useful, QEMU must be able to
change the contents of the memory at runtime, specifically when starting a
backed-up or snapshotted image. In order to do this, QEMU must know the
address that has been allocated.
The mechanism chosen for this memory sharing is writable fw_cfg blobs.
These are data objects that are visible to both QEMU and guests, and are
addressable as sequential files.
More information about fw_cfg can be found in "docs/specs/fw_cfg.txt".
Two fw_cfg blobs are used in this case:
/etc/vmgenid_guid - contains the actual VM Generation ID GUID
- read-only to the guest
/etc/vmgenid_addr - contains the address of the downloaded vmgenid blob
- writable by the guest
QEMU sends the following commands to the guest at startup:
1. Allocate memory for vmgenid_guid fw_cfg blob.
2. Write the address of vmgenid_guid into the SSDT (VGIA ACPI variable as
shown above in the iasl dump). Note that this change is not propagated
back to QEMU.
3. Write the address of vmgenid_guid back to QEMU's copy of vmgenid_addr
via the fw_cfg DMA interface.
After step 3, QEMU is able to update the contents of vmgenid_guid at will.
Since BIOS or UEFI does not necessarily run when we wish to change the GUID,
the value of VGIA is persisted via the VMState mechanism.
As spelled out in the specification, any change to the GUID executes an
ACPI notification. The exact handler to use is not specified, so the vmgenid
device uses the first unused one: \_GPE._E05.
Endian-ness Considerations:
---------------------------
Although not specified in Microsoft's document, it is assumed that the
device is expected to use little-endian format.
All GUIDs passed in via the command line or monitor are treated as big-endian.
GUID values displayed via monitor are shown in big-endian format.
GUID Storage Format:
--------------------
In order to implement an OVMF "SDT Header Probe Suppressor", the contents of
the vmgenid_guid fw_cfg blob are not simply a 128-bit GUID. There is also
significant padding in order to align and fill a memory page, as shown in the
following diagram:
+----------------------------------+
| SSDT with OEM Table ID = VMGENID |
+----------------------------------+
| ... | TOP OF PAGE
| VGIA dword object ---------------|-----> +---------------------------+
| ... | | fw-allocated array for |
| _STA method referring to VGIA | | "etc/vmgenid_guid" |
| ... | +---------------------------+
| ADDR method referring to VGIA | | 0: OVMF SDT Header probe |
| ... | | suppressor |
+----------------------------------+ | 36: padding for 8-byte |
| alignment |
| 40: GUID |
| 56: padding to page size |
+---------------------------+
END OF PAGE
Device Usage:
-------------
The device has one property, which may only be set using the command line:
guid - sets the value of the GUID. A special value "auto" instructs
QEMU to generate a new random GUID.
For example:
QEMU -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
QEMU -device vmgenid,guid=auto
The property may be queried via QMP/HMP:
(QEMU) query-vm-generation-id
{"return": {"guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"}}
Setting of this parameter is intentionally left out from the QMP/HMP
interfaces. There are no known use cases for changing the GUID once QEMU is
running, and adding this capability would greatly increase the complexity.


@ -0,0 +1,115 @@
==============================
VMWare PVSCSI Device Interface
==============================
..
Created by Dmitry Fleytman (dmitry@daynix.com), Daynix Computing LTD.
This document describes the VMWare PVSCSI device interface specification,
based on the source code of the PVSCSI Linux driver from kernel 3.0.4.
Overview
========
The interface is based on a memory area shared between hypervisor and VM.
The memory area is obtained by driver as a device IO memory resource of
``PVSCSI_MEM_SPACE_SIZE`` length.
The shared memory consists of a registers area and a rings area.
The registers area is used to raise hypervisor interrupts and issue device
commands. The rings area is used to transfer data descriptors and SCSI
commands from VM to hypervisor and to transfer messages produced by
hypervisor to VM. Data itself is transferred via virtual scatter-gather DMA.
PVSCSI Device Registers
=======================
The length of the registers area is 1 page
(``PVSCSI_MEM_SPACE_COMMAND_NUM_PAGES``). The structure of the
registers area is described by the ``PVSCSIRegOffset`` enum. There
are registers to issue device commands (with optional short data),
issue device interrupts, and control interrupt masking.
PVSCSI Device Rings
===================
There are three rings in shared memory:
Request ring (``struct PVSCSIRingReqDesc *req_ring``)
ring for OS to device requests
Completion ring (``struct PVSCSIRingCmpDesc *cmp_ring``)
ring for device request completions
Message ring (``struct PVSCSIRingMsgDesc *msg_ring``)
ring for messages from device. This ring is optional and the
guest might not configure it.
There is a control area (``struct PVSCSIRingsState *rings_state``)
used to control rings operation.
PVSCSI Device to Host Interrupts
================================
The following interrupt types are supported by the PVSCSI device:
Completion interrupts (completion ring notifications):
- ``PVSCSI_INTR_CMPL_0``
- ``PVSCSI_INTR_CMPL_1``
Message interrupts (message ring notifications):
- ``PVSCSI_INTR_MSG_0``
- ``PVSCSI_INTR_MSG_1``
Interrupts are controlled via the ``PVSCSI_REG_OFFSET_INTR_MASK``
register. If a bit is set it means the interrupt is enabled, and if
it is clear then the interrupt is disabled.
The interrupt modes supported are legacy, MSI and MSI-X.
In the case of legacy interrupts, the ``PVSCSI_REG_OFFSET_INTR_STATUS``
register is used to check which interrupt has arrived. Interrupts are
acknowledged when the corresponding bit is written to the interrupt
status register.
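The mask and status semantics described above can be modelled in a few lines of Python. This is a sketch under stated assumptions: the bit positions assigned to the four interrupt names are chosen for illustration, and the ``IntrModel`` class is hypothetical rather than taken from any driver or device model.

```python
# Assumed bit positions, for illustration only.
PVSCSI_INTR_CMPL_0 = 1 << 0
PVSCSI_INTR_CMPL_1 = 1 << 1
PVSCSI_INTR_MSG_0 = 1 << 2
PVSCSI_INTR_MSG_1 = 1 << 3


class IntrModel:
    """Toy model of the interrupt mask and interrupt status registers."""

    def __init__(self) -> None:
        self.mask = 0      # bit set = interrupt enabled
        self.status = 0    # bit set = interrupt pending

    def raise_intr(self, bits: int) -> None:
        """Device side: mark interrupts as pending."""
        self.status |= bits

    def write_mask(self, value: int) -> None:
        """Driver side: write PVSCSI_REG_OFFSET_INTR_MASK."""
        self.mask = value

    def write_status(self, value: int) -> None:
        """Driver side: acknowledge by writing the bits to the status register."""
        self.status &= ~value

    def line_asserted(self) -> bool:
        """An interrupt is delivered only if it is both pending and enabled."""
        return bool(self.status & self.mask)


dev = IntrModel()
dev.write_mask(PVSCSI_INTR_CMPL_0 | PVSCSI_INTR_CMPL_1)
dev.raise_intr(PVSCSI_INTR_MSG_0)        # pending but masked
masked_only = dev.line_asserted()
dev.raise_intr(PVSCSI_INTR_CMPL_0)       # pending and enabled
after_completion = dev.line_asserted()
dev.write_status(PVSCSI_INTR_CMPL_0)     # acknowledge the completion interrupt
after_ack = dev.line_asserted()
```

Acknowledging clears only the written bits, so the masked message interrupt stays pending in the status register until it is acknowledged as well.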
PVSCSI Device Operation Sequences
=================================
Startup sequence
----------------
a. Issue ``PVSCSI_CMD_ADAPTER_RESET`` command
b. Windows driver reads interrupt status register here
c. Issue ``PVSCSI_CMD_SETUP_MSG_RING`` command with no additional data,
check status and disable device messages if error returned
(Omitted if device messages disabled by driver configuration)
d. Issue ``PVSCSI_CMD_SETUP_RINGS`` command, provide rings configuration
as ``struct PVSCSICmdDescSetupRings``
e. Issue ``PVSCSI_CMD_SETUP_MSG_RING`` command again, provide
rings configuration as ``struct PVSCSICmdDescSetupMsgRing``
f. Unmask completion and message (if device messages enabled) interrupts
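The ordering of the startup steps can be sketched in Python against a recording stand-in for the device. ``FakeDevice``, its ``command`` and ``unmask`` methods, and the convention that 0 means success are all hypothetical. The sketch also assumes that the second ``PVSCSI_CMD_SETUP_MSG_RING`` command is skipped when device messages are disabled.

```python
class FakeDevice:
    """Hypothetical stand-in that records commands; not a real PVSCSI API."""

    def __init__(self, msg_ring_supported: bool = True) -> None:
        self.msg_ring_supported = msg_ring_supported
        self.log = []

    def command(self, name, data=None) -> int:
        self.log.append((name, data))
        # A data-less SETUP_MSG_RING acts as a capability probe.
        if name == "PVSCSI_CMD_SETUP_MSG_RING" and data is None:
            return 0 if self.msg_ring_supported else -1
        return 0

    def unmask(self, kinds) -> None:
        self.log.append(("unmask", tuple(kinds)))


def startup(dev: FakeDevice, use_msg: bool = True) -> bool:
    """Run the startup steps in order; return whether messages stay enabled."""
    dev.command("PVSCSI_CMD_ADAPTER_RESET")
    if use_msg and dev.command("PVSCSI_CMD_SETUP_MSG_RING") != 0:
        use_msg = False  # probe failed: disable device messages
    dev.command("PVSCSI_CMD_SETUP_RINGS", "PVSCSICmdDescSetupRings")
    if use_msg:
        dev.command("PVSCSI_CMD_SETUP_MSG_RING", "PVSCSICmdDescSetupMsgRing")
    dev.unmask(["completion"] + (["message"] if use_msg else []))
    return use_msg


with_msg = FakeDevice(msg_ring_supported=True)
with_msg_enabled = startup(with_msg)
without_msg = FakeDevice(msg_ring_supported=False)
without_msg_enabled = startup(without_msg)
```

The Windows-specific read of the interrupt status register (step b) is left out, since it has no effect on the ordering of the remaining steps.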
Shutdown sequence
-----------------
a. Mask interrupts
b. Flush request ring using ``PVSCSI_REG_OFFSET_KICK_NON_RW_IO``
c. Issue ``PVSCSI_CMD_ADAPTER_RESET`` command
Send request
------------
a. Fill next free request ring descriptor
b. Issue ``PVSCSI_REG_OFFSET_KICK_RW_IO`` for R/W operations
or ``PVSCSI_REG_OFFSET_KICK_NON_RW_IO`` for other operations
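Choosing between the two kick registers can be sketched as a lookup on the first byte of the SCSI command descriptor block. Which opcodes count as R/W operations is an assumption here (the standard READ and WRITE 6/10/12/16 opcodes), and ``kick_register`` is a hypothetical helper that returns the register name rather than touching hardware.

```python
# Standard SCSI opcodes for READ/WRITE (6), (10), (12) and (16).
# Treating exactly this set as "R/W operations" is an assumption.
RW_OPCODES = frozenset({0x08, 0x0A, 0x28, 0x2A, 0xA8, 0xAA, 0x88, 0x8A})


def kick_register(cdb: bytes) -> str:
    """Return the name of the kick register to write for a given CDB."""
    if not cdb:
        raise ValueError("empty CDB")
    if cdb[0] in RW_OPCODES:
        return "PVSCSI_REG_OFFSET_KICK_RW_IO"
    return "PVSCSI_REG_OFFSET_KICK_NON_RW_IO"


read10 = kick_register(bytes([0x28]) + bytes(9))    # READ(10)
inquiry = kick_register(bytes([0x12]) + bytes(5))   # INQUIRY
```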
Abort command
-------------
a. Issue ``PVSCSI_CMD_ABORT_CMD`` command
Request completion processing
-----------------------------
a. Upon completion interrupt arrival, process the completion
and message (if enabled) rings


@ -1,92 +0,0 @@
General Description
===================
This document describes VMWare PVSCSI device interface specification.
Created by Dmitry Fleytman (dmitry@daynix.com), Daynix Computing LTD.
Based on source code of PVSCSI Linux driver from kernel 3.0.4
PVSCSI Device Interface Overview
================================
The interface is based on memory area shared between hypervisor and VM.
Memory area is obtained by driver as device IO memory resource of
PVSCSI_MEM_SPACE_SIZE length.
The shared memory consists of registers area and rings area.
The registers area is used to raise hypervisor interrupts and issue device
commands. The rings area is used to transfer data descriptors and SCSI
commands from VM to hypervisor and to transfer messages produced by
hypervisor to VM. Data itself is transferred via virtual scatter-gather DMA.
PVSCSI Device Registers
=======================
The length of the registers area is 1 page (PVSCSI_MEM_SPACE_COMMAND_NUM_PAGES).
The structure of the registers area is described by the PVSCSIRegOffset enum.
There are registers to issue device command (with optional short data),
issue device interrupt, control interrupts masking.
PVSCSI Device Rings
===================
There are three rings in shared memory:
1. Request ring (struct PVSCSIRingReqDesc *req_ring)
- ring for OS to device requests
2. Completion ring (struct PVSCSIRingCmpDesc *cmp_ring)
- ring for device request completions
3. Message ring (struct PVSCSIRingMsgDesc *msg_ring)
- ring for messages from device.
This ring is optional and the guest might not configure it.
There is a control area (struct PVSCSIRingsState *rings_state) used to control
rings operation.
PVSCSI Device to Host Interrupts
================================
There are following interrupt types supported by PVSCSI device:
1. Completion interrupts (completion ring notifications):
PVSCSI_INTR_CMPL_0
PVSCSI_INTR_CMPL_1
2. Message interrupts (message ring notifications):
PVSCSI_INTR_MSG_0
PVSCSI_INTR_MSG_1
Interrupts are controlled via PVSCSI_REG_OFFSET_INTR_MASK register
Bit set means interrupt enabled, bit cleared - disabled
Interrupt modes supported are legacy, MSI and MSI-X
In case of legacy interrupts, register PVSCSI_REG_OFFSET_INTR_STATUS
is used to check which interrupt has arrived. Interrupts are
acknowledged when the corresponding bit is written to the interrupt
status register.
PVSCSI Device Operation Sequences
=================================
1. Startup sequence:
a. Issue PVSCSI_CMD_ADAPTER_RESET command;
aa. Windows driver reads interrupt status register here;
b. Issue PVSCSI_CMD_SETUP_MSG_RING command with no additional data,
check status and disable device messages if error returned;
(Omitted if device messages disabled by driver configuration)
c. Issue PVSCSI_CMD_SETUP_RINGS command, provide rings configuration
as struct PVSCSICmdDescSetupRings;
d. Issue PVSCSI_CMD_SETUP_MSG_RING command again, provide
rings configuration as struct PVSCSICmdDescSetupMsgRing;
e. Unmask completion and message (if device messages enabled) interrupts.
2. Shutdown sequences
a. Mask interrupts;
b. Flush request ring using PVSCSI_REG_OFFSET_KICK_NON_RW_IO;
c. Issue PVSCSI_CMD_ADAPTER_RESET command.
3. Send request
a. Fill next free request ring descriptor;
b. Issue PVSCSI_REG_OFFSET_KICK_RW_IO for R/W operations;
or PVSCSI_REG_OFFSET_KICK_NON_RW_IO for other operations.
4. Abort command
a. Issue PVSCSI_CMD_ABORT_CMD command;
5. Request completion processing
a. Upon completion interrupt arrival process completion
and message (if enabled) rings.


@ -33,7 +33,7 @@ syntax when using the shared memory server is:
When using the server, the guest will be assigned a VM ID (>=0) that
allows guests using the same server to communicate via interrupts.
Guests can read their VM ID from a device register (see
ivshmem-spec.txt).
:doc:`../../specs/ivshmem-spec`).
Migration with ivshmem
~~~~~~~~~~~~~~~~~~~~~~