mirror of
https://github.com/Motorhead1991/qemu.git
synced 2025-08-07 01:33:56 -06:00
ppc patch queue 2019-05-29
Next pull request against qemu-4.1. Highlights:

  * KVM accelerated support for the XIVE interrupt controller in PAPR guests
  * A number of TCG vector fixes
  * Fixes for the PReP / 40p machine
  * Improvements to make check-tcg test coverage

Other than that it's just a bunch of assorted fixes, cleanups and minor improvements.

This supersedes both the pull request dated 2019-05-21 and the one dated 2019-05-22. I've dropped one hunk which I think may have caused the check-tcg failure that Peter saw (by enabling the ppc64abi32 build, which I think has been broken for ages). I'm not entirely certain, since I haven't reproduced exactly the same failure.

-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlzuK2AACgkQbDjKyiDZ
s5LFfxAAuvBI2d5gYDSDiniJPMrEzM8ANynf8fYMGSySRNpeKz5PnMhNQieoxaEt
pS9qJnnaCDrpV09jJo6QWStUaqFqnLPOYdWvRsnb7mx0yXe5eWUyYPp0TRAqKj8S
Ainv9ma8WfhVphsH3E01KR6evdC6BDC0F2afDToFGMKcDKXafmnSOEV9ZtFAzFXO
xqh/Az+Y2ATwDmt92uSq7JBS5YRUvhYQORoKslxnrJswKkN+Uwi5+a2FzOHk3Jwe
BlV6soEAVqb9ItFtgwcArclryCMMVxrqzs2VTWOYbhznFX0X1xUNeSQ8H+7F+IVy
Xu1e2fnwufvilvWSsjtYvdYnnCbNvwgWjYfZNMrQ2hmSDtCQnRKyVIYwiU08Qj2y
LmVlQzWN3WYHIRBTACLMDf5VHa9P01QZeJEoVIV6i4m4PCxbSmlzI62eRKNhW917
2d3h8dGIxSDm9/WpXefKMMrt2P7fAqkiz5ZUZIjkspcHaPPmk7qQp0ngFjeEuyFk
tJMd87hgemm9gg+mcF9XQ8yZGkR3oTq7nwDGwZHrp8S0GyRvNwhTbT2iKzAG2cxe
kfWRFswxn1zYPShqkcj3rwNsg8LnC3b22Og/obHYVjQ8ONx4ZB0q8xJSkUpvsQf5
HEUHLHtstBmrInFMf+2KbViUIpobmn4woojjNsqZ32W7OZv6Yk4=
=2q3B
-----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190529' into staging

# gpg: Signature made Wed 29 May 2019 07:49:04 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-4.1-20190529: (44 commits)
  ppc/pnv: add dummy XSCOM registers for PRD initialization
  ppc/pnv: introduce new skiboot platform properties
  spapr: Don't migrate the hpt_maxpagesize cap to older machine types
  spapr: change default interrupt mode to 'dual'
  spapr/xive: fix multiple resets when using the 'dual' interrupt mode
  docs: provide documentation on the POWER9 XIVE interrupt controller
  spapr/irq: add KVM support to the 'dual' machine
  ppc/xics: fix irq priority in ics_set_irq_type()
  spapr/irq: initialize the IRQ device only once
  spapr/irq: introduce a spapr_irq_init_device() helper
  spapr: check for the activation of the KVM IRQ device
  spapr: introduce routines to delete the KVM IRQ device
  sysbus: add a sysbus_mmio_unmap() helper
  spapr/xive: activate KVM support
  spapr/xive: add migration support for KVM
  spapr/xive: introduce a VM state change handler
  spapr/xive: add state synchronization with KVM
  spapr/xive: add hcall support when under KVM
  spapr/xive: add KVM support
  spapr: Print out extra hints when CAS negotiation of interrupt mode fails
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
commit
60905286cb
43 changed files with 1983 additions and 215 deletions
@@ -12,4 +12,5 @@ Welcome to QEMU's documentation!

   interop/index
   devel/index
   specs/index

13
docs/specs/index.rst
Normal file
@@ -0,0 +1,13 @@
.. This is the top level page for the 'specs' manual


QEMU full-system emulation guest hardware specifications
========================================================


Contents:

.. toctree::
   :maxdepth: 2

   xive
174
docs/specs/ppc-spapr-xive.rst
Normal file
@@ -0,0 +1,174 @@
XIVE for sPAPR (pseries machines)
=================================

The POWER9 processor comes with a new interrupt controller
architecture, called XIVE for "eXternal Interrupt Virtualization
Engine". It supports a larger number of interrupt sources and offers
virtualization features which enable the HW to deliver interrupts
directly to virtual processors without hypervisor assistance.

A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
processors can run under two interrupt modes:

- *Legacy Compatibility Mode*

  the hypervisor provides identical interfaces and similar
  functionality to PAPR+ Version 2.7. This is the default mode.

  It is also referred to as *XICS* in QEMU.

- *XIVE native exploitation mode*

  the hypervisor provides new interfaces to manage the XIVE control
  structures, and provides direct control for interrupt management
  through MMIO pages.

Which interrupt modes can be used by the machine is negotiated with
the guest O/S during the Client Architecture Support negotiation
sequence. The two modes are mutually exclusive.

Both interrupt modes share the same IRQ number space. See below for the
layout.

CAS Negotiation
---------------

QEMU advertises the supported interrupt modes in byte 23 of the device
tree property "ibm,arch-vec-5-platform-support", and the OS selection
for XIVE is indicated in byte 23 of the "ibm,architecture-vec-5"
property.

The interrupt modes supported by the machine depend on the CPU type
(POWER9 is required for XIVE) but also on the machine property
``ic-mode`` which can be set on the command line. It can take the
following values: ``xics``, ``xive`` and ``dual``. Currently ``xics``
is the default but it may change in the future.

The chosen interrupt mode is activated after a reconfiguration done
in a machine reset.

XIVE Device tree properties
---------------------------

The properties for the PAPR interrupt controller node when the *XIVE
native exploitation mode* is selected should contain:

- ``device_type``

  value should be "power-ivpe".

- ``compatible``

  value should be "ibm,power-ivpe".

- ``reg``

  contains the base address and size of the thread interrupt
  management areas (TIMA), for the User level and for the Guest OS
  level. Only the Guest OS level is taken into account today.

- ``ibm,xive-eq-sizes``

  the size of the event queues. One cell per size supported, contains
  log2 of size, in ascending order.

- ``ibm,xive-lisn-ranges``

  the IRQ interrupt number ranges assigned to the guest for the IPIs.

The root node also exports:

- ``ibm,plat-res-int-priorities``

  contains a list of priorities that the hypervisor has reserved for
  its own use.

IRQ number space
----------------

The IRQ number space of the ``pseries`` machine is 8K wide and is the
same for both interrupt modes. The different ranges are defined as
follows:

- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE)
- ``0x1000 .. 0x1000`` 1 EPOW
- ``0x1001 .. 0x1001`` 1 HOTPLUG
- ``0x1100 .. 0x11FF`` 256 VIO devices
- ``0x1200 .. 0x127F`` 32 PHB devices
- ``0x1280 .. 0x12FF`` unused
- ``0x1300 .. 0x1FFF`` PHB MSIs
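
As an illustration only, the ranges listed above can be written down as
a lookup table. The following Python sketch is not QEMU code; the names
``IRQ_RANGES`` and ``classify_irq`` are hypothetical, and the labels
are simply the ones used in the list.

```python
# Illustrative lookup table built from the documented ranges.
# IRQ_RANGES and classify_irq are hypothetical names, not QEMU symbols.
IRQ_RANGES = [
    (0x0000, 0x0FFF, "CPU IPI"),
    (0x1000, 0x1000, "EPOW"),
    (0x1001, 0x1001, "HOTPLUG"),
    (0x1100, 0x11FF, "VIO device"),
    (0x1200, 0x127F, "PHB device"),
    (0x1280, 0x12FF, "unused"),
    (0x1300, 0x1FFF, "PHB MSI"),
]


def classify_irq(lisn):
    """Return the label of the documented range containing lisn."""
    for first, last, label in IRQ_RANGES:
        if first <= lisn <= last:
            return label
    # Outside the 8K space, or in a gap that the list does not name
    # (for instance 0x1002 .. 0x10FF).
    return None


if __name__ == "__main__":
    for lisn in (0x0000, 0x1000, 0x1100, 0x1302):
        print(f"{lisn:#06x}: {classify_irq(lisn)}")
```

A linear scan is enough here because the table has only seven entries.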

Monitoring XIVE
---------------

The state of the XIVE interrupt controller can be queried through the
monitor command ``info pic``. The output comes in two parts.

First, the state of the thread interrupt context registers is dumped
for each CPU:

::

  (qemu) info pic
  CPU[0000]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
  CPU[0000]: USER    00   00  00    00   00  00  00   00  00000000
  CPU[0000]:   OS    00   ff  00    00   ff  00  ff   ff  80000400
  CPU[0000]: POOL    00   00  00    00   00  00  00   00  00000000
  CPU[0000]: PHYS    00   00  00    00   00  00  00   ff  00000000
  ...

In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only
the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM
line which is set to the VP identifier.

Then comes the routing information which aggregates the EAS and the
END configuration:

::

  ...
  LISN         PQ    EISN     CPU/PRIO EQ
  00000000 MSI --    00000010   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00000001 MSI --    00000010   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00000002 MSI --    00000010   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
  00000003 MSI --    00000010   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]
  00000004 MSI -Q  M 00000000
  00000005 MSI -Q  M 00000000
  00000006 MSI -Q  M 00000000
  00000007 MSI -Q  M 00000000
  00001000 MSI --    00000012   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00001001 MSI --    00000013   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00001100 MSI --    00000100   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00001101 MSI -Q  M 00000000
  00001200 LSI -Q  M 00000000
  00001201 LSI -Q  M 00000000
  00001202 LSI -Q  M 00000000
  00001203 LSI -Q  M 00000000
  00001300 MSI --    00000102   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00001301 MSI --    00000103   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
  00001302 MSI --    00000104   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]

The source information and configuration:

- The ``LISN`` column outputs the interrupt number of the source in
  range ``[ 0x0 ... 0x1FFF ]`` and its type: ``MSI`` or ``LSI``
- The ``PQ`` column reflects the state of the PQ bits of the source:

  - ``--`` source is ready to take events
  - ``P-`` an event was sent and an EOI is PENDING
  - ``PQ`` an event was QUEUED
  - ``-Q`` source is OFF

  An ``M`` indicates that the source is *MASKED* at the EAS level.

The targeting configuration:

- The ``EISN`` column is the event data that will be queued in the event
  queue of the O/S.
- The ``CPU/PRIO`` column is the tuple defining the CPU number and
  priority queue serving the source.
- The ``EQ`` column outputs:

  - the current index of the event queue / the maximum number of entries
  - the O/S event queue address
  - the toggle bit
  - the last entries that were pushed in the event queue.
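
To make the column layout concrete, here is a small Python sketch that
splits one routing line of the ``info pic`` output into named fields.
It is a reading aid, not part of QEMU: the function name
``parse_routing_line`` and the dictionary keys are hypothetical, and
the parser assumes the whitespace-separated layout of the sample
output in this document.

```python
def parse_routing_line(line):
    """Split one routing line of `info pic` into a dict (illustrative)."""
    fields = line.split()
    entry = {
        "lisn": int(fields[0], 16),  # interrupt number of the source
        "type": fields[1],           # "MSI" or "LSI"
        "pq": fields[2],             # "--", "P-", "PQ" or "-Q"
    }
    rest = fields[3:]
    # An optional "M" flags a source masked at the EAS level.
    entry["masked"] = bool(rest) and rest[0] == "M"
    if entry["masked"]:
        rest = rest[1:]
    entry["eisn"] = int(rest[0], 16)
    if len(rest) > 1:
        # Targeting information is only printed for routed sources.
        cpu, prio = rest[1].split("/")
        index, size = rest[2].split("/")
        entry.update(
            cpu=int(cpu),
            prio=int(prio),
            eq_index=int(index),
            eq_size=int(size),
            eq_addr=int(rest[3].lstrip("@"), 16),
            toggle=int(rest[4].lstrip("^")),
        )
    return entry


if __name__ == "__main__":
    sample = ("00000000 MSI -- 00000010 0/6 380/16384 "
              "@1fe3e0000 ^1 [ 80000010 ... ]")
    print(parse_routing_line(sample))
```

Masked sources carry no targeting columns, so the corresponding keys
are simply absent from the result.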
199
docs/specs/ppc-xive.rst
Normal file
@@ -0,0 +1,199 @@
================================
POWER9 XIVE interrupt controller
================================

The POWER9 processor comes with a new interrupt controller
architecture, called XIVE for "eXternal Interrupt Virtualization
Engine".

Compared to the previous architecture, the main characteristics of
XIVE are to support a larger number of interrupt sources and to
deliver interrupts directly to virtual processors without hypervisor
assistance. This removes the context switches required for the
delivery process.


XIVE architecture
=================

The XIVE IC is composed of three sub-engines, each taking care of a
processing layer of external interrupts:

- Interrupt Virtualization Source Engine (IVSE), or Source Controller
  (SC). These are found in PCI PHBs, in the PSI host bridge
  controller, but also inside the main controller for the core IPIs
  and other sub-chips (NX, CAP, NPU) of the chip/processor. They are
  configured to feed the IVRE with events.
- Interrupt Virtualization Routing Engine (IVRE) or Virtualization
  Controller (VC). It handles event coalescing and performs interrupt
  routing by matching an event source number with an Event
  Notification Descriptor (END).
- Interrupt Virtualization Presentation Engine (IVPE) or Presentation
  Controller (PC). It maintains the interrupt context state of each
  thread and handles the delivery of the external interrupt to the
  thread.

::

                XIVE Interrupt Controller
                +------------------------------------+      IPIs
                | +---------+ +---------+ +--------+ |    +-------+
                | |IVRE     | |Common Q | |IVPE    |----> | CORES |
                | |     esb | |         | |        |----> |       |
                | |     eas | |  Bridge | |   tctx |----> |       |
                | |SC   end | |         | |    nvt | |    |       |
    +------+    | +---------+ +----+----+ +--------+ |    +-+-+-+-+
    | RAM  |    +------------------|-----------------+      | | |
    |      |                       |                        | | |
    |      |                       |                        | | |
    |      |  +--------------------v------------------------v-v-v--+    other
    |      <--+                     Power Bus                      +--> chips
    |  esb |  +---------+-----------------------+------------------+
    |  eas |            |                       |
    |  end |         +--|------+                |
    |  nvt |       +----+----+ |           +----+----+
    +------+       |IVSE     | |           |IVSE     |
                   |         | |           |         |
                   | PQ-bits | |           | PQ-bits |
                   | local   |-+           |  in VC  |
                   +---------+             +---------+
                      PCIe                 NX,NPU,CAPI


    PQ-bits: 2 bits source state machine (P:pending Q:queued)
    esb: Event State Buffer (Array of PQ bits in an IVSE)
    eas: Event Assignment Structure
    end: Event Notification Descriptor
    nvt: Notification Virtual Target
    tctx: Thread interrupt Context registers



XIVE internal tables
--------------------

Each of the sub-engines uses a set of tables to redirect interrupts
from event sources to CPU threads.

::

                                          +-------+
                  User or O/S             |  EQ   |
                      or          +------>|entries|
                  Hypervisor      |       |  ..   |
                    Memory        |       +-------+
                                  |           ^
                                  |           |
             +-------------------------------------------------+
                                  |           |
  Hypervisor      +------+    +---+--+    +---+--+   +------+
    Memory        | ESB  |    | EAT  |    | ENDT |   | NVTT |
   (skiboot)      +----+-+    +----+-+    +----+-+   +------+
                    ^  |        ^  |        ^  |       ^
                    |  |        |  |        |  |       |
             +-------------------------------------------------+
                    |  |        |  |        |  |       |
                    |  |        |  |        |  |       |
               +----|--|--------|--|--------|--|-+   +-|-----+    +------+
               |    |  |        |  |        |  | |   | | tctx|    |Thread|
   IPI or   ---+    +  v        +  v        +  v |---| +  .. |----->     |
  HW events    |                                 |   |       |    |      |
               |             IVRE                |   | IVPE  |    +------+
               +---------------------------------+   +-------+


Each IVSE has a 2-bit state machine per source, P for pending and Q
for queued, that controls whether events are allowed to be triggered.
The PQ bits are stored in an Event State Buffer (ESB) array and can be
controlled by MMIOs.

If the event is let through, the IVRE looks up the Event Assignment
Structure (EAS) table for an Event Notification Descriptor (END)
configured for the source. Each Event Notification Descriptor defines
a notification path to a CPU and an in-memory Event Queue, in which
event data is enqueued for the O/S to pull.
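
A minimal Python sketch of such an in-memory Event Queue follows. It
is an assumption-laden illustration, not the QEMU model: the class
name ``EventQueue`` is hypothetical, and the entry format (a toggle
bit in bit 31 above 31 bits of event data, with the toggle starting at
1 and flipping on every wrap) is inferred from entries such as
``^1 [ 80000010 ... ]`` in the ``info pic`` output of a ``pseries``
machine.

```python
class EventQueue:
    """Illustrative ring of 32-bit entries with a toggle (generation) bit."""

    def __init__(self, entries):
        # The ring size is assumed to be a power of two.
        assert entries > 0 and entries & (entries - 1) == 0
        self.ring = [0] * entries
        self.index = 0    # next slot to be written
        self.toggle = 1   # assumed initial generation

    def push(self, data):
        """Store event data tagged with the current toggle bit."""
        self.ring[self.index] = (self.toggle << 31) | (data & 0x7FFFFFFF)
        self.index = (self.index + 1) & (len(self.ring) - 1)
        if self.index == 0:
            # Wrapped around: flip the toggle so that the consumer can
            # tell fresh entries from stale ones.
            self.toggle ^= 1


if __name__ == "__main__":
    eq = EventQueue(4)
    eq.push(0x10)
    print(f"{eq.ring[0]:08x} index={eq.index} toggle={eq.toggle}")
```

With a toggle bit the consumer does not need a shared write pointer:
an entry is new when its top bit matches the generation the consumer
currently expects.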

The IVPE determines if a Notification Virtual Target (NVT) can handle
the event by scanning the thread contexts of the VCPUs dispatched on
the processor HW threads. It maintains the interrupt context state of
each thread in an NVT table.

XIVE thread interrupt context
-----------------------------

The XIVE presenter can generate four different exceptions to its
HW threads:

- hypervisor exception
- O/S exception
- Event-Based Branch (user level)
- msgsnd (doorbell)

Each exception has a state independent from the others called a Thread
Interrupt Management context. This context is a set of registers which
lets the thread handle priority management and interrupt
acknowledgment among other things. The most important ones are:

- Pending Interrupt Priority Register (PIPR)
- Interrupt Pending Buffer (IPB)
- Current Processor Priority Register (CPPR)
- Notification Source Register (NSR)

TIMA
~~~~

The Thread Interrupt Management registers are accessible through a
specific MMIO region, called the Thread Interrupt Management Area
(TIMA), made of four aligned pages, each exposing a different view of
the registers. The first page (page address ending in ``0b00``) gives
access to the entire context and is reserved for the ring 0 view for
the physical thread context. The second (page address ending in
``0b01``) is for the hypervisor, ring 1 view. The third (page address
ending in ``0b10``) is for the operating system, ring 2 view. The
fourth (page address ending in ``0b11``) is for user level, ring 3
view.
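
The mapping from a TIMA offset to a view can be written as a two-bit
page index. In the Python sketch below, ``tima_view`` is a
hypothetical helper and the default ``page_shift`` of 16 (64K pages)
is an assumption made for the example; the text above only states that
the four pages are aligned.

```python
# View selected by the two low-order bits of the page address.
TIMA_VIEWS = (
    "physical thread, ring 0",
    "hypervisor, ring 1",
    "operating system, ring 2",
    "user level, ring 3",
)


def tima_view(offset, page_shift=16):
    """Return the view exposed at a byte offset into the TIMA.

    page_shift is an assumption (64K pages by default); pass another
    value to model a different page size.
    """
    return TIMA_VIEWS[(offset >> page_shift) & 0b11]


if __name__ == "__main__":
    for offset in (0x00000, 0x10000, 0x20000, 0x30010):
        print(f"{offset:#07x}: {tima_view(offset)}")
```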

Interrupt flow from an O/S perspective
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After event data has been enqueued in the O/S Event Queue, the IVPE
raises the bit corresponding to the priority of the pending interrupt
in the register IPB (Interrupt Pending Buffer) to indicate that an
event is pending in one of the 8 priority queues. The Pending
Interrupt Priority Register (PIPR) is also updated using the IPB. This
register represents the priority of the most favored pending
notification.

The PIPR is then compared to the Current Processor Priority
Register (CPPR). If it is more favored (numerically less than), the
CPU interrupt line is raised and the EO bit of the Notification Source
Register (NSR) is updated to notify the presence of an exception for
the O/S. The O/S acknowledges the interrupt with a special load in the
Thread Interrupt Management Area.
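
The IPB, PIPR and CPPR arithmetic described above can be sketched in a
few lines of Python. This is an illustration under stated assumptions,
not the hardware definition: the bit ordering (priority 0 in the most
significant bit of an 8-bit IPB) and the value ``0xFF`` for "nothing
pending" are assumptions, and the function names are hypothetical.

```python
def priority_to_ipb(priority):
    """Return the IPB bit for a priority (assumes 0 maps to the MSB)."""
    return 0x80 >> priority if 0 <= priority <= 7 else 0


def ipb_to_pipr(ipb):
    """Return the most favored pending priority, 0xFF if none."""
    if ipb == 0:
        return 0xFF
    # The highest set bit is the numerically lowest (most favored)
    # priority under the assumed bit ordering.
    return 8 - ipb.bit_length()


def present(ipb, cppr, priority):
    """Record a pending event; return (ipb, pipr, raise_interrupt)."""
    ipb |= priority_to_ipb(priority)
    pipr = ipb_to_pipr(ipb)
    # The interrupt line is raised only if PIPR is more favored than
    # CPPR, that is, numerically less than it.
    return ipb, pipr, pipr < cppr


if __name__ == "__main__":
    ipb, pipr, raised = present(0x00, 0xFF, 6)
    print(f"ipb={ipb:02x} pipr={pipr:02x} raised={raised}")
```

A CPPR of ``0xFF`` accepts every priority, while a CPPR of 0 masks all
of them because no PIPR value can be numerically less than 0.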

The O/S handles the interrupt and when done, performs an EOI using an
MMIO operation on the ESB management page of the associated source.

Overview of the QEMU models for XIVE
====================================

The XiveSource models the IVSE in general, internal and external. It
handles the source ESBs and the MMIO interface to control them.

The XiveNotifier is a small helper interface interconnecting the
XiveSource to the XiveRouter.

The XiveRouter is an abstract model acting as a combined IVRE and
IVPE. It routes event notifications using the EAS and END tables to
the IVPE sub-engine which does a CAM scan to find a CPU to deliver the
exception. Storage should be provided by the inheriting classes.

XiveENDSource is a special source object. It exposes the END ESB MMIOs
of the Event Queues which are used for coalescing event notifications
and for escalation. It is not used in the field, only to sync the EQ
cache in OPAL.

Finally, the XiveTCTX contains the interrupt state context of a thread,
four sets of registers, one for each exception that can be delivered
to a CPU. These contexts are scanned by the IVPE to find a matching VP
when a notification is triggered. It also models the Thread Interrupt
Management Area (TIMA), which exposes the thread context registers to
the CPU for interrupt management.