Merge tag 'pull-ppc-20211217' of https://github.com/legoater/qemu into staging

ppc 7.0 queue:

* General cleanup for Mac machines (Peter)
* Fixes for FPU exceptions (Lucas)
* Support for new ISA31 instructions (Matheus)
* Fixes for ivshmem (Daniel)
* Cleanups for PowerNV PHB (Christophe and Cedric)
* Updates of PowerNV and pSeries documentation (Leonardo and Daniel)
* Fixes for PowerNV (Daniel)
* Large cleanup of FPU implementation (Richard)
* Removal of SoftTLBs support for PPC74x CPUs (Fabiano)
* Fixes for exception models in MPCx and 60x CPUs (Fabiano)
* Removal of 401/403 CPUs (Cedric)
* Deprecation of taihu machine (Thomas)
* Large rework of PPC405 machine (Cedric)
* Fixes for VSX instructions (Victor and Matheus)
* Fix for e6500 CPU (Fabiano)
* Initial support for PMU (Daniel)

# gpg: Signature made Fri 17 Dec 2021 09:20:31 AM PST
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1

* tag 'pull-ppc-20211217' of https://github.com/legoater/qemu: (101 commits)
  ppc/pnv: Use QOM hierarchy to scan PEC PHB4 devices
  ppc/pnv: Move realize of PEC stacks under the PEC model
  ppc/pnv: Remove "system-memory" property from PHB4 PEC
  ppc/pnv: Compute the PHB index from the PHB4 PEC model
  ppc/pnv: Introduce a num_stack class attribute
  ppc/pnv: Introduce a "chip" property under the PHB4 model
  ppc/pnv: Introduce version and device_id class atributes for PHB4 devices
  ppc/pnv: Introduce a num_pecs class attribute for PHB4 PEC devices
  ppc/pnv: Use QOM hierarchy to scan PHB3 devices
  ppc/pnv: Move mapping of the PHB3 CQ regions under pnv_pbcq_realize()
  ppc/pnv: Drop the "num-phbs" property
  ppc/pnv: Use the chip class to check the index of PHB3 devices
  ppc/pnv: Introduce a "chip" property under PHB3
  PPC64/TCG: Implement 'rfebb' instruction
  target/ppc/power8-pmu.c: add PM_RUN_INST_CMPL (0xFA) event
  target/ppc: enable PMU instruction count
  target/ppc: enable PMU counter overflow with cycle events
  target/ppc: PMU: update counters on MMCR1 write
  target/ppc: PMU: update counters on PMCs r/w
  target/ppc: PMU basic cycle count for pseries TCG
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This commit is contained in:
commit
93dc314c92
59 changed files with 2513 additions and 1646 deletions
@@ -315,6 +315,15 @@ This machine is deprecated because we have enough AST2500 based OpenPOWER
machines. It can be easily replaced by the ``witherspoon-bmc`` or the
``romulus-bmc`` machines.

PPC 405 ``taihu`` machine (since 7.0)
'''''''''''''''''''''''''''''''''''''

The PPC 405 CPU is a system-on-a-chip, so all 405 machines are very similar,
except for some external periphery. However, the periphery of the ``taihu``
machine is hardly emulated at all (e.g. neither the LCD nor the USB part had
been implemented), so there is not much value added by this board. Use the
``ref405ep`` machine instead.

Backend options
---------------

100
docs/specs/ppc-spapr-hcalls.rst
Normal file
@@ -0,0 +1,100 @@
sPAPR hypervisor calls
----------------------

When used with the ``pseries`` machine type, ``qemu-system-ppc64`` implements
a set of hypervisor calls (a.k.a. hcalls) defined in the `Linux on Power
Architecture Reference document (LoPAR)
<https://cdn.openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200812.pdf>`_.
This document is a subset of the Power Architecture Platform Reference (PAPR+)
specification (IBM internal only), which is what PowerVM, the IBM proprietary
hypervisor, adheres to.

The subset in LoPAR is selected based on the requirements of Linux as a guest.

In addition to those calls, we have added our own private hypervisor
calls which are mostly used as a private interface between the firmware
running in the guest and QEMU.

All those hypercalls start at hcall number 0xf000 which corresponds
to an implementation specific range in PAPR.

H_RTAS (0xf000)
^^^^^^^^^^^^^^^

RTAS stands for Run-Time Abstraction Services and is a set of runtime services
generally provided by the firmware inside the guest to the operating system. It
predates the existence of hypervisors (it was originally an extension to Open
Firmware) and is still used by PAPR and LoPAR to provide various services that
are not performance sensitive.

We currently implement the RTAS services in QEMU itself. The actual RTAS
"firmware" blob in the guest is a small stub of a few instructions which
calls our private H_RTAS hypervisor call to pass the RTAS calls to QEMU.

Arguments:

  ``r3``: ``H_RTAS (0xf000)``

  ``r4``: Guest physical address of RTAS parameter block.

Returns:

  ``H_SUCCESS``: Successfully called the RTAS function (RTAS result will have
  been stored in the parameter block).

  ``H_PARAMETER``: Unknown token.
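
The dispatch described for H_RTAS can be sketched as a toy model. This is an illustration only, not QEMU's implementation. It assumes that the parameter block starts with three big-endian 32-bit words (``token``, ``nargs``, ``nret``) followed by the argument words and then the return words, and that ``H_SUCCESS`` and ``H_PARAMETER`` have the numeric values 0 and -4; neither is stated in the text above. The function name ``h_rtas`` and the ``handlers`` table are hypothetical.

```python
import struct

H_SUCCESS = 0       # assumed numeric value of the return code
H_PARAMETER = -4    # assumed numeric value of the return code


def h_rtas(guest_mem, block_addr, handlers):
    """Toy model of H_RTAS: dispatch on the token in the parameter block.

    guest_mem is a bytearray standing in for guest RAM, block_addr plays the
    role of r4, and handlers maps RTAS tokens to Python callables.
    """
    # Assumed layout: token, nargs, nret as big-endian 32-bit words.
    token, nargs, nret = struct.unpack_from(">III", guest_mem, block_addr)
    handler = handlers.get(token)
    if handler is None:
        return H_PARAMETER  # unknown token
    args_off = block_addr + 12
    args = struct.unpack_from(">%dI" % nargs, guest_mem, args_off)
    rets = handler(*args)  # the handler returns a tuple of nret words
    # The RTAS result is stored back into the parameter block.
    struct.pack_into(">%dI" % nret, guest_mem, args_off + 4 * nargs, *rets)
    return H_SUCCESS
```

In this model the hypercall status and the RTAS result are separate: the status is the function's return value, while the result words are written into the parameter block.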

H_LOGICAL_MEMOP (0xf001)
^^^^^^^^^^^^^^^^^^^^^^^^

When the guest runs in "real mode" (in PowerPC terminology this means with MMU
disabled, i.e. guest effective address equals guest physical address), it
only has access to a subset of memory and no I/Os.

PAPR and LoPAR provide a set of hypervisor calls to perform cacheable or
non-cacheable accesses to any guest physical addresses that the
guest can use in order to access IO devices while in real mode.

This is typically used by the firmware running in the guest.

However, doing a hypercall for each access is extremely inefficient
(even more so when running KVM) when accessing the frame buffer. In
that case, things like scrolling become unusably slow.

This hypercall allows the guest to request a "memory op" to be applied
to memory. The supported memory ops at this point are to copy a range
of memory (supports overlap of source and destination) and XOR which
is used by our SLOF firmware to invert the screen.

Arguments:

  ``r3``: ``H_LOGICAL_MEMOP (0xf001)``

  ``r4``: Guest physical address of destination.

  ``r5``: Guest physical address of source.

  ``r6``: Individual element size, defined by the binary logarithm of the
  desired size. Supported values are:

    ``0`` = 1 byte

    ``1`` = 2 bytes

    ``2`` = 4 bytes

    ``3`` = 8 bytes

  ``r7``: Number of elements.

  ``r8``: Operation. Supported values are:

    ``0``: copy

    ``1``: xor

Returns:

  ``H_SUCCESS``: Success.

  ``H_PARAMETER``: Invalid argument.
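
The argument encoding of H_LOGICAL_MEMOP can be illustrated with a small model operating on a ``bytearray`` that stands in for guest memory. This is a sketch under stated assumptions, not the QEMU code: the numeric values of ``H_SUCCESS`` and ``H_PARAMETER`` are assumed, the bounds check is an addition for the toy model, and the "xor" operation is assumed to invert every bit of the source (XOR with all ones), which is consistent with its use for inverting the screen.

```python
H_SUCCESS = 0       # assumed numeric value of the return code
H_PARAMETER = -4    # assumed numeric value of the return code


def h_logical_memop(mem, dst, src, esize_log2, count, op):
    """Toy model of H_LOGICAL_MEMOP (r4=dst, r5=src, r6, r7=count, r8=op)."""
    if esize_log2 not in (0, 1, 2, 3) or op not in (0, 1) or count < 0:
        return H_PARAMETER
    nbytes = count << esize_log2  # element size is 2**esize_log2 bytes
    if min(dst, src) < 0 or max(dst, src) + nbytes > len(mem):
        return H_PARAMETER  # bounds check added for this model only
    # Snapshot the source first so that overlapping ranges are handled
    # the way memmove() would handle them.
    data = bytes(mem[src:src + nbytes])
    if op == 1:
        # Assumption: "xor" means inverting each bit of the source.
        data = bytes(b ^ 0xFF for b in data)
    mem[dst:dst + nbytes] = data
    return H_SUCCESS
```

For example, scrolling a frame buffer up by one text row would be a single copy call with ``dst`` below ``src``, instead of one hypercall per pixel access.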
@@ -1,78 +0,0 @@
When used with the "pseries" machine type, QEMU-system-ppc64 implements
a set of hypervisor calls using a subset of the server "PAPR" specification
(IBM internal at this point), which is also what IBM's proprietary hypervisor
adheres to.

The subset is selected based on the requirements of Linux as a guest.

In addition to those calls, we have added our own private hypervisor
calls which are mostly used as a private interface between the firmware
running in the guest and QEMU.

All those hypercalls start at hcall number 0xf000 which corresponds
to an implementation specific range in PAPR.

- H_RTAS (0xf000)

RTAS is a set of runtime services generally provided by the firmware
inside the guest to the operating system. It predates the existence
of hypervisors (it was originally an extension to Open Firmware) and
is still used by PAPR to provide various services that aren't performance
sensitive.

We currently implement the RTAS services in QEMU itself. The actual RTAS
"firmware" blob in the guest is a small stub of a few instructions which
calls our private H_RTAS hypervisor call to pass the RTAS calls to QEMU.

Arguments:

  r3 : H_RTAS (0xf000)
  r4 : Guest physical address of RTAS parameter block

Returns:

  H_SUCCESS   : Successfully called the RTAS function (RTAS result
                will have been stored in the parameter block)
  H_PARAMETER : Unknown token

- H_LOGICAL_MEMOP (0xf001)

When the guest runs in "real mode" (in powerpc lingua this means
with MMU disabled, i.e. guest effective == guest physical), it only
has access to a subset of memory and no IOs.

PAPR provides a set of hypervisor calls to perform cacheable or
non-cacheable accesses to any guest physical addresses that the
guest can use in order to access IO devices while in real mode.

This is typically used by the firmware running in the guest.

However, doing a hypercall for each access is extremely inefficient
(even more so when running KVM) when accessing the frame buffer. In
that case, things like scrolling become unusably slow.

This hypercall allows the guest to request a "memory op" to be applied
to memory. The supported memory ops at this point are to copy a range
of memory (supports overlap of source and destination) and XOR which
is used by our SLOF firmware to invert the screen.

Arguments:

  r3: H_LOGICAL_MEMOP (0xf001)
  r4: Guest physical address of destination
  r5: Guest physical address of source
  r6: Individual element size
      0 = 1 byte
      1 = 2 bytes
      2 = 4 bytes
      3 = 8 bytes
  r7: Number of elements
  r8: Operation
      0 = copy
      1 = xor

Returns:

  H_SUCCESS   : Success
  H_PARAMETER : Invalid argument

@@ -1,7 +1,7 @@
-PowerNV family boards (``powernv8``, ``powernv9``)
+PowerNV family boards (``powernv8``, ``powernv9``, ``powernv10``)
 ==================================================================

-PowerNV (as Non-Virtualized) is the "baremetal" platform using the
+PowerNV (as Non-Virtualized) is the "bare metal" platform using the
 OPAL firmware. It runs Linux on IBM and OpenPOWER systems and it can
 be used as a hypervisor OS, running KVM guests, or simply as a host
 OS.
@@ -16,16 +16,14 @@ Supported devices
 -----------------

 * Multi processor support for POWER8, POWER8NVL and POWER9.
-* XSCOM, serial communication sideband bus to configure chiplets
-* Simple LPC Controller
-* Processor Service Interface (PSI) Controller
-* Interrupt Controller, XICS (POWER8) and XIVE (POWER9)
-* POWER8 PHB3 PCIe Host bridge and POWER9 PHB4 PCIe Host bridge
-* Simple OCC is an on-chip microcontroller used for power management
-  tasks
-* iBT device to handle BMC communication, with the internal BMC
-  simulator provided by QEMU or an external BMC such as an Aspeed
-  QEMU machine.
+* XSCOM, serial communication sideband bus to configure chiplets.
+* Simple LPC Controller.
+* Processor Service Interface (PSI) Controller.
+* Interrupt Controller, XICS (POWER8) and XIVE (POWER9) and XIVE2 (Power10).
+* POWER8 PHB3 PCIe Host bridge and POWER9 PHB4 PCIe Host bridge.
+* Simple OCC is an on-chip micro-controller used for power management tasks.
+* iBT device to handle BMC communication, with the internal BMC simulator
+  provided by QEMU or an external BMC such as an Aspeed QEMU machine.
 * PNOR containing the different firmware partitions.

 Missing devices
@@ -33,31 +31,42 @@ Missing devices

 A lot is missing, among which :

-* POWER10 processor
-* XIVE2 (POWER10) interrupt controller
-* I2C controllers (yet to be merged)
-* NPU/NPU2/NPU3 controllers
-* EEH support for PCIe Host bridge controllers
-* NX controller
-* VAS controller
-* chipTOD (Time Of Day)
+* I2C controllers (yet to be merged).
+* NPU/NPU2/NPU3 controllers.
+* EEH support for PCIe Host bridge controllers.
+* NX controller.
+* VAS controller.
+* chipTOD (Time Of Day).
 * Self Boot Engine (SBE).
-* FSI bus
+* FSI bus.

 Firmware
 --------

 The OPAL firmware (OpenPower Abstraction Layer) for OpenPower systems
 includes the runtime services ``skiboot`` and the bootloader kernel and
-initramfs ``skiroot``. Source code can be found on GitHub:
+initramfs ``skiroot``. Source code can be found on the `OpenPOWER account at
+GitHub <https://github.com/open-power>`_.

-https://github.com/open-power.
-
-Prebuilt images of ``skiboot`` and ``skiroot`` are made available on the `OpenPOWER <https://github.com/open-power/op-build/releases/>`__ site.
+Prebuilt images of ``skiboot`` and ``skiroot`` are made available on the
+`OpenPOWER <https://github.com/open-power/op-build/releases/>`__ site.

 QEMU includes a prebuilt image of ``skiboot`` which is updated when a
 more recent version is required by the models.

+Current acceleration status
+---------------------------
+
+KVM acceleration in Linux Power hosts is provided by the kvm-hv and
+kvm-pr modules. kvm-hv is adherent to PAPR and it's not compliant with
+powernv. kvm-pr in theory could be used as a valid accel option but
+this isn't supported by kvm-pr at this moment.
+
+To spare users from dealing with not so informative errors when attempting
+to use accel=kvm, the powernv machine will throw an error informing that
+KVM is not supported. This can be revisited in the future if kvm-pr (or
+any other KVM alternative) is usable as KVM accel for this machine.
+
 Boot options
 ------------

@@ -83,6 +92,7 @@ and a SATA disk :

 Complex PCIe configuration
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Six PHBs are defined per chip (POWER9) but no default PCI layout is
 provided (to be compatible with libvirt). One PCI device can be added
 on any of the available PCIe slots using command line options such as:
@@ -157,7 +167,7 @@ one on the command line :
 The files `palmetto-SDR.bin <http://www.kaod.org/qemu/powernv/palmetto-SDR.bin>`__
 and `palmetto-FRU.bin <http://www.kaod.org/qemu/powernv/palmetto-FRU.bin>`__
 define a Sensor Data Record repository and a Field Replaceable Unit
-inventory for a palmetto BMC. They can be used to extend the QEMU BMC
+inventory for a Palmetto BMC. They can be used to extend the QEMU BMC
 simulator.

 .. code-block:: bash
@@ -189,4 +199,8 @@ CAVEATS
 -------

 * No support for multiple HW threads (SMT=1). Same as pseries.
 * CPU can hang when doing intensive I/Os. Use ``-append powersave=off`` in that case.
+
+Maintainer contact information
+------------------------------
+
+Cédric Le Goater <clg@kaod.org>
@@ -1,12 +1,238 @@
pSeries family boards (``pseries``)
===================================

The Power machine para-virtualized environment described by the `Linux on Power
Architecture Reference document (LoPAR)
<https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200812.pdf>`_
is called pSeries. This environment is also known as sPAPR, System p guests, or
simply Power Linux guests (although it is capable of running other operating
systems, such as AIX).

Even though pSeries is designed to behave as a guest environment, it is also
capable of acting as a hypervisor OS, providing, in that role, nested
virtualization capabilities.

Supported devices
-----------------

* Multi processor support for many Power processor generations: POWER7,
  POWER7+, POWER8, POWER8NVL, POWER9, and Power10. Support for POWER5+ exists,
  but its state is unknown.
* Interrupt Controller, XICS (POWER8) and XIVE (POWER9 and Power10).
* vPHB PCIe Host bridge.
* vscsi and vnet devices, compatible with the same devices available on a
  PowerVM hypervisor with VIOS managing LPARs.
* Virtio based devices.
* PCIe device pass through.

Missing devices
---------------

* SPICE support.

Firmware
--------

`SLOF <https://github.com/aik/SLOF>`_ (Slimline Open Firmware) is an
implementation of the `IEEE 1275-1994, Standard for Boot (Initialization
Configuration) Firmware: Core Requirements and Practices
<https://standards.ieee.org/standard/1275-1994.html>`_.

QEMU includes a prebuilt image of SLOF which is updated when a more recent
version is required.

Build directions
----------------

.. code-block:: bash

  ./configure --target-list=ppc64-softmmu && make

Running instructions
--------------------

The pSeries machine type can be selected by running QEMU with the following
options:

.. code-block:: bash

  qemu-system-ppc64 -M pseries <other QEMU arguments>

sPAPR devices
-------------

The sPAPR specification defines a set of para-virtualized devices, which are
also supported by the pSeries machine in QEMU and can be instantiated with the
``-device`` option:

* ``spapr-vlan``: a virtual network interface.
* ``spapr-vscsi``: a virtual SCSI disk interface.
* ``spapr-rng``: a pseudo-device for passing random number generator data to the
  guest (see the `H_RANDOM hypercall feature
  <https://wiki.qemu.org/Features/HRandomHypercall>`_ for details).
* ``spapr-vty``: a virtual teletype.
* ``spapr-pci-host-bridge``: a PCI host bridge.
* ``tpm-spapr``: a Trusted Platform Module (TPM).
* ``spapr-tpm-proxy``: a TPM proxy.

These are compatible with the devices historically available for use when
running the IBM PowerVM hypervisor with LPARs.

However, since these devices were originally specified with another
hypervisor and non-Linux guests in mind, you should use the virtio counterparts
(virtio-net, virtio-blk/scsi and virtio-rng for instance) instead if possible,
since they will most probably give you better performance with Linux guests in a
QEMU environment.

The pSeries machine in QEMU is always instantiated with the following devices:

* An NVRAM device (``spapr-nvram``).
* A virtual teletype (``spapr-vty``).
* A PCI host bridge (``spapr-pci-host-bridge``).

Hence, there is no need to add them manually, unless you use the ``-nodefaults``
command line option in QEMU.

In the case of the default ``spapr-nvram`` device, if you want to make the
contents of the NVRAM device persistent, you will need to specify a PFLASH
device when starting QEMU, i.e. either use
``-drive if=pflash,file=<filename>,format=raw`` to set the default PFLASH
device, or specify one with an ID
(``-drive if=none,file=<filename>,format=raw,id=pfid``) and pass that ID to the
NVRAM device with ``-global spapr-nvram.drive=pfid``.
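
The two ways of making the NVRAM persistent can be summarized with a small helper that only assembles the argument list. It is a sketch: the function name ``pseries_nvram_args`` and the file name used in the usage note are hypothetical, and only the options quoted above are used.

```python
def pseries_nvram_args(nvram_file, drive_id=None):
    """Build QEMU arguments that back spapr-nvram with a raw image file.

    Without drive_id the file becomes the default PFLASH device; with
    drive_id the drive is named and passed to the NVRAM device explicitly.
    """
    args = ["qemu-system-ppc64", "-M", "pseries"]
    if drive_id is None:
        # Variant 1: set the default PFLASH device.
        args += ["-drive", "if=pflash,file=%s,format=raw" % nvram_file]
    else:
        # Variant 2: named drive, wired to spapr-nvram with -global.
        args += [
            "-drive",
            "if=none,file=%s,format=raw,id=%s" % (nvram_file, drive_id),
            "-global",
            "spapr-nvram.drive=%s" % drive_id,
        ]
    return args
```

For instance, ``pseries_nvram_args("nvram.img", "pfid")`` yields the ``-drive if=none,...,id=pfid`` and ``-global spapr-nvram.drive=pfid`` pair, which can then be handed to ``subprocess.run`` together with any other QEMU arguments.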

sPAPR specification
^^^^^^^^^^^^^^^^^^^

The main source of documentation on the sPAPR standard is the `Linux on Power
Architecture Reference document (LoPAR)
<https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200812.pdf>`_.
However, documentation specific to QEMU's implementation of the specification
can also be found in QEMU documentation:

.. toctree::
   :maxdepth: 1

   ../../specs/ppc-spapr-hcalls.rst
   ../../specs/ppc-spapr-numa.rst
   ../../specs/ppc-spapr-xive.rst

Other documentation available in QEMU docs directory:

* Hot plug (``/docs/specs/ppc-spapr-hotplug.txt``).
* Hypervisor calls needed by the Ultravisor
  (``/docs/specs/ppc-spapr-uv-hcalls.txt``).

Switching between the KVM-PR and KVM-HV kernel modules
------------------------------------------------------

Currently, there are two implementations of KVM on Power, ``kvm_hv.ko`` and
``kvm_pr.ko``.

If a host supports both KVM modes, and both KVM kernel modules are loaded, it is
possible to switch between the two modes with the ``kvm-type`` parameter:

* Use ``qemu-system-ppc64 -M pseries,accel=kvm,kvm-type=PR`` to use the
  ``kvm_pr.ko`` kernel module.
* Use ``qemu-system-ppc64 -M pseries,accel=kvm,kvm-type=HV`` to use ``kvm_hv.ko``
  instead.
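
As an illustration of the selection above, the following sketch derives the usable ``kvm-type`` values from the list of loaded kernel modules and builds the matching ``-M`` value. It assumes that the modules show up as ``kvm_pr`` and ``kvm_hv`` in the first column of ``/proc/modules`` on a Linux host; the helper names are hypothetical.

```python
# kvm-type value -> kernel module name (without the .ko suffix)
KVM_MODULES = {"PR": "kvm_pr", "HV": "kvm_hv"}


def available_kvm_types(proc_modules_text):
    """Return the kvm-type values whose module appears in /proc/modules text."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [kvm_type for kvm_type, mod in KVM_MODULES.items() if mod in loaded]


def machine_option(kvm_type):
    """Build the value passed to -M for the requested KVM implementation."""
    if kvm_type not in KVM_MODULES:
        raise ValueError("kvm-type must be 'PR' or 'HV'")
    return "pseries,accel=kvm,kvm-type=%s" % kvm_type
```

On a host, the text would typically come from ``open("/proc/modules").read()``; passing it in as a string keeps the helper easy to test.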

KVM-PR
^^^^^^

KVM-PR uses the so-called **PR**\ oblem state of the PPC CPUs to run the guests,
i.e. the virtual machine is run in user mode and all privileged instructions
trap and have to be emulated by the host. That means you can run KVM-PR inside
a pSeries guest (or a PowerVM LPAR for that matter), and that is where it
originated, as historically (prior to POWER7) it was not possible to run Linux
in hypervisor mode on a Power processor (this function was restricted to
PowerVM, the IBM proprietary hypervisor).

Because all privileged instructions are trapped, guests that use a lot of
privileged instructions run quite slowly with KVM-PR. On the other hand, because
of that, this kernel module can run on pretty much any PPC hardware, and is
able to emulate a lot of guest CPUs. This module can even be used to run other
PowerPC guests like an emulated PowerMac.

As KVM-PR can be run inside a pSeries guest, it can also provide nested
virtualization capabilities (i.e. running a guest from within a guest).

Note that, as KVM-HV provides much better execution
performance, maintenance work has been much more focused on it in the past
years. Maintenance for KVM-PR has been minimal.

In order to run KVM-PR guests with POWER9 processors, QEMU must be started
with the ``kernel_irqchip=off`` command line option.

KVM-HV
^^^^^^

KVM-HV uses the hypervisor mode of more recent Power processors, which allows
access to the bare metal hardware directly. Although POWER7 had this capability,
it was only starting with POWER8 that this was officially supported by IBM.

Originally, KVM-HV was only available when running on a PowerNV platform (a.k.a.
Power bare metal). Although it runs on a PowerNV platform, it can only be used
to start pSeries guests. As the pSeries guest doesn't have access to the
hypervisor mode of the Power CPU, it wasn't possible to run KVM-HV on a guest.
This limitation has been lifted, and now it is possible to run KVM-HV inside
pSeries guests as well, making nested virtualization possible with KVM-HV.

As KVM-HV has access to privileged instructions, guests that use a lot of these
can run much faster than with KVM-PR. On the other hand, the guest CPU has to be
of the same type as the host CPU this way, e.g. it is not possible to specify an
embedded PPC CPU for the guest with KVM-HV. However, there is at least the
possibility to run the guest in a backward-compatibility mode of the previous
CPU generations, e.g. you can run a POWER7 guest on a POWER8 host by using
``-cpu POWER8,compat=power7`` as parameter to QEMU.

Modules support
---------------

As noted in the sections above, each module can run in a different
environment. The following table shows with which environment each module can
run. As long as you are in a supported environment, you can run KVM-PR or KVM-HV
nested. Combinations not shown in the table are not available.

+--------------+------------+------+-------------------+----------+--------+
| Platform     | Host type  | Bits | Page table format | KVM-HV   | KVM-PR |
+==============+============+======+===================+==========+========+
| PowerNV      | bare metal | 32   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | N/A      | N/A    |
|              |            +------+-------------------+----------+--------+
|              |            | 64   | hash              | yes      | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | yes      | no     |
+--------------+------------+------+-------------------+----------+--------+
| pSeries [1]_ | PowerNV    | 32   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | N/A      | N/A    |
|              |            +------+-------------------+----------+--------+
|              |            | 64   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | yes [2]_ | no     |
|              +------------+------+-------------------+----------+--------+
|              | PowerVM    | 32   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | N/A      | N/A    |
|              |            +------+-------------------+----------+--------+
|              |            | 64   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix [3]_        | no       | yes    |
+--------------+------------+------+-------------------+----------+--------+

.. [1] On POWER9 DD2.1 processors, the page table format on the host and guest
   must be the same.

.. [2] KVM-HV cannot run nested on POWER8 machines.

.. [3] Introduced on Power10 machines.
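
For scripting purposes, the table can be encoded as a lookup keyed by platform, host type, word size and page table format. This is only a transcription of the table above into a hypothetical ``KVM_SUPPORT`` dictionary; the N/A rows (32-bit radix) are left out, so they resolve to "nothing available", and the footnote conditions are kept as comments rather than enforced.

```python
# (platform, host type, bits, page table format) -> (KVM-HV, KVM-PR)
KVM_SUPPORT = {
    ("PowerNV", "bare metal", 32, "hash"): (False, True),
    ("PowerNV", "bare metal", 64, "hash"): (True, True),
    ("PowerNV", "bare metal", 64, "radix"): (True, False),
    ("pSeries", "PowerNV", 32, "hash"): (False, True),
    ("pSeries", "PowerNV", 64, "hash"): (False, True),
    ("pSeries", "PowerNV", 64, "radix"): (True, False),  # not nested on POWER8
    ("pSeries", "PowerVM", 32, "hash"): (False, True),
    ("pSeries", "PowerVM", 64, "hash"): (False, True),
    ("pSeries", "PowerVM", 64, "radix"): (False, True),  # Power10 and later
}


def usable_modules(platform, host, bits, fmt):
    """Return the KVM modules usable in the given environment."""
    hv, pr = KVM_SUPPORT.get((platform, host, bits, fmt), (False, False))
    return [name for name, ok in (("KVM-HV", hv), ("KVM-PR", pr)) if ok]
```

A combination that is missing from the dictionary returns an empty list, matching the statement that combinations not shown in the table are not available.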

Maintainer contact information
------------------------------

Cédric Le Goater <clg@kaod.org>

Daniel Henrique Barboza <danielhb413@gmail.com>