qemu/include/hw
Alexey Kardashevskiy ec132efaa8 spapr: Support NVIDIA V100 GPU with NVLink2
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory
space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver
implements special regions for such GPUs and emulates an NVLink bridge.
NVLink2-enabled POWER9 CPUs also provide address translation services
which includes an ATS shootdown (ATSD) register exported via the NVLink
bridge device.

This adds a quirk to VFIO to map the GPU memory and create an MR;
the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses
this to get the MR and map it to the system address space.
Another quirk does the same for ATSD.

This adds additional steps to sPAPR PHB setup:

1. Search for specific GPUs and NPUs, collect findings in
sPAPRPHBState::nvgpus, manage system address space mappings;

2. Add device-specific properties such as "ibm,npu", "ibm,gpu",
"memory-block", "link-speed" to advertise the NVLink2 function to
the guest;

3. Add "mmio-atsd" to vPHB to advertise the ATSD capability;

4. Add new memory blocks (with extra "linux,memory-usable" to prevent
the guest OS from accessing the new memory until it is onlined) and
npuphb# nodes representing an NPU unit for every vPHB as the GPU driver
uses it for link discovery.

This allocates space for GPU RAM and ATSD like we do for MMIOs by
adding 2 new parameters to the phb_placement() hook. Older machine types
set these to zero.

This puts new memory nodes in a separate NUMA node to as the GPU RAM
needs to be configured equally distant from any other node in the system.
Unlike the host setup which assigns numa ids from 255 downwards, this
adds new NUMA nodes after the user configures nodes or from 1 if none
were configured.

This adds requirement similar to EEH - one IOMMU group per vPHB.
The reason for this is that ATSD registers belong to a physical NPU
so they cannot invalidate translations on GPUs attached to another NPU.
It is guaranteed by the host platform as it does not mix NVLink bridges
or GPUs from different NPU in the same IOMMU group. If more than one
IOMMU group is detected on a vPHB, this disables ATSD support for that
vPHB and prints a warning.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for vfio portions]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190312082103.130561-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 10:41:23 +10:00
..
acpi i386, acpi: check acpi_memory_hotplug capacity in pre_plug 2019-03-12 22:31:21 -04:00
adc STM32F2xx: Add the ADC device 2016-10-04 13:28:07 +01:00
arm hw/arm/virt: Dynamic memory map depending on RAM requirements 2019-03-05 15:55:09 +00:00
audio audio: use TYPE_MV88W8618_AUDIO instead of hardcoded string 2018-10-29 13:50:15 +01:00
block pflash: Require backend size to match device, improve errors 2019-03-26 08:16:24 +01:00
char hw/char/pl011: Support all interrupt lines 2019-02-21 18:17:46 +00:00
core hw/core/split-irq: Device that splits IRQ lines 2018-03-02 11:03:45 +00:00
cpu qom/cpu: Add cluster_index to CPUState 2019-01-29 11:46:05 +00:00
cris char: rename CharDriverState Chardev 2017-01-27 18:07:59 +01:00
display hw/display/milkymist-tmu2: Move inlined code from header to source 2019-02-01 11:58:50 +01:00
dma hw/dma/pl080: Don't use CPU address space for DMA accesses 2018-08-20 11:24:33 +01:00
firmware hw/smbios: fix offset of type 3 sku field 2019-02-22 10:51:31 -05:00
gpio hw/gpio/nrf51_gpio: Add nRF51 GPIO peripheral 2019-01-07 15:23:47 +00:00
hyperv hyperv: process POST_MESSAGE hypercall 2018-10-19 13:44:14 +02:00
i2c i2c:smbus_slave: Add an SMBus vmstate structure 2019-02-27 21:06:08 -06:00
i386 intel_iommu: Drop extended root field 2019-04-02 11:49:14 -04:00
ide hw/ide: drop iov field from IDEDMA 2019-02-22 09:42:13 +00:00
input hw/input/ps2: Remove PS2State from "qemu/typedefs.h" 2019-01-22 05:14:32 +01:00
intc hw/intc/bcm2836_control: Implement local timer 2019-03-15 11:12:28 +00:00
ipack ipack: Update e-mail address 2016-05-18 15:04:27 +03:00
ipmi qom: make interface types abstract 2018-12-11 15:45:22 -02:00
isa qom: make interface types abstract 2018-12-11 15:45:22 -02:00
kvm hw: move headers to include/ 2013-04-08 18:13:10 +02:00
lm32 intc: make HMP 'info irq' and 'info pic' commands use InterruptStatsProvider interface 2016-10-04 10:00:25 +02:00
m68k hw/char/mcf_uart: QOMify the ColdFire UART 2017-02-16 14:06:56 +01:00
mem nvdimm: Rename AcpiNVDIMMState into NVDIMMState 2019-03-11 10:44:21 -03:00
mips hw/mips: Use the IEC binary prefix definitions 2018-07-02 15:41:16 +02:00
misc hw/arm/armsse: Unify init-svtor and cpuwait handling 2019-02-28 11:03:04 +00:00
net net: cadence_gem: Add support for selecting the DMA MemoryRegion 2018-10-16 17:13:49 +01:00
nvram hw/nvram/nrf51_nvm: Add nRF51 non-volatile memories 2019-02-01 15:31:26 +00:00
pci pci: Allow PCI bus subtypes to support extended config space accesses 2019-04-09 09:14:47 +10:00
pci-bridge Clean up includes 2018-02-09 05:05:11 +01:00
pci-host spapr: Support NVIDIA V100 GPU with NVLink2 2019-04-26 10:41:23 +10:00
ppc spapr: Support NVIDIA V100 GPU with NVLink2 2019-04-26 10:41:23 +10:00
rdma {hmp, hw/pvrdma}: Expose device internals via monitor interface 2019-03-16 15:52:44 +02:00
riscv riscv: plic: Fix incorrect irq calculation 2019-04-04 16:36:19 -07:00
s390x target/s390x: Split out s390-tod.h 2019-02-18 11:25:43 +01:00
scsi scsi: esp: Defer command completion until previous interrupts have been handled 2019-01-11 13:57:24 +01:00
sd sdcard: Reflect when the Spec v3 is supported in the Config Register (SCR) 2018-06-08 13:15:34 +01:00
sh4 avoid TABs in files that only contain a few 2019-01-11 15:46:56 +01:00
sparc sun4u_iommu: update to reflect IOMMU is no longer part of the APB device 2018-01-09 21:48:20 +00:00
ssi aspeed/smc: snoop SPI transfers to fake dummy cycles 2019-01-29 11:46:05 +00:00
timer hw/timer/pl031: Allow use as an embedded-struct device 2019-02-21 18:17:46 +00:00
tricore Clean up header guards that don't match their file name 2016-07-12 16:19:16 +02:00
unicore32 hw/unicore32: restrict hw addr defines to source file 2017-12-18 17:07:02 +03:00
usb usb: Add basic code to emulate Chipidea USB IP 2018-02-09 10:40:30 +00:00
vfio VFIO updates 2019-03-11 2019-03-12 13:37:29 +00:00
virtio virtio-gpu: delay virglrenderer reset when blocked. 2019-03-18 13:10:57 +01:00
watchdog hw/arm/stellaris: Implement watchdog timer 2019-03-05 15:55:09 +00:00
xen pvh: Add x86/HVM direct boot ABI header file 2019-02-05 16:50:16 +01:00
xtensa target/xtensa: add MX interrupt controller 2019-01-28 11:55:20 -08:00
boards.h machine: Move nvdimms state into struct MachineState 2019-03-11 10:44:25 -03:00
bt.h char: rename CharDriverState Chardev 2017-01-27 18:07:59 +01:00
devices.h hw/devices: Remove unused TC6393XB_RAM definition 2019-03-07 22:16:22 +01:00
elf_ops.h elf-ops.h: Add get_elf_note_type() 2019-02-05 16:50:16 +01:00
empty_slot.h Clean up decorations and whitespace around header guards 2016-07-12 16:20:46 +02:00
fw-path-provider.h qom: make interface types abstract 2018-12-11 15:45:22 -02:00
hotplug.h qom: make interface types abstract 2018-12-11 15:45:22 -02:00
hw.h migration: Split qemu-file.h 2017-06-01 18:49:22 +02:00
ide.h ide/via: Rename functions to match device name 2019-01-25 14:52:12 -05:00
irq.h hw/core/split-irq: Device that splits IRQ lines 2018-03-02 11:03:45 +00:00
loader-fit.h Use #include "..." for our own headers, <...> for others 2018-02-09 05:05:11 +01:00
loader.h elf: Add optional function ptr to load_elf() to parse ELF notes 2019-02-05 16:50:16 +01:00
nmi.h qom: make interface types abstract 2018-12-11 15:45:22 -02:00
or-irq.h hw/core/or-irq: Support more than 16 inputs to an OR gate 2018-06-15 15:23:34 +01:00
pcmcia.h hw/pcmcia: Remove PCMCIACardState from "qemu/typedefs.h" 2019-01-22 05:14:32 +01:00
platform-bus.h platform-bus-device: use device plug callback instead of machine_done notifier 2018-05-10 18:10:56 +01:00
ptimer.h ptimer: Add TRIGGER_ONLY_ON_DECREMENT policy option 2018-07-09 14:51:34 +01:00
qdev-core.h qom: Move compat_props machinery from qdev to QOM 2019-03-11 22:53:44 +01:00
qdev-dma.h qdev: Remove hex8/32/64 property types 2014-02-14 21:12:04 +01:00
qdev-properties.h qapi: Define PCIe link speed and width properties 2018-12-19 16:48:16 -05:00
qdev.h hw: move headers to include/ 2013-04-08 18:13:10 +02:00
register.h hw: register: Run post_write hook on reset 2018-03-01 11:05:43 +00:00
registerfields.h Use #include "..." for our own headers, <...> for others 2018-02-09 05:05:11 +01:00
stream.h qom: make interface types abstract 2018-12-11 15:45:22 -02:00
sysbus.h core/sysbus: remove the SysBusDeviceClass::init path 2018-12-13 13:48:03 +00:00
usb.h qemu/queue.h: simplify reverse access to QTAILQ 2019-01-11 15:46:55 +01:00