Commit graph

818 commits

Author SHA1 Message Date
Avihai Horon
1f21670ec0 vfio/migration: Rename vfio_devices_all_dirty_tracking()
vfio_devices_all_dirty_tracking() is used to check if dirty page log
sync is needed. However, besides checking the dirty page tracking
status, it also checks the pre_copy_dirty_page_tracking flag.

Rename it to vfio_devices_log_sync_needed() which reflects its purpose
more accurately and makes the code clearer as there are already several
helpers with similar names.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20241218134022.21264-5-avihaih@nvidia.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:38 +01:00
Avihai Horon
6e9df66e8a vfio/migration: Refactor vfio_devices_all_running_and_mig_active() logic
During DMA unmap with vIOMMU, vfio_devices_all_running_and_mig_active()
is used to check whether a dirty page log sync of the unmapped pages is
required. Such log sync is needed during migration pre-copy phase, and
the current logic detects it by checking if migration is active and if
the VFIO devices are running.

However, recently there has been an effort to simplify the migration
status API and reduce it to a single migration_is_running() function.

To accommodate this, refactor vfio_devices_all_running_and_mig_active()
logic so it won't use migration_is_active(). Do it by simply checking if
dirty tracking has been started using internal VFIO flags.

This should be equivalent to the previous logic as during migration
dirty tracking is active and when the guest is stopped there shouldn't
be DMA unmaps coming from it.

As a side effect, now that migration status is no longer used, DMA unmap
log syncs are untied from migration. This will make calc-dirty-rate more
accurate as now it will also include VFIO dirty pages that were DMA
unmapped.

Also rename the function to properly reflect its new logic and extract
common code from vfio_devices_all_dirty_tracking().

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20241218134022.21264-4-avihaih@nvidia.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:38 +01:00
Avihai Horon
0ae05e087f vfio/migration: Refactor vfio_devices_all_dirty_tracking() logic
During dirty page log sync, vfio_devices_all_dirty_tracking() is used to
check if dirty tracking has been started in order to avoid errors. The
current logic checks if migration is in ACTIVE or DEVICE states to
ensure dirty tracking has been started.

However, recently there has been an effort to simplify the migration
status API and reduce it to a single migration_is_running() function.

To accommodate this, refactor vfio_devices_all_dirty_tracking() logic so
it won't use migration_is_active() and migration_is_device(). Instead,
use internal VFIO dirty tracking flags.

As a side effect, now that migration status is no longer used to detect
dirty tracking status, VFIO log syncs are untied from migration. This
will make calc-dirty-rate more accurate as now it will also include VFIO
dirty pages.

While at it, as VFIODevice->dirty_tracking is now used to detect dirty
tracking status, add a comment that states how it's protected.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20241218134022.21264-3-avihaih@nvidia.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:38 +01:00
Avihai Horon
d77e85dbd7 vfio/container: Add dirty tracking started flag
Add a flag to VFIOContainerBase that indicates whether dirty tracking
has been started for the container or not.

This will be used in the following patches to allow dirty page syncs
only if dirty tracking has been started.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20241218134022.21264-2-avihaih@nvidia.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:38 +01:00
Tomita Moeko
37f05a59e8 vfio/igd: add x-igd-gms option back to set DSM region size for guest
DSM region is likely to store framebuffer in Windows, a small DSM
region may cause display issues (e.g. half of the screen is black).
Since 971ca22f04 ("vfio/igd: don't set stolen memory size to zero"),
the x-igd-gms option was functionally removed, QEMU uses host's
original value, which is determined by DVMT Pre-Allocated option in
Intel FSP of host bios.

However, some vendors do not expose this config item to users. In
such cases, x-igd-gms option can be used to manually set the data
stolen memory size for guest. So this commit brings this option back,
keeping its old behavior. When it is not specified, QEMU uses host's
value.

When DVMT Pre-Allocated option is available in host BIOS, user should
set DSM region size there instead of using x-igd-gms option.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-11-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:38 +01:00
Tomita Moeko
f926baa03b vfio/igd: emulate BDSM in mmio bar0 for gen 6-10 devices
A recent commit in i915 driver [1] claims the BDSM register at 0x1080c0
of mmio bar0 has been there since gen 6. Mirror this register to the 32
bit BDSM register at 0x5c in pci config space for gen6-10 devices.

[1] https://patchwork.freedesktop.org/patch/msgid/20240202224340.30647-7-ville.syrjala@linux.intel.com

Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-10-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:37 +01:00
Tomita Moeko
ea652c2bee vfio/igd: emulate GGC register in mmio bar0
The GGC register at 0x50 of pci config space is a mirror of the same
register at 0x108040 of mmio bar0 [1]. i915 driver also reads that
register from mmio bar0 instead of config space. As GGC is programmed
and emulated by qemu, the mmio address should also be emulated, in the
same way of BDSM register.

[1] 4.1.28, 12th Generation Intel Core Processors Datasheet Volume 2
    https://www.intel.com/content/www/us/en/content-details/655259

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-9-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:37 +01:00
Tomita Moeko
1a2623b5c9 vfio/igd: add macro for declaring mirrored registers
igd devices have multipe registers mirroring mmio address and pci
config space, more than a single BDSM register. To support this,
the read/write functions are made common and a macro is defined to
simplify the declaration of MemoryRegionOps.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-8-tomitamoeko@gmail.com
[ clg : Fixed conversion specifier on 32-bit platform ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:37 +01:00
Tomita Moeko
960f62770a vfio/igd: add Alder/Raptor/Rocket/Ice/Jasper Lake device ids
All gen 11 and 12 igd devices have 64 bit BDSM register at 0xC0 in its
config space, add them to the list to support igd passthrough on Alder/
Raptor/Rocket/Ice/Jasper Lake platforms.

Tested legacy mode of igd passthrough works properly on both linux and
windows guests with AlderLake-S GT1 (8086:4680).

Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-7-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:37 +01:00
Tomita Moeko
183714d8f9 vfio/igd: add Gemini Lake and Comet Lake device ids
Both Gemini Lake and Comet Lake are gen 9 devices. Many user reports
on internet shows legacy mode of igd passthrough works as qemu treats
them as gen 8 devices by default before e433f20897 ("vfio/igd:
return an invalid generation for unknown devices").

Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-6-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:37 +01:00
Tomita Moeko
1e1eac5f3d vfio/igd: canonicalize memory size calculations
Add helper functions igd_gtt_memory_size() and igd_stolen_size() for
calculating GTT stolen memory and Data stolen memory size in bytes,
and use macros to replace the hardware-related magic numbers for
better readability.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-5-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:37 +01:00
Tomita Moeko
0bb758e97a vfio/igd: align generation with i915 kernel driver
Define the igd device generations according to i915 kernel driver to
avoid confusion, and adjust comment placement to clearly reflect the
relationship between ids and devices.

The condition of how GTT stolen memory size is calculated is changed
accordingly as GGMS is in multiple of 2 starting from gen 8.

Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-4-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:37 +01:00
Tomita Moeko
8492a6129c vfio/igd: remove unsupported device ids
Since e433f20897 ("vfio/igd: return an invalid generation for unknown
devices"), the default return of igd_gen() was changed to unsupported.
There is no need to filter out those unsupported devices.

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Link: https://lore.kernel.org/r/20241206122749.9893-3-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:37 +01:00
Tomita Moeko
03828b00a2 vfio/igd: fix GTT stolen memory size calculation for gen 8+
On gen 8 and later devices, the GTT stolen memory size when GGMS equals
0 is 0 (no preallocated memory) rather than 1MB [1].

[1] 3.1.13, 5th Generation Intel Core Processor Family Datasheet Vol. 2
    https://www.intel.com/content/www/us/en/content-details/330835

Fixes: c4c45e943e ("vfio/pci: Intel graphics legacy mode assignment")

Reported-By: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20241206122749.9893-2-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26 07:23:37 +01:00
Stefan Hajnoczi
65cb7129f4 Accel & Exec patch queue
- Ignore writes to CNTP_CTL_EL0 on HVF ARM (Alexander)
 - Add '-d invalid_mem' logging option (Zoltan)
 - Create QOM containers explicitly (Peter)
 - Rename sysemu/ -> system/ (Philippe)
 - Re-orderning of include/exec/ headers (Philippe)
   Move a lot of declarations from these legacy mixed bag headers:
     . "exec/cpu-all.h"
     . "exec/cpu-common.h"
     . "exec/cpu-defs.h"
     . "exec/exec-all.h"
     . "exec/translate-all"
   to these more specific ones:
     . "exec/page-protection.h"
     . "exec/translation-block.h"
     . "user/cpu_loop.h"
     . "user/guest-host.h"
     . "user/page-protection.h"
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmdlnyAACgkQ4+MsLN6t
 wN6mBw//QFWi7CrU+bb8KMM53kOU9C507tjn99LLGFb5or73/umDsw6eo/b8DHBt
 KIwGLgATel42oojKfNKavtAzLK5rOrywpboPDpa3SNeF1onW+99NGJ52LQUqIX6K
 A6bS0fPdGG9ZzEuPpbjDXlp++0yhDcdSgZsS42fEsT7Dyj5gzJYlqpqhiXGqpsn8
 4Y0UMxSL21K3HEexlzw2hsoOBFA3tUm2ujNDhNkt8QASr85yQVLCypABJnuoe///
 5Ojl5wTBeDwhANET0rhwHK8eIYaNboiM9fHopJYhvyw1bz6yAu9jQwzF/MrL3s/r
 xa4OBHBy5mq2hQV9Shcl3UfCQdk/vDaYaWpgzJGX8stgMGYfnfej1SIl8haJIfcl
 VMX8/jEFdYbjhO4AeGRYcBzWjEJymkDJZoiSWp2NuEDi6jqIW+7yW1q0Rnlg9lay
 ShAqLK5Pv4zUw3t0Jy3qv9KSW8sbs6PQxtzXjk8p97rTf76BJ2pF8sv1tVzmsidP
 9L92Hv5O34IqzBu2oATOUZYJk89YGmTIUSLkpT7asJZpBLwNM2qLp5jO00WVU0Sd
 +kAn324guYPkko/TVnjC/AY7CMu55EOtD9NU35k3mUAnxXT9oDUeL4NlYtfgrJx6
 x1Nzr2FkS68+wlPAFKNSSU5lTjsjNaFM0bIJ4LCNtenJVP+SnRo=
 =cjz8
 -----END PGP SIGNATURE-----

Merge tag 'exec-20241220' of https://github.com/philmd/qemu into staging

Accel & Exec patch queue

- Ignore writes to CNTP_CTL_EL0 on HVF ARM (Alexander)
- Add '-d invalid_mem' logging option (Zoltan)
- Create QOM containers explicitly (Peter)
- Rename sysemu/ -> system/ (Philippe)
- Re-orderning of include/exec/ headers (Philippe)
  Move a lot of declarations from these legacy mixed bag headers:
    . "exec/cpu-all.h"
    . "exec/cpu-common.h"
    . "exec/cpu-defs.h"
    . "exec/exec-all.h"
    . "exec/translate-all"
  to these more specific ones:
    . "exec/page-protection.h"
    . "exec/translation-block.h"
    . "user/cpu_loop.h"
    . "user/guest-host.h"
    . "user/page-protection.h"

 # -----BEGIN PGP SIGNATURE-----
 #
 # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmdlnyAACgkQ4+MsLN6t
 # wN6mBw//QFWi7CrU+bb8KMM53kOU9C507tjn99LLGFb5or73/umDsw6eo/b8DHBt
 # KIwGLgATel42oojKfNKavtAzLK5rOrywpboPDpa3SNeF1onW+99NGJ52LQUqIX6K
 # A6bS0fPdGG9ZzEuPpbjDXlp++0yhDcdSgZsS42fEsT7Dyj5gzJYlqpqhiXGqpsn8
 # 4Y0UMxSL21K3HEexlzw2hsoOBFA3tUm2ujNDhNkt8QASr85yQVLCypABJnuoe///
 # 5Ojl5wTBeDwhANET0rhwHK8eIYaNboiM9fHopJYhvyw1bz6yAu9jQwzF/MrL3s/r
 # xa4OBHBy5mq2hQV9Shcl3UfCQdk/vDaYaWpgzJGX8stgMGYfnfej1SIl8haJIfcl
 # VMX8/jEFdYbjhO4AeGRYcBzWjEJymkDJZoiSWp2NuEDi6jqIW+7yW1q0Rnlg9lay
 # ShAqLK5Pv4zUw3t0Jy3qv9KSW8sbs6PQxtzXjk8p97rTf76BJ2pF8sv1tVzmsidP
 # 9L92Hv5O34IqzBu2oATOUZYJk89YGmTIUSLkpT7asJZpBLwNM2qLp5jO00WVU0Sd
 # +kAn324guYPkko/TVnjC/AY7CMu55EOtD9NU35k3mUAnxXT9oDUeL4NlYtfgrJx6
 # x1Nzr2FkS68+wlPAFKNSSU5lTjsjNaFM0bIJ4LCNtenJVP+SnRo=
 # =cjz8
 # -----END PGP SIGNATURE-----
 # gpg: Signature made Fri 20 Dec 2024 11:45:20 EST
 # gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
 # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [unknown]
 # gpg: WARNING: This key is not certified with a trusted signature!
 # gpg:          There is no indication that the signature belongs to the owner.
 # Primary key fingerprint: FAAB E75E 1291 7221 DCFD  6BB2 E3E3 2C2C DEAD C0DE

* tag 'exec-20241220' of https://github.com/philmd/qemu: (59 commits)
  util/qemu-timer: fix indentation
  meson: Do not define CONFIG_DEVICES on user emulation
  system/accel-ops: Remove unnecessary 'exec/cpu-common.h' header
  system/numa: Remove unnecessary 'exec/cpu-common.h' header
  hw/xen: Remove unnecessary 'exec/cpu-common.h' header
  target/mips: Drop left-over comment about Jazz machine
  target/mips: Remove tswap() calls in semihosting uhi_fstat_cb()
  target/xtensa: Remove tswap() calls in semihosting simcall() helper
  accel/tcg: Un-inline translator_is_same_page()
  accel/tcg: Include missing 'exec/translation-block.h' header
  accel/tcg: Move tcg_cflags_has/set() to 'exec/translation-block.h'
  accel/tcg: Restrict curr_cflags() declaration to 'internal-common.h'
  qemu/coroutine: Include missing 'qemu/atomic.h' header
  exec/translation-block: Include missing 'qemu/atomic.h' header
  accel/tcg: Declare cpu_loop_exit_requested() in 'exec/cpu-common.h'
  exec/cpu-all: Include 'cpu.h' earlier so MMU_USER_IDX is always defined
  target/sparc: Move sparc_restore_state_to_opc() to cpu.c
  target/sparc: Uninline cpu_get_tb_cpu_state()
  target/loongarch: Declare loongarch_cpu_dump_state() locally
  user: Move various declarations out of 'exec/exec-all.h'
  ...

Conflicts:
	hw/char/riscv_htif.c
	hw/intc/riscv_aplic.c
	target/s390x/cpu.c

	Apply sysemu header path changes to not in the pull request.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-12-21 11:07:00 -05:00
Philippe Mathieu-Daudé
32cad1ffb8 include: Rename sysemu/ -> system/
Headers in include/sysemu/ are not only related to system
*emulation*, they are also used by virtualization. Rename
as system/ which is clearer.

Files renamed manually then mechanical change using sed tool.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20241203172445.28576-1-philmd@linaro.org>
2024-12-20 17:44:56 +01:00
Richard Henderson
b1987a2547 Constify all opaque Property pointers
Via sed "s/  Property [*]/  const Property */".

The opaque pointers passed to ObjectProperty callbacks are
the last instances of non-const Property pointers in the tree.
For the most part, these callbacks only use object_field_prop_ptr,
which now takes a const pointer itself.

This logically should have accompanied d36f165d95 which
allowed const Property to be registered.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218134251.4724-25-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-19 19:36:37 +01:00
Richard Henderson
5fcabe628b include/hw/qdev-properties: Remove DEFINE_PROP_END_OF_LIST
Now that all of the Property arrays are counted, we can remove
the terminator object from each array.  Update the assertions
in device_class_set_props to match.

With struct Property being 88 bytes, this was a rather large
form of terminator.  Saves 30k from qemu-system-aarch64.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218134251.4724-21-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-19 19:36:37 +01:00
Richard Henderson
d3eb6f73e1 hw/vfio: Constify all Property
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-12-15 12:56:37 -06:00
Peter Maydell
2c471a8291 * Fixes & doc updates for the new "boot order" s390x bios feature
* Provide a "loadparm" property for scsi-hd & scsi-cd devices on s390x
   (required for the "boot order" feature)
 * Fix the floating-point multiply-and-add NaN rules on s390x
 * Raise timeout on cross-accel build jobs to 60m
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmc7ercRHHRodXRoQHJl
 ZGhhdC5jb20ACgkQLtnXdP5wLbVjyg//ZuhSDCj+oBSU6vwM7Lwh3CS6GwZvGECU
 h60V3tizKypiRNtTJRXHoWcx95brXmoZgI+QQhDEXe3fFLkOEKT6AIlDhrKZRUsd
 rpLPr6O8TVKO+rSE7JVJAP3X1tpOOQDxnq83uWBv53b0S+Da0VwDRtI9gcugRMmh
 d58P8Q1bV344fQdcrebejstpSUG7RxSA4Plj2uSQx4mSHT7cy/hN+vA34Ha7reE3
 tcN9yfQq3Rmfvt0MV5I9Umd6JXEoDlEAwjSNsWRsCzo69jBZwiMtXSH8LyLtwRTp
 C919G/MIRuhvImF74dStLVCr82sNq54YR1NP6CGcmqPH76FOH8Mx3vmx9Cxj9ckA
 6NI6SvIg++bW2O1efG2apz8p5fjbDzYXSAbHnaWTcEu3gPgH4PQ5QXoyKaDymvWV
 JIh5/gXEy+twEXgIBsdWQ44A9E06lL/tNfKnqGdXK4ZYF2JIrI+Lq7AKBee7tebP
 +72I4PljHLSHQ3GxdkoOeJ8ahu70IBdSz2/VEIwOWK1wIf5C5WFNBerLJyDmkyx8
 xIvIm0vlRLwPcuOC711nlaMaKqTNT+8W4DIqIY6fHs2Jy0psMdgey1uHQxYEj9Kh
 fg7CvalK8n3MkGAwTqAvRJIwMFe0a4Ss6c6CaemSaYa38ud/pCNnv+IT+Eqr+mjq
 6y5PZWNrZi0=
 =UaDH
 -----END PGP SIGNATURE-----

Merge tag 'pull-request-2024-11-18' of https://gitlab.com/thuth/qemu into staging

* Fixes & doc updates for the new "boot order" s390x bios feature
* Provide a "loadparm" property for scsi-hd & scsi-cd devices on s390x
  (required for the "boot order" feature)
* Fix the floating-point multiply-and-add NaN rules on s390x
* Raise timeout on cross-accel build jobs to 60m

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmc7ercRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbVjyg//ZuhSDCj+oBSU6vwM7Lwh3CS6GwZvGECU
# h60V3tizKypiRNtTJRXHoWcx95brXmoZgI+QQhDEXe3fFLkOEKT6AIlDhrKZRUsd
# rpLPr6O8TVKO+rSE7JVJAP3X1tpOOQDxnq83uWBv53b0S+Da0VwDRtI9gcugRMmh
# d58P8Q1bV344fQdcrebejstpSUG7RxSA4Plj2uSQx4mSHT7cy/hN+vA34Ha7reE3
# tcN9yfQq3Rmfvt0MV5I9Umd6JXEoDlEAwjSNsWRsCzo69jBZwiMtXSH8LyLtwRTp
# C919G/MIRuhvImF74dStLVCr82sNq54YR1NP6CGcmqPH76FOH8Mx3vmx9Cxj9ckA
# 6NI6SvIg++bW2O1efG2apz8p5fjbDzYXSAbHnaWTcEu3gPgH4PQ5QXoyKaDymvWV
# JIh5/gXEy+twEXgIBsdWQ44A9E06lL/tNfKnqGdXK4ZYF2JIrI+Lq7AKBee7tebP
# +72I4PljHLSHQ3GxdkoOeJ8ahu70IBdSz2/VEIwOWK1wIf5C5WFNBerLJyDmkyx8
# xIvIm0vlRLwPcuOC711nlaMaKqTNT+8W4DIqIY6fHs2Jy0psMdgey1uHQxYEj9Kh
# fg7CvalK8n3MkGAwTqAvRJIwMFe0a4Ss6c6CaemSaYa38ud/pCNnv+IT+Eqr+mjq
# 6y5PZWNrZi0=
# =UaDH
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 18 Nov 2024 17:34:47 GMT
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2024-11-18' of https://gitlab.com/thuth/qemu:
  .gitlab-ci.d: Raise timeout on cross-accel build jobs to 60m
  pc-bios: Update the s390 bios images with the recent fixes
  pc-bios/s390-ccw: Re-initialize receive queue index before each boot attempt
  pc-bios/s390x: Initialize machine loadparm before probing IPL devices
  pc-bios/s390x: Initialize cdrom type to false for each IPL device
  hw: Add "loadparm" property to scsi disk devices for booting on s390x
  hw/s390x: Restrict "loadparm" property to devices that can be used for booting
  docs/system/bootindex: Make it clear that s390x can also boot from virtio-net
  docs/system/s390x/bootdevices: Update loadparm documentation
  tests/tcg/s390x: Add the floating-point multiply-and-add test
  target/s390x: Fix the floating-point multiply-and-add NaN rules
  hw/usb: Use __attribute__((packed)) vs __packed

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-11-18 20:23:59 +00:00
Thomas Huth
6e7c96ae61 hw/s390x: Restrict "loadparm" property to devices that can be used for booting
Commit bb185de423 ("s390x: Add individual loadparm assignment to
CCW device") added a "loadparm" property to all CCW devices. This
was a little bit unfortunate, since this property is only useful
for devices that can be used for booting, but certainly it is not
useful for devices like virtio-gpu or virtio-tablet.

Thus let's restrict the property to CCW devices that we can boot from
(i.e. virtio-block, virtio-net and vfio-ccw devices).

Message-ID: <20241113114741.681096-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Jared Rossi <jrossi@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-11-18 10:54:42 +01:00
Cédric Le Goater
ebbf7c60bb vfio/container: Fix container object destruction
When commit 96b7af4388 intoduced a .instance_finalize() handler,
it did not take into account that the container was not necessarily
inserted into the container list of the address space. Hence, if
the container object is destroyed, by calling object_unref() for
example, before vfio_address_space_insert() is called, QEMU may
crash when removing the container from the list as done in
vfio_container_instance_finalize(). This was seen with an SEV-SNP
guest for which discarding of RAM fails.

To resolve this issue, use the safe version of QLIST_REMOVE().

Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Fixes: 96b7af4388 ("vfio/container: Move vfio_container_destroy() to an instance_finalize() handler")
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-11-18 08:40:06 +01:00
Corvin Köhne
66650fd0cc vfio/igd: fix calculation of graphics stolen memory
When copying the calculation of the stolen memory size for Intels integrated
graphics device of gen 9 and later from the Linux kernel [1], we missed
subtracting 0xf0 from the graphics mode select value for values above 0xf0.
This leads to QEMU reporting a very large size of the graphics stolen memory
area. That's just a waste of memory. Additionally the guest firmware might be
unable to allocate such a large buffer.

[1] 7c626ce4ba/arch/x86/kernel/early-quirks.c (L455-L460)

Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Fixes: 8719224166 ("vfio/igd: correctly calculate stolen memory size for gen 9 and later")
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
[ clg: Changed commit subject ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-11-18 08:40:06 +01:00
Corvin Köhne
aff0c39c5b vfio/igd: add pci id for Coffee Lake
I've tested and verified that Coffee Lake devices are working properly.

Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-18 08:40:06 +01:00
Maciej S. Szmigiero
89b5161527 vfio/migration: Add vfio_save_block_precopy_empty_hit trace event
This way it is clearly known when there's no more data to send for that
device.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2024-11-05 15:51:14 +01:00
Maciej S. Szmigiero
2564e45171 vfio/migration: Add save_{iterate, complete_precopy}_start trace events
This way both the start and end points of migrating a particular VFIO
device are known.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2024-11-05 15:51:14 +01:00
Peter Xu
f018eb62b2 migration: Drop migration_is_setup_or_active()
This helper is mostly the same as migration_is_running(), except that one
has COLO reported as true, the other has CANCELLING reported as true.

Per my past years experience on the state changes, none of them should
matter.

To make it slightly safer, report both COLO || CANCELLING to be true in
migration_is_running(), then drop the other one.  We kept the 1st only
because the name is simpler, and clear enough.

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31 15:48:18 -04:00
Alex Williamson
00b519c0bc vfio/helpers: Align mmaps
Thanks to work by Peter Xu, support is introduced in Linux v6.12 to
allow pfnmap insertions at PMD and PUD levels of the page table.  This
means that provided a properly aligned mmap, the vfio driver is able
to map MMIO at significantly larger intervals than PAGE_SIZE.  For
example on x86_64 (the only architecture currently supporting huge
pfnmaps for PUD), rather than 4KiB mappings, we can map device MMIO
using 2MiB and even 1GiB page table entries.

Typically mmap will already provide PMD aligned mappings, so devices
with moderately sized MMIO ranges, even GPUs with standard 256MiB BARs,
will already take advantage of this support.  However in order to better
support devices exposing multi-GiB MMIO, such as 3D accelerators or GPUs
with resizable BARs enabled, we need to manually align the mmap.

There doesn't seem to be a way for userspace to easily learn about PMD
and PUD mapping level sizes, therefore this takes the simple approach
to align the mapping to the power-of-two size of the region, up to 1GiB,
which is currently the maximum alignment we care about.

Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-10-23 14:46:24 +02:00
Alex Williamson
49915c0d2c vfio/helpers: Refactor vfio_region_mmap() error handling
Move error handling code to the end of the function so that it can more
easily be shared by new mmap failure conditions.  No functional change
intended.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-10-23 14:46:24 +02:00
Avihai Horon
fa4e20defe vfio/migration: Change trace formats from hex to decimal
Data sizes in VFIO migration trace events are printed in hex format
while in migration core trace events they are printed in decimal format.

This inconsistency makes it less readable when using both trace event
types. Hence, change the data sizes print format to decimal in VFIO
migration trace events.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-10-23 14:46:24 +02:00
Avihai Horon
3b5948f808 vfio/migration: Report only stop-copy size in vfio_state_pending_exact()
vfio_state_pending_exact() is used to update migration core how much
device data is left for the device migration. Currently, the sum of
pre-copy and stop-copy sizes of the VFIO device are reported.

The pre-copy size is obtained via the VFIO_MIG_GET_PRECOPY_INFO ioctl,
which returns the amount of device data available to be transferred
while the device is in the PRE_COPY states.

The stop-copy size is obtained via the VFIO_DEVICE_FEATURE_MIG_DATA_SIZE
ioctl, which returns the total amount of device data left to be
transferred in order to complete the device migration.

According to the above, current implementation is wrong -- it reports
extra overlapping data because pre-copy size is already contained in
stop-copy size. Fix it by reporting only stop-copy size.

Fixes: eda7362af9 ("vfio/migration: Add VFIO migration pre-copy support")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-10-23 14:46:24 +02:00
Corvin Köhne
8719224166 vfio/igd: correctly calculate stolen memory size for gen 9 and later
We have to update the calculation of the stolen memory size because
we've seen devices using values of 0xf0 and above for the graphics mode
select field. The new calculation was taken from the linux kernel [1].

[1] 7c626ce4ba/arch/x86/kernel/early-quirks.c (L455-L460)

Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17 10:37:55 +02:00
Corvin Köhne
971ca22f04 vfio/igd: don't set stolen memory size to zero
The stolen memory is required for the GOP (EFI) driver and the Windows
driver. While the GOP driver seems to work with any stolen memory size,
the Windows driver will crash if the size doesn't match the size
allocated by the host BIOS. For that reason, it doesn't make sense to
overwrite the stolen memory size. It's true that this wastes some VM
memory. In the worst case, the stolen memory can take up more than a GB.
However, that's uncommon. Additionally, it's likely that a bunch of RAM
is assigned to VMs making use of GPU passthrough.

Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17 10:37:55 +02:00
Corvin Köhne
9c86b9fb43 vfio/igd: add ID's for ElkhartLake and TigerLake
ElkhartLake and TigerLake devices were tested in legacy mode with Linux
and Windows VMs. Both are working properly. It's likely that other Intel
GPUs of gen 11 and 12 like IceLake device are working too. However,
we're only adding known good devices for now.

Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17 10:37:55 +02:00
Corvin Köhne
11b5ce95be vfio/igd: add new bar0 quirk to emulate BDSM mirror
The BDSM register is mirrored into MMIO space at least for gen 11 and
later devices. Unfortunately, the Windows driver reads the register
value from MMIO space instead of PCI config space for those devices [1].
Therefore, we either have to keep a 1:1 mapping for the host and guest
address or we have to emulate the MMIO register too. Using the igd in
legacy mode is already hard due to it's many constraints. Keeping a 1:1
mapping may not work in all cases and makes it even harder to use. An
MMIO emulation has to trap the whole MMIO page. This makes accesses to
this page slower compared to using second level address translation.
Nevertheless, it doesn't have any constraints and I haven't noticed any
performance degradation yet making it a better solution.

[1] 5c351bee0f/devicemodel/hw/pci/passthrough.c (L650-L653)

Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17 10:37:55 +02:00
Corvin Köhne
7bafcd1767 vfio/igd: use new BDSM register location and size for gen 11 and later
Intel changed the location and size of the BDSM register for gen 11
devices and later. We have to adjust our emulation for these devices to
properly support them.

Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17 10:37:55 +02:00
Corvin Köhne
abd9dda914 vfio/igd: support legacy mode for all known generations
We're soon going to add support for legacy mode to ElkhartLake and
TigerLake devices. Those are gen 11 and 12 devices. At the moment, all
devices identified by our igd_gen function do support legacy mode. This
won't change when adding our new devices of gen 11 and 12. Therefore, it
makes more sense to accept legacy mode for all known devices instead of
maintaining a long list of known good generations. If we add a new
generation to igd_gen which doesn't support legacy mode for some reason,
it'll be easy to advance the check to reject legacy mode for this
specific generation.

Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17 10:37:55 +02:00
Corvin Köhne
e433f20897 vfio/igd: return an invalid generation for unknown devices
Intel changes it's specification quite often e.g. the location and size
of the BDSM register has change for gen 11 devices and later. This
causes our emulation to fail on those devices. So, it's impossible for
us to use a suitable default value for unknown devices. Instead of
returning a random generation value and hoping that everthing works
fine, we should verify that different devices are working and add them
to our list of known devices.

Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17 10:37:55 +02:00
Peter Maydell
4bd683d6f1 hw/vfio/pci.c: Use correct type in trace_vfio_msix_early_setup()
The tracepoint trace_vfio_msix_early_setup() uses "int" for the type
of the table_bar argument, but we use this to print a uint32_t.
Coverity warns that this means that we could end up treating it as a
negative number.

We only use this in printing the value in the tracepoint, so
mishandling it as a negative number would be harmless, but it's
better to use the right type in the tracepoint.  Use uint64_t to
match how we print the table_offset in the vfio_msix_relo()
tracepoint.

Resolves: Coverity CID 1547690
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-09-17 10:37:55 +02:00
Peter Maydell
e3d0814368 hw: Use device_class_set_legacy_reset() instead of opencoding
Use device_class_set_legacy_reset() instead of opencoding an
assignment to DeviceClass::reset. This change was produced
with:
 spatch --macro-file scripts/cocci-macro-file.h \
    --sp-file scripts/coccinelle/device-reset.cocci \
    --keep-comments --smpl-spacing --in-place --dir hw

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240830145812.1967042-8-peter.maydell@linaro.org
2024-09-13 15:31:44 +01:00
Markus Armbruster
64f5e9db77 qapi/vfio: Rename VfioMigrationState to Qapi*, and drop prefix
QAPI's 'prefix' feature can make the connection between enumeration
type and its constants less than obvious.  It's best used with
restraint.

VfioMigrationState has a 'prefix' that overrides the generated
enumeration constants' prefix to QAPI_VFIO_MIGRATION_STATE.

We could simply drop 'prefix', but then the enumeration constants
would look as if they came from kernel header linux/vfio.h.

Rename the type to QapiVfioMigrationState instead, so that 'prefix' is
not needed.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240904111836.3273842-20-armbru@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-09-10 14:03:45 +02:00
Markus Armbruster
55872c7061 qapi/common: Drop temporary 'prefix'
Recent commit "qapi: Smarter camel_to_upper() to reduce need for
'prefix'" added a temporary 'prefix' to delay changing the generated
code.

Revert it.  This improves OffAutoPCIBAR's generated enumeration
constant prefix from OFF_AUTOPCIBAR to OFF_AUTO_PCIBAR.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-ID: <20240904111836.3273842-5-armbru@redhat.com>
2024-09-10 13:22:47 +02:00
Richard Henderson
dd4bc5f1cf vfio queue:
* IOMMUFD Dirty Tracking support
 * Fix for a possible SEGV in IOMMU type1 container
 * Dropped initialization of host IOMMU device with mdev devices
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmafyVUACgkQUaNDx8/7
 7KGebRAAzEYxvstDxSPNF+1xx937TKbRpiKYtspTfEgu4Ht50MwO2ZqnVWzTBSwa
 qcjhDf2avMBpBvkp4O9fR7nXR0HRN2KvYrBSThZ3Qpqu4KjxCAGcHI5uYmgfizYh
 BBLrw3eWME5Ry220TinQF5KFl50vGq7Z/mku5N5Tgj2qfTfCXYK1Kc19SyAga49n
 LSokTIjZAGJa4vxrE7THawaEUjFRjfCJey64JUs/TPJaGr4R1snJcWgETww6juUE
 9OSw/xl0AoQhaN/ZTRC1qCsBLUI2MVPsC+x+vqVK62HlTjCx+uDRVQ8KzfDzjCeH
 gaLkMjxJSuJZMpm4UU7DBzDGEGcEBCGeNyFt37BSqqPPpX55CcFhj++d8vqTiwpF
 YzmTNd/znxcZTw6OJN9sQZohh+NeS86CVZ3x31HD3dXifhRf17jbh7NoIyi+0ZCb
 N+mytOH5BXsD+ddwbk+yMaxXV43Fgz7ThG5tB1tjhhNtLZHDA5ezFvGZ5F/FJrqE
 xAbjOhz5MC+RcOVNSzQJCULNqFpfE6Gqeys6btEDm/ltf4LpAe6W1HYuv8BJc19T
 UsqGK2yKAuQX8GErYxJ1zqZCttVrgpsmXFYTC5iGbxC84mvsF0Iti96IdXz9gfzN
 Vlb2OxoefcOwVqIhbkvTZW0ZwYGGDDPAYhLMfr5lSuRqj123OOo=
 =cViP
 -----END PGP SIGNATURE-----

Merge tag 'pull-vfio-20240723-1' of https://github.com/legoater/qemu into staging

vfio queue:

* IOMMUFD Dirty Tracking support
* Fix for a possible SEGV in IOMMU type1 container
* Dropped initialization of host IOMMU device with mdev devices

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmafyVUACgkQUaNDx8/7
# 7KGebRAAzEYxvstDxSPNF+1xx937TKbRpiKYtspTfEgu4Ht50MwO2ZqnVWzTBSwa
# qcjhDf2avMBpBvkp4O9fR7nXR0HRN2KvYrBSThZ3Qpqu4KjxCAGcHI5uYmgfizYh
# BBLrw3eWME5Ry220TinQF5KFl50vGq7Z/mku5N5Tgj2qfTfCXYK1Kc19SyAga49n
# LSokTIjZAGJa4vxrE7THawaEUjFRjfCJey64JUs/TPJaGr4R1snJcWgETww6juUE
# 9OSw/xl0AoQhaN/ZTRC1qCsBLUI2MVPsC+x+vqVK62HlTjCx+uDRVQ8KzfDzjCeH
# gaLkMjxJSuJZMpm4UU7DBzDGEGcEBCGeNyFt37BSqqPPpX55CcFhj++d8vqTiwpF
# YzmTNd/znxcZTw6OJN9sQZohh+NeS86CVZ3x31HD3dXifhRf17jbh7NoIyi+0ZCb
# N+mytOH5BXsD+ddwbk+yMaxXV43Fgz7ThG5tB1tjhhNtLZHDA5ezFvGZ5F/FJrqE
# xAbjOhz5MC+RcOVNSzQJCULNqFpfE6Gqeys6btEDm/ltf4LpAe6W1HYuv8BJc19T
# UsqGK2yKAuQX8GErYxJ1zqZCttVrgpsmXFYTC5iGbxC84mvsF0Iti96IdXz9gfzN
# Vlb2OxoefcOwVqIhbkvTZW0ZwYGGDDPAYhLMfr5lSuRqj123OOo=
# =cViP
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 24 Jul 2024 01:16:37 AM AEST
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-vfio-20240723-1' of https://github.com/legoater/qemu:
  vfio/common: Allow disabling device dirty page tracking
  vfio/migration: Don't block migration device dirty tracking is unsupported
  vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
  vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
  vfio/iommufd: Probe and request hwpt dirty tracking capability
  vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device()
  vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
  vfio/{iommufd,container}: Remove caps::aw_bits
  vfio/iommufd: Introduce auto domain creation
  vfio/ccw: Don't initialize HOST_IOMMU_DEVICE with mdev
  vfio/ap: Don't initialize HOST_IOMMU_DEVICE with mdev
  vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
  backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities
  vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev
  vfio/pci: Extract mdev check into an helper
  hw/vfio/container: Fix SIGSEV on vfio_container_instance_finalize()

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-24 12:58:46 +10:00
Joao Martins
30b9167785 vfio/common: Allow disabling device dirty page tracking
The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole
tracking of VF pre-copy phase of dirty page tracking, though it means
that it will only be used at the start of the switchover phase.

Add an option that disables the VF dirty page tracking, and fall
back into container-based dirty page tracking. This also allows to
use IOMMU dirty tracking even on VFs with their own dirty
tracker scheme.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23 17:14:53 +02:00
Joao Martins
f48b472450 vfio/migration: Don't block migration device dirty tracking is unsupported
By default VFIO migration is set to auto, which will support live
migration if the migration capability is set *and* also dirty page
tracking is supported.

For testing purposes one can force enable without dirty page tracking
via enable-migration=on, but that option is generally left for testing
purposes.

So starting with IOMMU dirty tracking it can use to accommodate the lack of
VF dirty page tracking allowing us to minimize the VF requirements for
migration and thus enabling migration by default for those too.

While at it change the error messages to mention IOMMU dirty tracking as
well.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
[ clg: - spelling in commit log ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-07-23 17:14:53 +02:00
Joao Martins
7c30710bd9 vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
that fetches the bitmap that tells what was dirty in an IOVA
range.

A single bitmap is allocated and used across all the hwpts
sharing an IOAS which is then used in log_sync() to set Qemu
global bitmaps.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23 17:14:52 +02:00
Joao Martins
52ce88229c vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
enables or disables dirty page tracking. The ioctl is used if the hwpt
has been created with dirty tracking supported domain (stored in
hwpt::flags) and it is called on the whole list of iommu domains.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Joao Martins
dddfd8d667 vfio/iommufd: Probe and request hwpt dirty tracking capability
In preparation to using the dirty tracking UAPI, probe whether the IOMMU
supports dirty tracking. This is done via the data stored in
hiod::caps::hw_caps initialized from GET_HW_INFO.

Qemu doesn't know if VF dirty tracking is supported when allocating
hardware pagetable in iommufd_cdev_autodomains_get(). This is because
VFIODevice migration state hasn't been initialized *yet* hence it can't pick
between VF dirty tracking vs IOMMU dirty tracking. So, if IOMMU supports
dirty tracking it always creates HWPTs with IOMMU_HWPT_ALLOC_DIRTY_TRACKING
even if later on VFIOMigration decides to use VF dirty tracking instead.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[ clg: - Fixed vbasedev->iommu_dirty_tracking assignment in
         iommufd_cdev_autodomains_get()
       - Added warning for heterogeneous dirty page tracking support
	 in iommufd_cdev_autodomains_get() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23 17:14:52 +02:00
Joao Martins
83a4d596a9 vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device()
Move the HostIOMMUDevice::realize() to be invoked during the attach of the device
before we allocate IOMMUFD hardware pagetable objects (HWPT). This allows the use
of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if the IOMMU
behind the device supports dirty tracking.

Note: The HostIOMMUDevice data from legacy backend is static and doesn't
need any information from the (type1-iommu) backend to be initialized.
In contrast however, the IOMMUFD HostIOMMUDevice data requires the
iommufd FD to be connected and having a devid to be able to successfully
GET_HW_INFO. This means vfio_device_hiod_realize() is called in
different places within the backend .attach_device() implementation.

Suggested-by: Cédric Le Goater <clg@redhat.cm>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
[ clg: Fixed error handling in iommufd_cdev_attach() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Joao Martins
21e8d3a3aa vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
Store the value of @caps returned by iommufd_backend_get_device_info()
in a new field HostIOMMUDeviceCaps::hw_caps. Right now the only value is
whether device IOMMU supports dirty tracking (IOMMU_HW_CAP_DIRTY_TRACKING).

This is in preparation for HostIOMMUDevice::realize() being called early
during attach_device().

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00