Commit graph

72 commits

Author SHA1 Message Date
Philippe Mathieu-Daudé
2cd09e47aa qom: Make InterfaceInfo[] uses const
Mechanical change using:

  $ sed -i -E 's/\(InterfaceInfo.?\[/\(const InterfaceInfo\[/g' \
              $(git grep -lE '\(InterfaceInfo.?\[\]\)')

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250424194905.82506-7-philmd@linaro.org>
2025-04-25 17:00:41 +02:00
Philippe Mathieu-Daudé
12d1a768bd qom: Have class_init() take a const data argument
Mechanical change using gsed, then style manually adapted
to pass checkpatch.pl script.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250424194905.82506-4-philmd@linaro.org>
2025-04-25 17:00:41 +02:00
Richard Henderson
4705a71db5 include/system: Move exec/ram_addr.h to system/ram_addr.h
Convert the existing includes with sed.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2025-04-23 14:08:24 -07:00
Philippe Mathieu-Daudé
f410d702e4 hw/virtio/virtio-mem: Remove CONFIG_DEVICES include
Rather than checking ACPI availability at compile time by
checking the CONFIG_ACPI definition from CONFIG_DEVICES,
check at runtime via acpi_builtin().

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20250307223949.54040-5-philmd@linaro.org>
2025-03-11 20:03:26 +01:00
Daniil Tatianin
cd2e472e54 system: introduce a new MlockState enum
Replace the boolean value enable_mlock with an enum and add a helper to
decide whether we should be calling os_mlock.

This is a stepping stone towards introducing a new mlock mode, which
will be the third possible state of this enum.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Link: https://lore.kernel.org/r/20250212143920.1269754-4-d-tatianin@yandex-team.ru
Signed-off-by: Peter Xu <peterx@redhat.com>
2025-02-12 11:36:13 -05:00
Stefan Hajnoczi
aa3a285b5b Hi,
"Host Memory Backends" and "Memory devices" queue ("mem"):
 - Fixup handling of virtio-mem unplug during system resets, as
   preparation for s390x support (especially kdump in the Linux guest)
 - virtio-mem support for s390x
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmdnFD4RHGRhdmlkQHJl
 ZGhhdC5jb20ACgkQTd4Q9wD/g1rWBBAAp7WkYaNAjRy1PgpjNZ3z1gUJc/vk+skJ
 xVgGodA8txrJOFpNrbTyfhrdLs2TV4oWDvB/zrZRRtuxvur3O1EhFd9k6EqXuydr
 0FunvLvVJwRHfEZycjN4aacQMRH3CJw07OaTzexeSl5UR/6w5PRofwUK4HX7W/Ka
 arqomGa3OJrs1+WgkV0Qcn4vh9HLRVv3iNC2Xo4W1wOCr1Du9zSPn9oC7zOQ0EO4
 ZC//7QsdkNRjUX/yMXMkhlSXx3b/RmRg2DBrxo7BZXg27VwGu4uHxL4LRBZiB2A7
 V9MqFOcVKzPMkXKTRjrgZ0vXQx1MPJ6WprEihMzMpYU6DrpA7KN/l8Ca8H24B2ln
 h7+bmkDsHVVcWovE9ii/9cMRfws6uWXXg3KoA8RQ8IbX1tU02lblw2uHhXEzcoge
 npqp/Z5LAiKVMetEnNnLH5thjut5PAEjuqD00cmZAMy4DNngLX2bGSdzMeVBkDMa
 78ehLGRplm3t7ibUfaZaMKe6UD9tFrcD6XKsvUTXXHNbYO8ynbx58WOxSZmY98zU
 n3JNQRqtXYjBVlH3Dqm47vOTZHgOzFv3raa8BmSLpcBDeTXCTcUIl20s77dGw/vT
 r5YNCMN7O4YPFKUoRK9604QTgw6qlYaRTQlJD09usprGqVylb6gQtfZZuZkYDMp8
 sEI77QHsePA=
 =HDxr
 -----END PGP SIGNATURE-----

Merge tag 'mem-2024-12-21' of https://github.com/davidhildenbrand/qemu into staging

Hi,

"Host Memory Backends" and "Memory devices" queue ("mem"):
- Fixup handling of virtio-mem unplug during system resets, as
  preparation for s390x support (especially kdump in the Linux guest)
- virtio-mem support for s390x

 # -----BEGIN PGP SIGNATURE-----
 #
 # iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmdnFD4RHGRhdmlkQHJl
 # ZGhhdC5jb20ACgkQTd4Q9wD/g1rWBBAAp7WkYaNAjRy1PgpjNZ3z1gUJc/vk+skJ
 # xVgGodA8txrJOFpNrbTyfhrdLs2TV4oWDvB/zrZRRtuxvur3O1EhFd9k6EqXuydr
 # 0FunvLvVJwRHfEZycjN4aacQMRH3CJw07OaTzexeSl5UR/6w5PRofwUK4HX7W/Ka
 # arqomGa3OJrs1+WgkV0Qcn4vh9HLRVv3iNC2Xo4W1wOCr1Du9zSPn9oC7zOQ0EO4
 # ZC//7QsdkNRjUX/yMXMkhlSXx3b/RmRg2DBrxo7BZXg27VwGu4uHxL4LRBZiB2A7
 # V9MqFOcVKzPMkXKTRjrgZ0vXQx1MPJ6WprEihMzMpYU6DrpA7KN/l8Ca8H24B2ln
 # h7+bmkDsHVVcWovE9ii/9cMRfws6uWXXg3KoA8RQ8IbX1tU02lblw2uHhXEzcoge
 # npqp/Z5LAiKVMetEnNnLH5thjut5PAEjuqD00cmZAMy4DNngLX2bGSdzMeVBkDMa
 # 78ehLGRplm3t7ibUfaZaMKe6UD9tFrcD6XKsvUTXXHNbYO8ynbx58WOxSZmY98zU
 # n3JNQRqtXYjBVlH3Dqm47vOTZHgOzFv3raa8BmSLpcBDeTXCTcUIl20s77dGw/vT
 # r5YNCMN7O4YPFKUoRK9604QTgw6qlYaRTQlJD09usprGqVylb6gQtfZZuZkYDMp8
 # sEI77QHsePA=
 # =HDxr
 # -----END PGP SIGNATURE-----
 # gpg: Signature made Sat 21 Dec 2024 14:17:18 EST
 # gpg:                using RSA key 1BD9CAAD735C4C3A460DFCCA4DDE10F700FF835A
 # gpg:                issuer "david@redhat.com"
 # gpg: Good signature from "David Hildenbrand <david@redhat.com>" [unknown]
 # gpg:                 aka "David Hildenbrand <davidhildenbrand@gmail.com>" [full]
 # gpg:                 aka "David Hildenbrand <hildenbr@in.tum.de>" [unknown]
 # gpg: WARNING: The key's User ID is not certified with a trusted signature!
 # gpg:          There is no indication that the signature belongs to the owner.
 # Primary key fingerprint: 1BD9 CAAD 735C 4C3A 460D  FCCA 4DDE 10F7 00FF 835A

* tag 'mem-2024-12-21' of https://github.com/davidhildenbrand/qemu:
  s390x: virtio-mem support
  s390x/virtio-ccw: add support for virtio based memory devices
  s390x: remember the maximum page size
  s390x/pv: prepare for memory devices
  s390x/s390-virtio-ccw: prepare for memory devices
  s390x/s390-skeys: prepare for memory devices
  s390x/s390-stattrib-kvm: prepare for memory devices and sparse memory layouts
  s390x/s390-hypercall: introduce DIAG500 STORAGE_LIMIT
  s390x: introduce s390_get_memory_limit()
  s390x/s390-virtio-ccw: move setting the maximum guest size from sclp to machine code
  s390x: rename s390-virtio-hcall* to s390-hypercall*
  s390x/s390-virtio-hcall: prepare for more diag500 hypercalls
  s390x/s390-virtio-hcall: remove hypercall registration mechanism
  s390x/s390-virtio-ccw: don't crash on weird RAM sizes
  virtio-mem: unplug memory only during system resets, not device resets

Conflicts:
- hw/s390x/s390-stattrib-kvm.c
  sysemu/ -> system/ header rename conflict.
- hw/s390x/virtio-ccw-mem.c
  Make Property array const and removed DEFINE_PROP_END_OF_LIST() to
  conform to the latest conventions.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-12-22 14:33:27 -05:00
David Hildenbrand
aa910c20ec s390x: virtio-mem support
Let's add our virtio-mem-ccw proxy device and wire it up. We should
be supporting everything (e.g., device unplug, "dynamic-memslots") that
we already support for the virtio-pci variant.

With a Linux guest that supports virtio-mem (and has automatic memory
onlining properly configured) the following example will work:

1. Start a VM with 4G initial memory and a virtio-mem device with a maximum
   capacity of 16GB:

   qemu/build/qemu-system-s390x \
    --enable-kvm \
    -m 4G,maxmem=20G \
    -nographic \
    -smp 8 \
    -hda Fedora-Server-KVM-40-1.14.s390x.qcow2 \
    -chardev socket,id=monitor,path=/var/tmp/monitor,server,nowait \
    -mon chardev=monitor,mode=readline \
    -object memory-backend-ram,id=mem0,size=16G,reserve=off \
    -device virtio-mem-ccw,id=vmem0,memdev=mem0,dynamic-memslots=on

2. Query the current size of virtio-mem device:

    (qemu) info memory-devices
    Memory device [virtio-mem]: "vmem0"
      memaddr: 0x100000000
      node: 0
      requested-size: 0
      size: 0
      max-size: 17179869184
      block-size: 1048576
      memdev: /objects/mem0

3. Request to grow it to 8GB (hotplug 8GB):

    (qemu) qom-set vmem0 requested-size 8G
    (qemu) info memory-devices
    Memory device [virtio-mem]: "vmem0"
      memaddr: 0x100000000
      node: 0
      requested-size: 8589934592
      size: 8589934592
      max-size: 17179869184
      block-size: 1048576
      memdev: /objects/mem0

4. Request to grow to 16GB (hotplug another 8GB):

    (qemu) qom-set vmem0 requested-size 16G
    (qemu) info memory-devices
    Memory device [virtio-mem]: "vmem0"
      memaddr: 0x100000000
      node: 0
      requested-size: 17179869184
      size: 17179869184
      max-size: 17179869184
      block-size: 1048576
      memdev: /objects/mem0

5. Try to hotunplug all memory again, shrinking to 0GB:

    (qemu) qom-set vmem0 requested-size 0G
    (qemu) info memory-devices
    Memory device [virtio-mem]: "vmem0"
      memaddr: 0x100000000
      node: 0
      requested-size: 0
      size: 0
      max-size: 17179869184
      block-size: 1048576
      memdev: /objects/mem0

6. If it worked, unplug the device

    (qemu) device_del vmem0
    (qemu) info memory-devices
    (qemu) object_del mem0

7. Hotplug a new device with a smaller capacity and directly size it to 1GB

    (qemu) object_add memory-backend-ram,id=mem0,size=8G,reserve=off
    (qemu) device_add virtio-mem-ccw,id=vmem0,memdev=mem0,\
                      dynamic-memslots=on,requested-size=1G
    (qemu) info memory-devices
    Memory device [virtio-mem]: "vmem0"
      memaddr: 0x100000000
      node: 0
      requested-size: 1073741824
      size: 1073741824
      max-size: 8589934592
      block-size: 1048576
      memdev: /objects/mem0

Trying to use a virtio-mem device backed by hugetlb into a !hugetlb VM
correctly results in the error:
   ... Memory device uses a bigger page size than initial memory

Note that the virtio-mem driver in Linux will supports 1 MiB (pageblock)
granularity.

Message-ID: <20241219144115.2820241-15-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2024-12-21 20:15:06 +01:00
Stefan Hajnoczi
65cb7129f4 Accel & Exec patch queue
- Ignore writes to CNTP_CTL_EL0 on HVF ARM (Alexander)
 - Add '-d invalid_mem' logging option (Zoltan)
 - Create QOM containers explicitly (Peter)
 - Rename sysemu/ -> system/ (Philippe)
 - Re-orderning of include/exec/ headers (Philippe)
   Move a lot of declarations from these legacy mixed bag headers:
     . "exec/cpu-all.h"
     . "exec/cpu-common.h"
     . "exec/cpu-defs.h"
     . "exec/exec-all.h"
     . "exec/translate-all"
   to these more specific ones:
     . "exec/page-protection.h"
     . "exec/translation-block.h"
     . "user/cpu_loop.h"
     . "user/guest-host.h"
     . "user/page-protection.h"
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmdlnyAACgkQ4+MsLN6t
 wN6mBw//QFWi7CrU+bb8KMM53kOU9C507tjn99LLGFb5or73/umDsw6eo/b8DHBt
 KIwGLgATel42oojKfNKavtAzLK5rOrywpboPDpa3SNeF1onW+99NGJ52LQUqIX6K
 A6bS0fPdGG9ZzEuPpbjDXlp++0yhDcdSgZsS42fEsT7Dyj5gzJYlqpqhiXGqpsn8
 4Y0UMxSL21K3HEexlzw2hsoOBFA3tUm2ujNDhNkt8QASr85yQVLCypABJnuoe///
 5Ojl5wTBeDwhANET0rhwHK8eIYaNboiM9fHopJYhvyw1bz6yAu9jQwzF/MrL3s/r
 xa4OBHBy5mq2hQV9Shcl3UfCQdk/vDaYaWpgzJGX8stgMGYfnfej1SIl8haJIfcl
 VMX8/jEFdYbjhO4AeGRYcBzWjEJymkDJZoiSWp2NuEDi6jqIW+7yW1q0Rnlg9lay
 ShAqLK5Pv4zUw3t0Jy3qv9KSW8sbs6PQxtzXjk8p97rTf76BJ2pF8sv1tVzmsidP
 9L92Hv5O34IqzBu2oATOUZYJk89YGmTIUSLkpT7asJZpBLwNM2qLp5jO00WVU0Sd
 +kAn324guYPkko/TVnjC/AY7CMu55EOtD9NU35k3mUAnxXT9oDUeL4NlYtfgrJx6
 x1Nzr2FkS68+wlPAFKNSSU5lTjsjNaFM0bIJ4LCNtenJVP+SnRo=
 =cjz8
 -----END PGP SIGNATURE-----

Merge tag 'exec-20241220' of https://github.com/philmd/qemu into staging

Accel & Exec patch queue

- Ignore writes to CNTP_CTL_EL0 on HVF ARM (Alexander)
- Add '-d invalid_mem' logging option (Zoltan)
- Create QOM containers explicitly (Peter)
- Rename sysemu/ -> system/ (Philippe)
- Re-orderning of include/exec/ headers (Philippe)
  Move a lot of declarations from these legacy mixed bag headers:
    . "exec/cpu-all.h"
    . "exec/cpu-common.h"
    . "exec/cpu-defs.h"
    . "exec/exec-all.h"
    . "exec/translate-all"
  to these more specific ones:
    . "exec/page-protection.h"
    . "exec/translation-block.h"
    . "user/cpu_loop.h"
    . "user/guest-host.h"
    . "user/page-protection.h"

 # -----BEGIN PGP SIGNATURE-----
 #
 # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmdlnyAACgkQ4+MsLN6t
 # wN6mBw//QFWi7CrU+bb8KMM53kOU9C507tjn99LLGFb5or73/umDsw6eo/b8DHBt
 # KIwGLgATel42oojKfNKavtAzLK5rOrywpboPDpa3SNeF1onW+99NGJ52LQUqIX6K
 # A6bS0fPdGG9ZzEuPpbjDXlp++0yhDcdSgZsS42fEsT7Dyj5gzJYlqpqhiXGqpsn8
 # 4Y0UMxSL21K3HEexlzw2hsoOBFA3tUm2ujNDhNkt8QASr85yQVLCypABJnuoe///
 # 5Ojl5wTBeDwhANET0rhwHK8eIYaNboiM9fHopJYhvyw1bz6yAu9jQwzF/MrL3s/r
 # xa4OBHBy5mq2hQV9Shcl3UfCQdk/vDaYaWpgzJGX8stgMGYfnfej1SIl8haJIfcl
 # VMX8/jEFdYbjhO4AeGRYcBzWjEJymkDJZoiSWp2NuEDi6jqIW+7yW1q0Rnlg9lay
 # ShAqLK5Pv4zUw3t0Jy3qv9KSW8sbs6PQxtzXjk8p97rTf76BJ2pF8sv1tVzmsidP
 # 9L92Hv5O34IqzBu2oATOUZYJk89YGmTIUSLkpT7asJZpBLwNM2qLp5jO00WVU0Sd
 # +kAn324guYPkko/TVnjC/AY7CMu55EOtD9NU35k3mUAnxXT9oDUeL4NlYtfgrJx6
 # x1Nzr2FkS68+wlPAFKNSSU5lTjsjNaFM0bIJ4LCNtenJVP+SnRo=
 # =cjz8
 # -----END PGP SIGNATURE-----
 # gpg: Signature made Fri 20 Dec 2024 11:45:20 EST
 # gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
 # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [unknown]
 # gpg: WARNING: This key is not certified with a trusted signature!
 # gpg:          There is no indication that the signature belongs to the owner.
 # Primary key fingerprint: FAAB E75E 1291 7221 DCFD  6BB2 E3E3 2C2C DEAD C0DE

* tag 'exec-20241220' of https://github.com/philmd/qemu: (59 commits)
  util/qemu-timer: fix indentation
  meson: Do not define CONFIG_DEVICES on user emulation
  system/accel-ops: Remove unnecessary 'exec/cpu-common.h' header
  system/numa: Remove unnecessary 'exec/cpu-common.h' header
  hw/xen: Remove unnecessary 'exec/cpu-common.h' header
  target/mips: Drop left-over comment about Jazz machine
  target/mips: Remove tswap() calls in semihosting uhi_fstat_cb()
  target/xtensa: Remove tswap() calls in semihosting simcall() helper
  accel/tcg: Un-inline translator_is_same_page()
  accel/tcg: Include missing 'exec/translation-block.h' header
  accel/tcg: Move tcg_cflags_has/set() to 'exec/translation-block.h'
  accel/tcg: Restrict curr_cflags() declaration to 'internal-common.h'
  qemu/coroutine: Include missing 'qemu/atomic.h' header
  exec/translation-block: Include missing 'qemu/atomic.h' header
  accel/tcg: Declare cpu_loop_exit_requested() in 'exec/cpu-common.h'
  exec/cpu-all: Include 'cpu.h' earlier so MMU_USER_IDX is always defined
  target/sparc: Move sparc_restore_state_to_opc() to cpu.c
  target/sparc: Uninline cpu_get_tb_cpu_state()
  target/loongarch: Declare loongarch_cpu_dump_state() locally
  user: Move various declarations out of 'exec/exec-all.h'
  ...

Conflicts:
	hw/char/riscv_htif.c
	hw/intc/riscv_aplic.c
	target/s390x/cpu.c

	Apply sysemu header path changes to not in the pull request.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-12-21 11:07:00 -05:00
David Hildenbrand
713484d038 virtio-mem: unplug memory only during system resets, not device resets
We recently converted from the LegacyReset to the new reset framework
in commit c009a311e9 ("virtio-mem: Use new Resettable framework instead
of LegacyReset") to be able to use the ResetType to filter out wakeup
resets.

However, this change had an undesired implications: as we override the
Resettable interface methods in VirtIOMEMClass, the reset handler will
not only get called during system resets (i.e., qemu_devices_reset())
but also during any direct or indirect device rests (e.g.,
device_cold_reset()).

Further, we might now receive two reset callbacks during
qemu_devices_reset(), first when reset by a parent and later when reset
directly.

The memory state of virtio-mem devices is rather special: it's supposed to
be persistent/unchanged during most resets (similar to resetting a hard
disk will not destroy the data), unless actually cold-resetting the whole
system (different to a hard disk where a reboot will not destroy the data):
ripping out system RAM is something guest OSes don't particularly enjoy,
but we want to detect when rebooting to an OS that does not support
virtio-mem and wouldn't be able to detect+use the memory -- and we want
to force-defragment hotplugged memory to also shrink the usable device
memory region. So we rally want to catch system resets to do that.

On supported targets (e.g., x86), getting a cold reset on the
device/parent triggers is not that easy (but looks like PCI code
might trigger it), so this implication went unnoticed.

However, with upcoming s390x support it is problematic: during
kdump, s390x triggers a subsystem reset, ending up in
s390_machine_reset() and calling only subsystem_reset() instead of
qemu_devices_reset() -- because it's not a full system reset.

In subsystem_reset(), s390x performs a device_cold_reset() of any
TYPE_VIRTUAL_CSS_BRIDGE device, which ends up resetting all children,
including the virtio-mem device. Consequently, we wrongly detect a system
reset and unplug all device memory, resulting in hotplugged memory not
getting included in the crash dump -- undesired.

We really must not mess with hotplugged memory state during simple
device resets. To fix, create+register a new reset object that will only
get triggered during qemu_devices_reset() calls, but not during any other
resets as it is logically not the child of any other object.

Message-ID: <20241025104103.342188-1-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Juraj Marcin <jmarcin@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
2024-12-21 15:59:59 +01:00
Philippe Mathieu-Daudé
32cad1ffb8 include: Rename sysemu/ -> system/
Headers in include/sysemu/ are not only related to system
*emulation*, they are also used by virtualization. Rename
as system/ which is clearer.

Files renamed manually then mechanical change using sed tool.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20241203172445.28576-1-philmd@linaro.org>
2024-12-20 17:44:56 +01:00
Richard Henderson
5fcabe628b include/hw/qdev-properties: Remove DEFINE_PROP_END_OF_LIST
Now that all of the Property arrays are counted, we can remove
the terminator object from each array.  Update the assertions
in device_class_set_props to match.

With struct Property being 88 bytes, this was a rather large
form of terminator.  Saves 30k from qemu-system-aarch64.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218134251.4724-21-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-19 19:36:37 +01:00
Richard Henderson
1577a9180f hw/virtio: Constify all Property
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-12-15 12:56:40 -06:00
Peter Xu
34a8892dec migration: Drop migration_is_idle()
Now with the current migration_is_running(), it will report exactly the
opposite of what will be reported by migration_is_idle().

Drop migration_is_idle(), instead use "!migration_is_running()" which
should be identical on functionality.

In reality, most of the idle check is inverted, so it's even easier to
write with "migrate_is_running()" check.

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241024213056.1395400-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31 15:48:18 -04:00
Juraj Marcin
1f5f49056d virtio-mem: Add support for suspend+wake-up with plugged memory
Before, the virtio-mem device would unplug all the memory with any reset
of the device, including during the wake-up of the guest from a
suspended state. Due to this, the virtio-mem driver in the Linux kernel
disallowed suspend-to-ram requests in the guest when the
VIRTIO_MEM_F_PERSISTENT_SUSPEND feature is not exposed by QEMU.

This patch adds the code to skip the reset on wake-up and exposes
theVIRTIO_MEM_F_PERSISTENT_SUSPEND feature to the guest kernel driver
when suspending is possible in QEMU (currently only x86).

Message-ID: <20240904103722.946194-5-jmarcin@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2024-09-24 11:33:35 +02:00
Juraj Marcin
c009a311e9 virtio-mem: Use new Resettable framework instead of LegacyReset
LegacyReset does not pass ResetType to the reset callback method, which
the new Resettable framework uses. Due to this, virtio-mem cannot use
the new RESET_TYPE_WAKEUP to skip the reset during wake-up from a
suspended state.

This patch adds overrides Resettable interface methods in VirtIOMEMClass
to use the new Resettable framework and replaces
qemu_[un]register_reset() calls with qemu_[un]register_resettable().

Message-ID: <20240904103722.946194-4-jmarcin@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2024-09-24 11:33:35 +02:00
David Hildenbrand
95b717a815 virtio-mem: don't warn about THP sizes on a kernel without THP support
If the config directory in sysfs does not exist at all, we are dealing
with a system that does not support THPs. Simply use 1 MiB block size
then, instead of warning "Could not detect THP size, falling back to
..." and falling back to the default THP size.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240910163433.2100295-1-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-09-11 09:46:14 -04:00
David Hildenbrand
4d13ae45ff virtio-mem: improve error message when unplug of device fails due to plugged memory
The error message is actually expressive, considering QEMU only. But
when called from Libvirt, talking about "size" can be confusing, because
in Libvirt "size" translates to the memory backend size in QEMU (maximum
size) and "current" translates to the QEMU "size" property.

Let's simply avoid talking about the "size" property and spell out that
some device memory is still plugged.

Message-ID: <20240416141426.588544-1-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Cc: Liang Cong <lcong@redhat.com>
Cc: Mario Casquero <mcasquer@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2024-07-10 18:06:24 +02:00
Mark Kanda
04accf43df oslib-posix: initialize backend memory objects in parallel
QEMU initializes preallocated backend memory as the objects are parsed from
the command line. This is not optimal in some cases (e.g. memory spanning
multiple NUMA nodes) because the memory objects are initialized in series.

Allow the initialization to occur in parallel (asynchronously). In order to
ensure optimal thread placement, asynchronous initialization requires prealloc
context threads to be in use.

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Message-ID: <20240131165327.3154970-2-mark.kanda@oracle.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2024-02-06 08:15:22 +01:00
Philippe Mathieu-Daudé
9c878ad6fb misc: Simplify qemu_prealloc_mem() calls
Since qemu_prealloc_mem() returns whether or not an error
occured, we don't need to check the @errp pointer. Remove
local_err uses when we can return directly.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-20-philmd@linaro.org>
2024-01-05 16:20:15 +01:00
Richard Henderson
ca02a17054 hw/virtio: Constify VMState
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231221031652.119827-61-richard.henderson@linaro.org>
2023-12-30 07:38:06 +11:00
David Hildenbrand
364eff6885 virtio-mem: fix division by zero in virtio_mem_activate_memslots_to_plug()
When running with "dynamic-memslots=off", we enter
virtio_mem_activate_memslots_to_plug() to return immediately again
because "vmem->dynamic_memslots == false". However, the compiler might
not optimize out calculating start_idx+end_idx, where we divide by
vmem->memslot_size. In such a configuration, the memslot size is 0 and
we'll get a division by zero:

    (qemu) qom-set vmem0 requested-size 3G
    (qemu) q35.sh: line 38: 622940 Floating point exception(core dumped)

The same is true for virtio_mem_deactivate_unplugged_memslots(), however
we never really reach that code without a prior
virtio_mem_activate_memslots_to_plug() call.

Let's fix it by simply calling these functions only with
"dynamic-memslots=on".

This was found when using a debug build of QEMU.

Message-ID: <20231023111341.219317-1-david@redhat.com>
Reprted-by: Mario Casquero <mcasquer@redhat.com>
Fixes: 177f9b1ee4 ("virtio-mem: Expose device memory dynamically via multiple memslots if enabled")
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-11-13 09:35:44 +01:00
Juan Quintela
99b16e8ee4 migration: Use vmstate_register_any()
This are the easiest cases, where we were already using
VMSTATE_INSTANCE_ID_ANY.

Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-3-quintela@redhat.com>
2023-11-01 16:13:58 +01:00
David Hildenbrand
ee6398d862 virtio-mem: Mark memslot alias memory regions unmergeable
Let's mark the memslot alias memory regions as unmergable, such that
flatview and vhost won't merge adjacent memory region aliases and we can
atomically map/unmap individual aliases without affecting adjacent
alias memory regions.

This handles vhost and vfio in multiple-memslot mode correctly (which do
not support atomic memslot updates) and avoids the temporary removal of
large memslots, which can be an expensive operation. For example, vfio
might have to unpin + repin a lot of memory, which is undesired.

Message-ID: <20230926185738.277351-19-david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12 14:15:22 +02:00
David Hildenbrand
177f9b1ee4 virtio-mem: Expose device memory dynamically via multiple memslots if enabled
Having large virtio-mem devices that only expose little memory to a VM
is currently a problem: we map the whole sparse memory region into the
guest using a single memslot, resulting in one gigantic memslot in KVM.
KVM allocates metadata for the whole memslot, which can result in quite
some memory waste.

Assuming we have a 1 TiB virtio-mem device and only expose little (e.g.,
1 GiB) memory, we would create a single 1 TiB memslot and KVM has to
allocate metadata for that 1 TiB memslot: on x86, this implies allocating
a significant amount of memory for metadata:

(1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB
    -> For 1 TiB: 2147483648 + 4194304 + 8192 = ~ 2 GiB (0.2 %)

    With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this gets
    allocated lazily when required for nested VMs
(2) gfn_track: 2 bytes per 4 KiB
    -> For 1 TiB: 536870912 = ~512 MiB (0.05 %)
(3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB
    -> For 1 TiB: 2097152 + 4096 = ~2 MiB (0.0002 %)
(4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page
    -> For 1 TiB: 536870912 = 64 MiB (0.006 %)

So we primarily care about (1) and (2). The bad thing is, that the
memory consumption *doubles* once SMM is enabled, because we create the
memslot once for !SMM and once for SMM.

Having a 1 TiB memslot without the TDP MMU consumes around:
* With SMM: 5 GiB
* Without SMM: 2.5 GiB
Having a 1 TiB memslot with the TDP MMU consumes around:
* With SMM: 1 GiB
* Without SMM: 512 MiB

... and that's really something we want to optimize, to be able to just
start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem device
that can grow very large (e.g., 1 TiB).

Consequently, using multiple memslots and only mapping the memslots we
really need can significantly reduce memory waste and speed up
memslot-related operations. Let's expose the sparse RAM memory region using
multiple memslots, mapping only the memslots we currently need into our
device memory region container.

The feature can be enabled using "dynamic-memslots=on" and requires
"unplugged-inaccessible=on", which is nowadays the default.

Once enabled, we'll auto-detect the number of memslots to use based on the
memslot limit provided by the core. We'll use at most 1 memslot per
gigabyte. Note that our global limit of memslots accross all memory devices
is currently set to 256: even with multiple large virtio-mem devices,
we'd still have a sane limit on the number of memslots used.

The default is to not dynamically map memslot for now
("dynamic-memslots=off"). The optimization must be enabled manually,
because some vhost setups (e.g., hotplug of vhost-user devices) might be
problematic until we support more memslots especially in vhost-user backends.

Note that "dynamic-memslots=on" is just a hint that multiple memslots
*may* be used for internal optimizations, not that multiple memslots
*must* be used. The actual number of memslots that are used is an
internal detail: for example, once memslot metadata is no longer an
issue, we could simply stop optimizing for that. Migration source and
destination can differ on the setting of "dynamic-memslots".

Message-ID: <20230926185738.277351-17-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12 14:15:22 +02:00
David Hildenbrand
884a0c20e6 virtio-mem: Update state to match bitmap as soon as it's been migrated
It's cleaner and future-proof to just have other state that depends on the
bitmap state to be updated as soon as possible when restoring the bitmap.

So factor out informing RamDiscardListener into a functon and call it in
case of early migration right after we restored the bitmap.

Message-ID: <20230926185738.277351-16-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12 14:15:22 +02:00
David Hildenbrand
a45171dba7 virtio-mem: Pass non-const VirtIOMEM via virtio_mem_range_cb
Let's prepare for a user that has to modify the VirtIOMEM device state.

Message-ID: <20230926185738.277351-15-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12 14:15:22 +02:00
Michael Tokarev
9b4b4e510b hw/other: spelling fixes
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2023-09-21 11:31:16 +03:00
David Hildenbrand
92a8ee1b59 virtio-mem: Prepare for device unplug support
In many cases, blindly unplugging a virtio-mem device is problematic. We
can only safely remove a device once:
* The guest is not expecting to be able to read unplugged memory
  (unplugged-inaccessible == on)
* The virtio-mem device does not have memory plugged (size == 0)
* The virtio-mem device does not have outstanding requests to the VM to
  plug memory (requested-size == 0)

So let's add a callback to the virtio-mem device class to check for that.
We'll wire-up virtio-mem-pci next.

Message-ID: <20230711153445.514112-7-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:27:31 +02:00
David Hildenbrand
b01fd4b67a virtio-mem: Support "x-ignore-shared" migration
To achieve desired "x-ignore-shared" functionality, we should not
discard all RAM when realizing the device and not mess with
preallocation/postcopy when loading device state. In essence, we should
not touch RAM content.

As "x-ignore-shared" gets set after realizing the device, we cannot
rely on that. Let's simply skip discarding of RAM on incoming migration.
Note that virtio_mem_post_load() will call
virtio_mem_restore_unplugged() -- unless "x-ignore-shared" is set. So
once migration finished we'll have a consistent state.

The initial system reset will also not discard any RAM, because
virtio_mem_unplug_all() will not call virtio_mem_unplug_all() when no
memory is plugged (which is the case before loading the device state).

Note that something like VM templating -- see commit b17fbbe55c
("migration: allow private destination ram with x-ignore-shared") -- is
currently incompatible with virtio-mem and ram_block_discard_range() will
warn in case a private file mapping is supplied by virtio-mem.

For VM templating with virtio-mem, it makes more sense to either
(a) Create the template without the virtio-mem device and hotplug a
    virtio-mem device to the new VM instances using proper own memory
    backend.
(b) Use a virtio-mem device that doesn't provide any memory in the
    template (requested-size=0) and use private anonymous memory.

Message-ID: <20230706075612.67404-5-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:25:37 +02:00
David Hildenbrand
836f657b6a virtio-mem: Skip most of virtio_mem_unplug_all() without plugged memory
Already when starting QEMU we perform one system reset that ends up
triggering virtio_mem_unplug_all() with no actual memory plugged yet.
That, in turn will trigger ram_block_discard_range() and perform some
other actions that are not required in that case.

Let's optimize virtio_mem_unplug_all() for the case that no memory is
plugged. This will be beneficial for x-ignore-shared support as well.

Message-ID: <20230706075612.67404-3-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:25:37 +02:00
David Hildenbrand
25c893037b virtio-mem: Simplify bitmap handling and virtio_mem_set_block_state()
Let's separate plug and unplug handling to prepare for future changes
and make the code a bit easier to read -- working on block states
(plugged/unplugged) instead of on a bitmap.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230523183036.517957-1-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-23 02:54:44 -04:00
Philippe Mathieu-Daudé
4ee4667ded hw/virtio: Remove unnecessary 'virtio-access.h' header
None of these files use the VirtIO Load/Store API declared
by "hw/virtio/virtio-access.h". This header probably crept
in via copy/pasting, remove it.

Note, "virtio-access.h" is target-specific, so any file
including it also become tainted as target-specific.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230524093744.88442-10-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2023-06-23 02:54:44 -04:00
Philippe Mathieu-Daudé
21e6435066 hw/virtio/virtio-mem: Use qemu_ram_get_fd() helper
Avoid accessing RAMBlock internals, use the provided
qemu_ram_get_fd() getter to get the file descriptor.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230524093744.88442-7-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2023-06-23 02:54:44 -04:00
David Hildenbrand
d5cef02574 virtio-mem: Default to "unplugged-inaccessible=on" with 8.1 on x86-64
Allowing guests to read unplugged memory simplified the bring-up of
virtio-mem in Linux guests -- which was limited to x86-64 only. On arm64
(which was added later), we never had legacy guests and don't even allow
to configure it, essentially always having "unplugged-inaccessible=on".

At this point, all guests we care about
should be supporting VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, so let's
change the default for the 8.1 machine.

This change implies that also memory that supports the shared zeropage
(private anonymous memory) will now require
VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE in the driver in order to be usable by
the guest -- as default, one can still manually set the
unplugged-inaccessible property.

Disallowing the guest to read unplugged memory will be important for
some future features, such as memslot optimizations or protection of
unplugged memory, whereby we'll actually no longer allow the guest to
even read from unplugged memory.

At some point, we might want to deprecate and remove that property.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230503182352.792458-1-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-05-19 10:30:46 -04:00
David Hildenbrand
d71920d425 virtio-mem: Proper support for preallocation with migration
Ordinary memory preallocation runs when QEMU starts up and creates the
memory backends, before processing the incoming migration stream. With
virtio-mem, we don't know which memory blocks to preallocate before
migration started. Now that we migrate the virtio-mem bitmap early, before
migrating any RAM content, we can safely preallocate memory for all plugged
memory blocks before migrating any RAM content.

This is especially relevant for the following cases:

(1) User errors

With hugetlb/files, if we don't have sufficient backend memory available on
the migration destination, we'll crash QEMU (SIGBUS) during RAM migration
when running out of backend memory. Preallocating memory before actual
RAM migration allows for failing gracefully and informing the user about
the setup problem.

(2) Excluded memory ranges during migration

For example, virtio-balloon free page hinting will exclude some pages
from getting migrated. In that case, we won't crash during RAM
migration, but later, when running the VM on the destination, which is
bad.

To fix this for new QEMU machines that migrate the bitmap early,
preallocate the memory early, before any RAM migration. Warn with old
QEMU machines.

Getting postcopy right is a bit tricky, but we essentially now implement
the same (problematic) preallocation logic as ordinary preallocation:
preallocate memory early and discard it again before precopy starts. During
ordinary preallocation, discarding of RAM happens when postcopy is advised.
As the state (bitmap) is loaded after postcopy was advised but before
postcopy starts listening, we have to discard memory we preallocated
immediately again ourselves.

Note that nothing (not even hugetlb reservations) guarantees for postcopy
that backend memory (especially, hugetlb pages) are still free after they
were freed ones while discarding RAM. Still, allocating that memory at
least once helps catching some basic setup problems.

Before this change, trying to restore a VM when insufficient hugetlb
pages are around results in the process crashing to to a "Bus error"
(SIGBUS). With this change, QEMU fails gracefully:

  qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address
  qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-mem-device-early'
  qemu-system-x86_64: load of migration failed: Cannot allocate memory

And we can even introspect the early migration data, including the
bitmap:
  $ ./scripts/analyze-migration.py -f STATEFILE
  {
  "ram (2)": {
      "section sizes": {
          "0000:00:03.0/mem0": "0x0000000780000000",
          "0000:00:04.0/mem1": "0x0000000780000000",
          "pc.ram": "0x0000000100000000",
          "/rom@etc/acpi/tables": "0x0000000000020000",
          "pc.bios": "0x0000000000040000",
          "0000:00:02.0/e1000.rom": "0x0000000000040000",
          "pc.rom": "0x0000000000020000",
          "/rom@etc/table-loader": "0x0000000000001000",
          "/rom@etc/acpi/rsdp": "0x0000000000001000"
      }
  },
  "0000:00:03.0/virtio-mem-device-early (51)": {
      "tmp": "00 00 00 01 40 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
      "size": "0x0000000040000000",
      "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
  },
  "0000:00:04.0/virtio-mem-device-early (53)": {
      "tmp": "00 00 00 08 c0 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
      "size": "0x00000001fa400000",
      "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
  },
  [...]

Reported-by: Jing Qi <jinqi@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>S
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06 19:22:56 +01:00
David Hildenbrand
3b95a71b22 virtio-mem: Migrate immutable properties early
The bitmap and the size are immutable while migration is active: see
virtio_mem_is_busy(). We can migrate this information early, before
migrating any actual RAM content. Further, all information we need for
sanity checks is immutable as well.

Having this information in place early will, for example, allow for
properly preallocating memory before touching these memory locations
during RAM migration: this way, we can make sure that all memory was
actually preallocated and that any user errors (e.g., insufficient
hugetlb pages) can be handled gracefully.

In contrast, usable_region_size and requested_size can theoretically
still be modified on the source while the VM is running. Keep migrating
these properties the usual, late, way.

Use a new device property to keep behavior of compat machines
unmodified.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>S
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06 19:22:56 +01:00
David Hildenbrand
ce1761f0f9 virtio-mem: Fail if a memory backend with "prealloc=on" is specified
"prealloc=on" for the memory backend does not work as expected, as
virtio-mem will simply discard all preallocated memory immediately again.
In the best case, it's an expensive NOP. In the worst case, it's an
unexpected allocation error.

Instead, "prealloc=on" should be specified for the virtio-mem device only,
such that virtio-mem will try preallocating memory before plugging
memory dynamically to the guest. Fail if such a memory backend is
provided.

Tested-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>S
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06 19:22:56 +01:00
Philippe Mathieu-Daudé
82ba778e13 virtio-mem: Fix typo in function name
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221228130956.80515-1-philmd@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
2022-12-28 14:59:55 +01:00
Chenyi Qiang
29f1b328e3 virtio-mem: Fix the iterator variable in a vmem->rdl_list loop
It should be the variable rdl2 to revert the already-notified listeners.

Fixes: 2044969f0b ("virtio-mem: Implement RamDiscardManager interface")
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20221228090312.17276-1-chenyi.qiang@intel.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
2022-12-28 14:59:55 +01:00
Chenyi Qiang
b11cf32e07 virtio-mem: Fix the bitmap index of the section offset
vmem->bitmap indexes the memory region of the virtio-mem backend at a
granularity of block_size. To calculate the index of target section offset,
the block_size should be divided instead of the bitmap_size.

Fixes: 2044969f0b ("virtio-mem: Implement RamDiscardManager interface")
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20221216062231.11181-1-chenyi.qiang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: David Hildenbrand <david@redhat.com>
2022-12-28 14:59:47 +01:00
Markus Armbruster
d1c81c3496 qapi: Use returned bool to check for failure (again)
Commit 012d4c96e2 changed the visitor functions taking Error ** to
return bool instead of void, and the commits following it used the new
return value to simplify error checking.  Since then a few more uses
in need of the same treatment crept in.  Do that.  All pretty
mechanical except for

* balloon_stats_get_all()

  This is basically the same transformation commit 012d4c96e2 applied
  to the virtual walk example in include/qapi/visitor.h.

* set_max_queue_size()

  Additionally replace "goto end of function" by return.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221121085054.683122-10-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2022-12-14 16:19:35 +01:00
David Hildenbrand
e04a34e55c util: Make qemu_prealloc_mem() optionally consume a ThreadContext
... and implement it under POSIX. When a ThreadContext is provided,
create new threads via the context such that these new threads obtain a
properly configured CPU affinity.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-6-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27 11:00:56 +02:00
David Hildenbrand
6556aadc18 util: Cleanup and rename os_mem_prealloc()
Let's
* give the function a "qemu_*" style name
* make sure the parameters in the implementation match the prototype
* rename smp_cpus to max_threads, which makes the semantics of that
  parameter clearer

... and add a function documentation.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-2-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27 11:00:28 +02:00
Jonah Palmer
3857cd5c7f virtio: drop name parameter for virtio_init()
This patch drops the name parameter for the virtio_init function.

The pair between the numeric device ID and the string device ID
(name) of a virtio device already exists, but not in a way that
lets us map between them.

This patch lets us do this and removes the need for the name
parameter in the virtio_init function.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1648819405-25696-2-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-16 04:38:40 -04:00
Marc-André Lureau
0f9668e0c1 Remove qemu-common.h include from most units
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 14:31:55 +02:00
Marc-André Lureau
8e3b0cbb72 Replace qemu_real_host_page variables with inlined functions
Replace the global variables with inlined helper functions. getpagesize() is very
likely annotated with a "const" function attribute (at least with glibc), and thus
optimization should apply even better.

This avoids the need for a constructor initialization too.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 10:50:38 +02:00
Gavin Shan
b1b87327a9 hw/arm/virt: Support for virtio-mem-pci
This supports virtio-mem-pci device on "virt" platform, by simply
following the implementation on x86.

   * This implements the hotplug handlers to support virtio-mem-pci
     device hot-add, while the hot-remove isn't supported as we have
     on x86.

   * The block size is 512MB on ARM64 instead of 128MB on x86.

   * It has been passing the tests with various combinations like 64KB
     and 4KB page sizes on host and guest, different memory device
     backends like normal, transparent huge page and HugeTLB, plus
     migration.

Co-developed-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-id: 20220111063329.74447-3-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-01-20 11:47:52 +00:00
Gavin Shan
1263615efe virtio-mem: Correct default THP size for ARM64
The default block size is same as to the THP size, which is either
retrieved from "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
or hardcoded to 2MB. There are flaws in both mechanisms and this
intends to fix them up.

  * When "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" is
    used to getting the THP size, 32MB and 512MB are valid values
    when we have 16KB and 64KB page size on ARM64.

  * When the hardcoded THP size is used, 2MB, 32MB and 512MB are
    valid values when we have 4KB, 16KB and 64KB page sizes on
    ARM64.

Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-id: 20220111063329.74447-2-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-01-20 11:47:52 +00:00
David Hildenbrand
60f1f77cab virtio-mem: Set "unplugged-inaccessible=auto" for the 7.0 machine on x86
Set the new default to "auto", keeping it set to "off" for compat
machines. This property is only available for x86 targets.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211217134039.29670-4-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07 19:30:13 -05:00
David Hildenbrand
23ad8dec8d virtio-mem: Support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE
With VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we signal the VM that reading
unplugged memory is not supported. We have to fail feature negotiation
in case the guest does not support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE.

First, VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE is required to properly handle
memory backends (or architectures) without support for the shared zeropage
in the hypervisor cleanly. Without the shared zeropage, even reading an
unpopulated virtual memory location can populate real memory and
consequently consume memory in the hypervisor. We have a guaranteed shared
zeropage only on MAP_PRIVATE anonymous memory.

Second, we want VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE to be the default
long-term as even populating the shared zeropage can be problematic: for
example, without THP support (possible) or without support for the shared
huge zeropage with THP (unlikely), the PTE page tables to hold the shared
zeropage entries can consume quite some memory that cannot be reclaimed
easily.

Third, there are other optimizations+features (e.g., protection of
unplugged memory, reducing the total memory slot size and bitmap sizes)
that will require VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE.

We really only support x86 targets with virtio-mem for now (and
Linux similarly only support x86), but that might change soon, so prepare
for different targets already.

Add a new "unplugged-inaccessible" tristate property for x86 targets:
- "off" will keep VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE unset and legacy
  guests working.
- "on" will set VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE and stop legacy guests
  from using the device.
- "auto" selects the default based on support for the shared zeropage.

Warn in case the property is set to "off" and we don't have support for the
shared zeropage.

For existing compat machines, the property will default to "off", to
not change the behavior but eventually warn about a problematic setup.
Short-term, we'll set the property default to "auto" for new QEMU machines.
Mid-term, we'll set the property default to "on" for new QEMU machines.
Long-term, we'll deprecate the parameter and disallow legacy
guests completely.

The property has to match on the migration source and destination. "auto"
will result in the same VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE setting as long
as the qemu command line (esp. memdev) match -- so "auto" is good enough
for migration purposes and the parameter doesn't have to be migrated
explicitly.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211217134039.29670-3-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07 19:30:13 -05:00