mirror of
https://github.com/Motorhead1991/qemu.git
synced 2025-12-25 17:08:36 -07:00
Official QEMU mirror. Please see https://www.qemu.org/contribute/ for how to submit changes to QEMU. Pull Requests are ignored. Please only use release tarballs from the QEMU website.
http://www.qemu.org
Under certain circumstances normal xen-mapcache functioning may be broken
by guest's actions. This may lead to either QEMU performing exit() due to
a caught bad pointer (and with QEMU process gone the guest domain simply
appears hung afterwards) or actual use of the incorrect pointer inside
QEMU address space -- a write to unmapped memory is possible. The bug is
hard to reproduce on a i440 machine as multiple DMA sources are required
(though it's possible in theory, using multiple emulated devices), but can
be reproduced somewhat easily on a Q35 machine using an emulated AHCI
controller -- each NCQ queue command slot may be used as an independent
DMA source ex. using READ FPDMA QUEUED command, so a single storage
device on the AHCI controller port will be enough to produce multiple DMAs
(up to 32). The detailed description of the issue follows.
Xen-mapcache provides an ability to map parts of a guest memory into
QEMU's own address space to work with.
There are two types of cache lookups:
- translating a guest physical address into a pointer in QEMU's address
space, mapping a part of guest domain memory if necessary (while trying
to reduce a number of such (re)mappings to a minimum)
- translating a QEMU's pointer back to its physical address in guest RAM
These lookups are managed via two linked-lists of structures.
MapCacheEntry is used for forward cache lookups, while MapCacheRev -- for
reverse lookups.
Every guest physical address is broken down into 2 parts:
address_index = phys_addr >> MCACHE_BUCKET_SHIFT;
address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
MCACHE_BUCKET_SHIFT depends on a system (32/64) and is equal to 20 for
a 64-bit system (which assumed for the further description). Basically,
this means that we deal with 1 MB chunks and offsets within those 1 MB
chunks. All mappings are created with 1MB-granularity, i.e. 1MB/2MB/3MB
etc. Most DMA transfers typically are less than 1MB, however, if the
transfer crosses any 1MB border(s) - than a nearest larger mapping size
will be used, so ex. a 512-byte DMA transfer with the start address
700FFF80h will actually require a 2MB range.
Current implementation assumes that MapCacheEntries are unique for a given
address_index and size pair and that a single MapCacheEntry may be reused
by multiple requests -- in this case the 'lock' field will be larger than
1. On other hand, each requested guest physical address (with 'lock' flag)
is described by each own MapCacheRev. So there may be multiple MapCacheRev
entries corresponding to a single MapCacheEntry. The xen-mapcache code
uses MapCacheRev entries to retrieve the address_index & size pair which
in turn used to find a related MapCacheEntry. The 'lock' field within
a MapCacheEntry structure is actually a reference counter which shows
a number of corresponding MapCacheRev entries.
The bug lies in ability for the guest to indirectly manipulate with the
xen-mapcache MapCacheEntries list via a special sequence of DMA
operations, typically for storage devices. In order to trigger the bug,
guest needs to issue DMA operations in specific order and timing.
Although xen-mapcache is protected by the mutex lock -- this doesn't help
in this case, as the bug is not due to a race condition.
Suppose we have 3 DMA transfers, namely A, B and C, where
- transfer A crosses 1MB border and thus uses a 2MB mapping
- transfers B and C are normal transfers within 1MB range
- and all 3 transfers belong to the same address_index
In this case, if all these transfers are to be executed one-by-one
(without overlaps), no special treatment necessary -- each transfer's
mapping lock will be set and then cleared on unmap before starting
the next transfer.
The situation changes when DMA transfers overlap in time, ex. like this:
|===== transfer A (2MB) =====|
|===== transfer B (1MB) =====|
|===== transfer C (1MB) =====|
time --->
In this situation the following sequence of actions happens:
1. transfer A creates a mapping to 2MB area (lock=1)
2. transfer B (1MB) tries to find available mapping but cannot find one
because transfer A is still in progress, and it has 2MB size + non-zero
lock. So transfer B creates another mapping -- same address_index,
but 1MB size.
3. transfer A completes, making 1st mapping entry available by setting its
lock to 0
4. transfer C starts and tries to find available mapping entry and sees
that 1st entry has lock=0, so it uses this entry but remaps the mapping
to a 1MB size
5. transfer B completes and by this time
- there are two locked entries in the MapCacheEntry list with the SAME
values for both address_index and size
- the entry for transfer B actually resides farther in list while
transfer C's entry is first
6. xen_ram_addr_from_mapcache() for transfer B gets correct address_index
and size pair from corresponding MapCacheRev entry, but then it starts
looking for MapCacheEntry with these values and finds the first entry
-- which belongs to transfer C.
At this point there may be following possible (bad) consequences:
1. xen_ram_addr_from_mapcache() will use a wrong entry->vaddr_base value
in this statement:
raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) +
((unsigned long) ptr - (unsigned long) entry->vaddr_base);
resulting in an incorrent raddr value returned from the function. The
(ptr - entry->vaddr_base) expression may produce both positive and negative
numbers and its actual value may differ greatly as there are many
map/unmap operations take place. If the value will be beyond guest RAM
limits then a "Bad RAM offset" error will be triggered and logged,
followed by exit() in QEMU.
2. If raddr value won't exceed guest RAM boundaries, the same sequence
of actions will be performed for xen_invalidate_map_cache_entry() on DMA
unmap, resulting in a wrong MapCacheEntry being unmapped while DMA
operation which uses it is still active. The above example must
be extended by one more DMA transfer in order to allow unmapping as the
first mapping in the list is sort of resident.
The patch modifies the behavior in which MapCacheEntry's are added to the
list, avoiding duplicates.
Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
|
||
|---|---|---|
| accel | ||
| audio | ||
| backends | ||
| block | ||
| bsd-user | ||
| chardev | ||
| contrib | ||
| crypto | ||
| default-configs | ||
| disas | ||
| docs | ||
| dtc@558cd81bdd | ||
| fpu | ||
| fsdev | ||
| gdb-xml | ||
| hw | ||
| include | ||
| io | ||
| libdecnumber | ||
| linux-headers | ||
| linux-user | ||
| migration | ||
| nbd | ||
| net | ||
| pc-bios | ||
| pixman@87eea99e44 | ||
| po | ||
| qapi | ||
| qga | ||
| qobject | ||
| qom | ||
| replay | ||
| roms | ||
| scripts | ||
| slirp | ||
| stubs | ||
| target | ||
| tcg | ||
| tests | ||
| trace | ||
| ui | ||
| util | ||
| .dir-locals.el | ||
| .editorconfig | ||
| .exrc | ||
| .gdbinit | ||
| .gitignore | ||
| .gitmodules | ||
| .mailmap | ||
| .shippable.yml | ||
| .travis.yml | ||
| arch_init.c | ||
| atomic_template.h | ||
| balloon.c | ||
| block.c | ||
| blockdev-nbd.c | ||
| blockdev.c | ||
| blockjob.c | ||
| bootdevice.c | ||
| bt-host.c | ||
| bt-vhci.c | ||
| Changelog | ||
| CODING_STYLE | ||
| configure | ||
| COPYING | ||
| COPYING.LIB | ||
| cpus-common.c | ||
| cpus.c | ||
| device-hotplug.c | ||
| device_tree.c | ||
| disas.c | ||
| dma-helpers.c | ||
| dump.c | ||
| exec.c | ||
| gdbstub.c | ||
| HACKING | ||
| hax-stub.c | ||
| hmp-commands-info.hx | ||
| hmp-commands.hx | ||
| hmp.c | ||
| hmp.h | ||
| ioport.c | ||
| iothread.c | ||
| LICENSE | ||
| MAINTAINERS | ||
| Makefile | ||
| Makefile.objs | ||
| Makefile.target | ||
| memory.c | ||
| memory_ldst.inc.c | ||
| memory_mapping.c | ||
| module-common.c | ||
| monitor.c | ||
| numa.c | ||
| os-posix.c | ||
| os-win32.c | ||
| qapi-schema.json | ||
| qdev-monitor.c | ||
| qdict-test-data.txt | ||
| qemu-bridge-helper.c | ||
| qemu-doc.texi | ||
| qemu-ga.texi | ||
| qemu-img-cmds.hx | ||
| qemu-img.c | ||
| qemu-img.texi | ||
| qemu-io-cmds.c | ||
| qemu-io.c | ||
| qemu-nbd.c | ||
| qemu-nbd.texi | ||
| qemu-option-trace.texi | ||
| qemu-options-wrapper.h | ||
| qemu-options.h | ||
| qemu-options.hx | ||
| qemu-seccomp.c | ||
| qemu-tech.texi | ||
| qemu.nsi | ||
| qemu.sasl | ||
| qmp.c | ||
| qtest.c | ||
| README | ||
| replication.c | ||
| replication.h | ||
| rules.mak | ||
| softmmu_template.h | ||
| thunk.c | ||
| tpm.c | ||
| trace-events | ||
| user-exec-stub.c | ||
| user-exec.c | ||
| VERSION | ||
| version.rc | ||
| vl.c | ||
QEMU README
===========
QEMU is a generic and open source machine & userspace emulator and
virtualizer.
QEMU is capable of emulating a complete machine in software without any
need for hardware virtualization support. By using dynamic translation,
it achieves very good performance. QEMU can also integrate with the Xen
and KVM hypervisors to provide emulated hardware while allowing the
hypervisor to manage the CPU. With hypervisor support, QEMU can achieve
near native performance for CPUs. When QEMU emulates CPUs directly it is
capable of running operating systems made for one machine (e.g. an ARMv7
board) on a different machine (e.g. an x86_64 PC board).
QEMU is also capable of providing userspace API virtualization for Linux
and BSD kernel interfaces. This allows binaries compiled against one
architecture ABI (e.g. the Linux PPC64 ABI) to be run on a host using a
different architecture ABI (e.g. the Linux x86_64 ABI). This does not
involve any hardware emulation, simply CPU and syscall emulation.
QEMU aims to fit into a variety of use cases. It can be invoked directly
by users wishing to have full control over its behaviour and settings.
It also aims to facilitate integration into higher level management
layers, by providing a stable command line interface and monitor API.
It is commonly invoked indirectly via the libvirt library when using
open source applications such as oVirt, OpenStack and virt-manager.
QEMU as a whole is released under the GNU General Public License,
version 2. For full licensing details, consult the LICENSE file.
Building
========
QEMU is multi-platform software intended to be buildable on all modern
Linux platforms, OS-X, Win32 (via the Mingw64 toolchain) and a variety
of other UNIX targets. The simple steps to build QEMU are:
mkdir build
cd build
../configure
make
Additional information can also be found online via the QEMU website:
http://qemu-project.org/Hosts/Linux
http://qemu-project.org/Hosts/Mac
http://qemu-project.org/Hosts/W32
Submitting patches
==================
The QEMU source code is maintained under the GIT version control system.
git clone git://git.qemu-project.org/qemu.git
When submitting patches, the preferred approach is to use 'git
format-patch' and/or 'git send-email' to format & send the mail to the
qemu-devel@nongnu.org mailing list. All patches submitted must contain
a 'Signed-off-by' line from the author. Patches should follow the
guidelines set out in the HACKING and CODING_STYLE files.
Additional information on submitting patches can be found online via
the QEMU website
http://qemu-project.org/Contribute/SubmitAPatch
http://qemu-project.org/Contribute/TrivialPatches
Bug reporting
=============
The QEMU project uses Launchpad as its primary upstream bug tracker. Bugs
found when running code built from QEMU git or upstream released sources
should be reported via:
https://bugs.launchpad.net/qemu/
If using QEMU via an operating system vendor pre-built binary package, it
is preferable to report bugs to the vendor's own bug tracker first. If
the bug is also known to affect latest upstream code, it can also be
reported via launchpad.
For additional information on bug reporting consult:
http://qemu-project.org/Contribute/ReportABug
Contact
=======
The QEMU community can be contacted in a number of ways, with the two
main methods being email and IRC
- qemu-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/qemu-devel
- #qemu on irc.oftc.net
Information on additional methods of contacting the community can be
found online via the QEMU website:
http://qemu-project.org/Contribute/StartHere
-- End