mirror of
https://github.com/Motorhead1991/qemu.git
synced 2025-08-03 07:43:54 -06:00
* Record-replay lockstep execution, log dumper and fixes (Alex, Pavel)
* SCSI fix to pass maximum transfer size (Daniel Barboza)
* chardev fixes and improved iothread support (Daniel Berrangé, Peter)
* checkpatch tweak (Eric)
* make help tweak (Marc-André)
* make more PCI NICs available with -net or -nic (myself)
* change default q35 NIC to e1000e (myself)
* SCSI support for NDOB bit (myself)
* membarrier system call support (myself)
* SuperIO refactoring (Philippe)
* miscellaneous cleanups and fixes (Thomas)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQEcBAABAgAGBQJapqaMAAoJEL/70l94x66DQoUH/Rvg+a8giz/SrEA4P8D3Cb2z
4GNbNUUoy4oU0ltD5IAMskMwpOsvl1batE0D+pKIlfO9NV4+Cj2kpgo0p9TxoYqM
VCby3wRtx27zb5nVytC6M++iIKXmeEMqXmFw61I6umddNPSl4IR3hiHEE0DM+7dV
UPIOvJeEiazyQaw3Iw+ZctNn8dDBKc/+6oxP9xRcYTaZ6hB4G9RZkqGNNSLcJkk7
R0UotdjzIZhyWMOkjIwlpTF4sWv8gsYUV4bPYKMYho5B0Obda2dBM3I1kpA8yDa/
xZ5lheOaAVBZvM5aMIcaQPa65MO9hLyXFmhMOgyfpJhLBBz6Qpa4OLLI6DeTN+0=
=UAgA
-----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

# gpg: Signature made Mon 12 Mar 2018 16:10:52 GMT
# gpg: using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (69 commits)
  tcg: fix cpu_io_recompile
  replay: update documentation
  replay: save vmstate of the asynchronous events
  replay: don't process async events when warping the clock
  scripts/replay-dump.py: replay log dumper
  replay: avoid recursive call of checkpoints
  replay: check return values of fwrite
  replay: push replay_mutex_lock up the call tree
  replay: don't destroy mutex at exit
  replay: make locking visible outside replay code
  replay/replay-internal.c: track holding of replay_lock
  replay/replay.c: bump REPLAY_VERSION again
  replay: save prior value of the host clock
  replay: added replay log format description
  replay: fix save/load vm for non-empty queue
  replay: fixed replay_enable_events
  replay: fix processing async events
  cpu-exec: fix exception_index handling
  hw/i386/pc: Factor out the superio code
  hw/alpha/dp264: Use the TYPE_SMC37C669_SUPERIO
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	default-configs/i386-softmmu.mak
#	default-configs/x86_64-softmmu.mak
commit
3788c7b6e5
117 changed files with 1952 additions and 728 deletions
@@ -122,20 +122,30 @@ In general, if the algorithm you are writing includes both writes
 and reads on the same side, it is generally simpler to use sequentially
 consistent primitives.

-When using this model, variables are accessed with atomic_read() and
-atomic_set(), and restrictions to the ordering of accesses is enforced
+When using this model, variables are accessed with:
+
+- atomic_read() and atomic_set(); these prevent the compiler from
+  optimizing accesses out of existence and creating unsolicited
+  accesses, but do not otherwise impose any ordering on loads and
+  stores: both the compiler and the processor are free to reorder
+  them.
+
+- atomic_load_acquire(), which guarantees the LOAD to appear to
+  happen, with respect to the other components of the system,
+  before all the LOAD or STORE operations specified afterwards.
+  Operations coming before atomic_load_acquire() can still be
+  reordered after it.
+
+- atomic_store_release(), which guarantees the STORE to appear to
+  happen, with respect to the other components of the system,
+  after all the LOAD or STORE operations specified before.
+  Operations coming after atomic_store_release() can still be
+  reordered before it.
+
+Restrictions to the ordering of accesses can also be specified
 using the memory barrier macros: smp_rmb(), smp_wmb(), smp_mb(),
 smp_mb_acquire(), smp_mb_release(), smp_read_barrier_depends().

-atomic_read() and atomic_set() prevents the compiler from using
-optimizations that might otherwise optimize accesses out of existence
-on the one hand, or that might create unsolicited accesses on the other.
-In general this should not have any effect, because the same compiler
-barriers are already implied by memory barriers. However, it is useful
-to do so, because it tells readers which variables are shared with
-other threads, and which are local to the current thread or protected
-by other, more mundane means.
-
 Memory barriers control the order of references to shared memory.
 They come in six kinds:

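
The acquire/release pairing described in the hunk above can be sketched outside QEMU. The block below is a minimal illustration only: it uses C11 <stdatomic.h> operations as stand-ins for atomic_store_release() and atomic_load_acquire() (an assumption, since QEMU's macros are defined in its own headers), and the names payload, ready, writer, reader and run_message_passing are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared state: plain data published through a flag. */
static int payload;
static atomic_int ready;

/* Writer: fill in the payload, then publish it with a release store. */
static void *writer(void *arg)
{
    (void)arg;
    payload = 42;
    /* C11 stand-in for atomic_store_release(&ready, 1): the store to
     * payload above cannot be reordered after this store. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Reader: wait for the flag with an acquire load, then read the payload. */
static void *reader(void *arg)
{
    int *out = arg;
    /* C11 stand-in for atomic_load_acquire(&ready): the load of payload
     * below cannot be reordered before this load. */
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        /* spin until the writer publishes */
    }
    *out = payload;
    return NULL;
}

/* Run one writer and one reader; return the value the reader observed,
 * or -1 if a thread could not be started. */
static int run_message_passing(void)
{
    pthread_t w, r;
    int seen = 0;
    int rc;

    payload = 0;
    atomic_store(&ready, 0);

    if (pthread_create(&w, NULL, writer, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&r, NULL, reader, &seen) != 0) {
        pthread_join(w, NULL);
        return -1;
    }
    rc = pthread_join(w, NULL);
    assert(rc == 0);
    rc = pthread_join(r, NULL);
    assert(rc == 0);
    (void)rc;
    return seen;
}
```

Once the reader sees the flag set, the release/acquire pair guarantees that it also sees the payload written before the flag. With plain atomic_read()/atomic_set() semantics (relaxed accesses) that guarantee would need explicit barriers.
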
@@ -232,7 +242,7 @@ make atomic_mb_set() the more expensive operation.

 There are two common cases in which atomic_mb_read and atomic_mb_set
 generate too many memory barriers, and thus it can be useful to manually
-place barriers instead:
+place barriers, or use atomic_load_acquire/atomic_store_release instead:

 - when a data structure has one thread that is always a writer
   and one thread that is always a reader, manual placement of
@@ -243,18 +253,15 @@ place barriers instead:
      thread 1                                thread 1
      -------------------------               ------------------------
      (other writes)
-                                             smp_mb_release()
-     atomic_mb_set(&a, x)                    atomic_set(&a, x)
-                                             smp_wmb()
-     atomic_mb_set(&b, y)                    atomic_set(&b, y)
+     atomic_mb_set(&a, x)                    atomic_store_release(&a, x)
+     atomic_mb_set(&b, y)                    atomic_store_release(&b, y)

                                        =>
      thread 2                                thread 2
      -------------------------               ------------------------
-     y = atomic_mb_read(&b)                  y = atomic_read(&b)
-                                             smp_rmb()
-     x = atomic_mb_read(&a)                  x = atomic_read(&a)
-                                             smp_mb_acquire()
+     y = atomic_mb_read(&b)                  y = atomic_load_acquire(&b)
+     x = atomic_mb_read(&a)                  x = atomic_load_acquire(&a)
+     (other reads)

 Note that the barrier between the stores in thread 1, and between
 the loads in thread 2, has been optimized here to a write or a
@@ -276,7 +283,6 @@ place barriers instead:
 smp_mb_acquire();

 Similarly, atomic_mb_set() can be transformed as follows:
-smp_mb():

 smp_mb_release();
 for (i = 0; i < 10; i++)          =>      for (i = 0; i < 10; i++)
@@ -284,6 +290,8 @@ place barriers instead:
 smp_mb();


+The other thread can still use atomic_mb_read()/atomic_mb_set().
+
 The two tricks can be combined. In this case, splitting a loop in
 two lets you hoist the barriers out of the loops _and_ eliminate the
 expensive smp_mb():
@@ -296,8 +304,6 @@ expensive smp_mb():
 atomic_set(&a[i], false);
 smp_mb();

-The other thread can still use atomic_mb_read()/atomic_mb_set()
-

 Memory barrier pairing
 ----------------------
@@ -386,10 +392,7 @@ and memory barriers, and the equivalents in QEMU:
 note that smp_store_mb() is a little weaker than atomic_mb_set().
 atomic_mb_read() compiles to the same instructions as Linux's
 smp_load_acquire(), but this should be treated as an implementation
-detail. QEMU does have atomic_load_acquire() and atomic_store_release()
-macros, but for now they are only used within atomic.h. This may
-change in the future.
-
+detail.

 SOURCES
 =======

161 docs/replay.txt
@@ -7,14 +7,10 @@ See the COPYING file in the top-level directory.
 Record/replay
 -------------

-Record/replay functions are used for the reverse execution and deterministic
-replay of qemu execution. This implementation of deterministic replay can
-be used for deterministic debugging of guest code through a gdb remote
-interface.
-
+Record/replay functions are used for the deterministic replay of qemu execution.
 Execution recording writes a log of non-deterministic events, which can later be
 used for replaying the execution anywhere and any number of times.
-It also supports checkpointing for faster rewinding during reverse debugging.
+It also supports checkpointing for faster rewinding to a specific moment of the replay.
 Execution replaying reads the log and replays all non-deterministic events
 including external input, hardware clocks, and interrupts.

@@ -28,16 +24,36 @@ Deterministic replay has the following features:
 input devices.

 Usage of the record/replay:
-* First, record the execution, by adding the following arguments to the command line:
-  '-icount shift=7,rr=record,rrfile=replay.bin -net none'.
-  Block devices' images are not actually changed in the recording mode,
+* First, record the execution with the following command line:
+    qemu-system-i386 \
+     -icount shift=7,rr=record,rrfile=replay.bin \
+     -drive file=disk.qcow2,if=none,id=img-direct \
+     -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
+     -device ide-hd,drive=img-blkreplay \
+     -netdev user,id=net1 -device rtl8139,netdev=net1 \
+     -object filter-replay,id=replay,netdev=net1
+* After recording, you can replay it by using another command line:
+    qemu-system-i386 \
+     -icount shift=7,rr=replay,rrfile=replay.bin \
+     -drive file=disk.qcow2,if=none,id=img-direct \
+     -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
+     -device ide-hd,drive=img-blkreplay \
+     -netdev user,id=net1 -device rtl8139,netdev=net1 \
+     -object filter-replay,id=replay,netdev=net1
+  The only difference from the recording command line is that the rr option
+  changes from record to replay.
+* Block device images are not actually changed in the recording mode,
   because all of the changes are written to the temporary overlay file.
-* Then you can replay it by using another command
-  line option: '-icount shift=7,rr=replay,rrfile=replay.bin -net none'
-* '-net none' option should also be specified if network replay patches
-  are not applied.
+  This behavior is enabled by using the blkreplay driver. It should be used
+  for every enabled block device, as described in the 'Block devices' section.
+* The '-net none' option should be specified when the network is not used,
+  because QEMU adds a network card by default. When the network is needed,
+  it should be configured explicitly with the replay filter, as described
+  in the 'Network devices' section.
+* Interactions with audio devices and serial ports are recorded and replayed
+  automatically when such devices are enabled.

-Papers with description of deterministic replay implementation:
+Academic papers describing the deterministic replay implementation:
 http://www.computer.org/csdl/proceedings/csmr/2012/4666/00/4666a553-abs.html
 http://dl.acm.org/citation.cfm?id=2786805.2803179

@@ -46,8 +62,33 @@ Modifications of qemu include:
 * saving different asynchronous events (e.g. system shutdown) into the log
 * synchronization of the bottom halves execution
 * synchronization of the threads from thread pool
-* recording/replaying user input (mouse and keyboard)
+* recording/replaying user input (mouse, keyboard, and microphone)
 * adding internal checkpoints for cpu and io synchronization
+* network filter for recording and replaying the packets
+* block driver for making the block layer deterministic
+* serial port input record and replay

+Locking and thread synchronisation
+----------------------------------
+
+Previously the synchronisation of the main thread and the vCPU thread
+was ensured by the holding of the BQL. However, the trend has been to
+reduce the time the BQL is held across the system, including under TCG
+system emulation. As it is important that batches of events are kept
+in sequence (e.g. expiring timers and checkpoints in the main thread
+while instruction checkpoints are written by the vCPU thread), we need
+another lock to keep things in lock-step. This role is now handled by
+the replay_mutex_lock. It used to be held only for each event being
+written, but now it is held for a whole execution period. This results
+in a deterministic ping-pong between the two main threads.
+
+As the BQL is now a finer-grained lock than the replay_lock, it is almost
+certainly a bug, and a source of deadlocks, to take the
+replay_mutex_lock while the BQL is held. This is enforced by an assert.
+While the unlocks are usually in the reverse order, this is not
+necessary; you can drop the replay_lock while holding the BQL, without
+doing a more complicated unlock_iothread/replay_unlock/lock_iothread
+sequence.
+
 Non-deterministic events
 ------------------------
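
The lock-ordering rule added in the hunk above (take the replay lock before the BQL, never the other way round, while unlocking in either order is fine) can be sketched with ordinary pthread mutexes. This is a minimal sketch under stated assumptions, not QEMU's implementation: every identifier below (sketch_bql_lock, sketch_replay_lock, sketch_run_period and the two mutexes) is hypothetical, and per-thread flags stand in for whatever bookkeeping the real code uses for its assert.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BQL and the replay lock. */
static pthread_mutex_t sketch_bql = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sketch_replay = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread bookkeeping so the ordering rule can be asserted. */
static _Thread_local bool sketch_bql_held;
static _Thread_local bool sketch_replay_held;

static void sketch_bql_lock(void)
{
    pthread_mutex_lock(&sketch_bql);
    sketch_bql_held = true;
}

static void sketch_bql_unlock(void)
{
    assert(sketch_bql_held);
    sketch_bql_held = false;
    pthread_mutex_unlock(&sketch_bql);
}

static void sketch_replay_lock(void)
{
    /* Taking the replay lock while the BQL is held is treated as a bug. */
    assert(!sketch_bql_held);
    pthread_mutex_lock(&sketch_replay);
    sketch_replay_held = true;
}

static void sketch_replay_unlock(void)
{
    assert(sketch_replay_held);
    sketch_replay_held = false;
    pthread_mutex_unlock(&sketch_replay);
}

/* One execution period: replay lock first, then the BQL. The replay lock
 * is dropped while the BQL is still held, which the rule allows.
 * Returns true if both locks were released on exit. */
static bool sketch_run_period(void)
{
    sketch_replay_lock();
    sketch_bql_lock();
    /* ... a batch of events would be processed here ... */
    sketch_replay_unlock();
    sketch_bql_unlock();
    return !sketch_bql_held && !sketch_replay_held;
}
```

Calling sketch_bql_lock() followed by sketch_replay_lock() on the same thread would trip the assert, which is the kind of ordering mistake the text describes as a likely deadlock.
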
@@ -55,12 +96,11 @@ Non-deterministic events
 Our record/replay system is based on saving and replaying non-deterministic
 events (e.g. keyboard input) and simulating deterministic ones (e.g. reading
 from HDD or memory of the VM). Saving only non-deterministic events makes
-log file smaller, simulation faster, and allows using reverse debugging even
-for realtime applications.
+log file smaller and simulation faster.

 The following non-deterministic data from peripheral devices is saved into
 the log: mouse and keyboard input, network packets, audio controller input,
-USB packets, serial port input, and hardware clocks (they are non-deterministic
+serial port input, and hardware clocks (they are non-deterministic
 too, because their values are taken from the host machine). Inputs from
 simulated hardware, memory of VM, software interrupts, and execution of
 instructions are not saved into the log, because they are deterministic and
@@ -183,7 +223,7 @@ Block devices record/replay module intercepts calls of
 bdrv coroutine functions at the top of block drivers stack.
 To record and replay block operations the drive must be configured
 as following:
-  -drive file=disk.qcow,if=none,id=img-direct
+  -drive file=disk.qcow2,if=none,id=img-direct
   -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay
   -device ide-hd,drive=img-blkreplay

@@ -212,6 +252,12 @@ This snapshot is created at start of recording and restored at start
 of replaying. It also can be loaded while replaying to roll back
 the execution.

+Use the QEMU monitor to create additional snapshots. The 'savevm <name>' command
+creates a snapshot and 'loadvm <name>' restores it. To prevent corruption
+of the original disk image, use overlay files linked to the original images.
+All new snapshots (including the starting one) will then be saved in
+overlays and the original image remains unchanged.
+
 Network devices
 ---------------

@@ -232,3 +278,80 @@ Audio devices
 Audio data is recorded and replayed automatically. The command line for recording
 and replaying must contain identical specifications of audio hardware, e.g.:
   -soundhw ac97
+
+Serial ports
+------------
+
+Serial port input is recorded and replayed automatically. The command lines
+for recording and replaying must contain the same number of ports in record
+and replay modes, but their backends may differ.
+E.g., '-serial stdio' in record mode, and '-serial null' in replay mode.
+
+Replay log format
+-----------------
+
+The record/replay log consists of a header and a sequence of execution
+events. The header includes a 4-byte replay version id and an 8-byte reserved
+field. The version is updated every time the replay log format changes, to prevent
+using a replay log created by another build of qemu.
+
+The sequence of the events describes virtual machine state changes.
+It includes all non-deterministic inputs of the VM, synchronization marks and
+instruction counts used to correctly inject inputs at replay.
+
+Synchronization marks (checkpoints) are used for synchronizing qemu threads
+that perform operations with virtual hardware. These operations may change
+the system's state (e.g., change some register or generate an interrupt) and
+therefore should execute synchronously with the CPU thread.
+
+Every event in the log includes a 1-byte event id and optional arguments.
+When an argument is an array, it is stored as a 4-byte array length
+followed by the corresponding number of data bytes.
+Here is the list of events that are written into the log:
+
+ - EVENT_INSTRUCTION. Instructions executed since last event.
+   Argument: 4-byte number of executed instructions.
+ - EVENT_INTERRUPT. Used to synchronize interrupt processing.
+ - EVENT_EXCEPTION. Used to synchronize exception handling.
+ - EVENT_ASYNC. This is a group of events. They are always processed
+   together with checkpoints. When such an event is generated, it is
+   stored in the queue and processed only when a checkpoint occurs.
+   Every such event is followed by a 1-byte checkpoint id and a 1-byte
+   async event id from the following list:
+     - REPLAY_ASYNC_EVENT_BH. Bottom-half callback. This event synchronizes
+       callbacks that affect virtual machine state, but are normally called
+       asynchronously.
+       Argument: 8-byte operation id.
+     - REPLAY_ASYNC_EVENT_INPUT. Input device event. Contains
+       parameters of keyboard and mouse input operations
+       (key press/release, mouse pointer movement).
+       Arguments: 9-16 bytes depending on the input event.
+     - REPLAY_ASYNC_EVENT_INPUT_SYNC. Internal input synchronization event.
+     - REPLAY_ASYNC_EVENT_CHAR_READ. Character (e.g., serial port) device input
+       initiated by the sender.
+       Arguments: 1-byte character device id.
+                  Array with bytes that were read.
+     - REPLAY_ASYNC_EVENT_BLOCK. Block device operation. Used to synchronize
+       operations with disk and flash drives with CPU.
+       Argument: 8-byte operation id.
+     - REPLAY_ASYNC_EVENT_NET. Incoming network packet.
+       Arguments: 1-byte network adapter id.
+                  4-byte packet flags.
+                  Array with packet bytes.
+ - EVENT_SHUTDOWN. Occurs when user sends shutdown event to qemu,
+   e.g., by closing the window.
+ - EVENT_CHAR_WRITE. Used to synchronize character output operations.
+   Arguments: 4-byte output function return value.
+              4-byte offset in the output array.
+ - EVENT_CHAR_READ_ALL. Used to synchronize character input operations,
+   initiated by qemu.
+   Argument: Array with bytes that were read.
+ - EVENT_CHAR_READ_ALL_ERROR. Unsuccessful character input operation,
+   initiated by qemu.
+   Argument: 4-byte error code.
+ - EVENT_CLOCK + clock_id. Group of events for host clock read operations.
+   Argument: 8-byte clock value.
+ - EVENT_CHECKPOINT + checkpoint_id. Checkpoint for synchronization of
+   CPU, internal threads, and asynchronous input events. May be followed
+   by one or more EVENT_ASYNC events.
+ - EVENT_END. Last event in the log.
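
The header and array encodings described in the 'Replay log format' section above can be decoded with a few lines of C. The block below is a sketch under stated assumptions rather than QEMU's reader: it works on an in-memory buffer, it assumes multi-byte integers are stored most-significant byte first (the text above does not state the byte order), and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4-byte replay version id followed by an 8-byte reserved field. */
#define SKETCH_HEADER_SIZE 12

/* Read a 32-bit value. Big-endian order is an assumption of this sketch. */
static uint32_t sketch_get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parse the log header. Stores the version id and returns the offset of
 * the first event, or 0 if the buffer is too short. */
static size_t sketch_parse_header(const uint8_t *buf, size_t len,
                                  uint32_t *version)
{
    assert(buf != NULL && version != NULL);
    if (len < SKETCH_HEADER_SIZE) {
        return 0;
    }
    *version = sketch_get_be32(buf);
    return SKETCH_HEADER_SIZE;
}

/* Parse an array argument at buf[off]: a 4-byte length, then that many
 * data bytes. Stores a pointer to the data and its length, and returns
 * the offset just past the array, or 0 if the buffer is truncated. */
static size_t sketch_parse_array(const uint8_t *buf, size_t len, size_t off,
                                 const uint8_t **data, uint32_t *data_len)
{
    uint32_t n;

    assert(buf != NULL && data != NULL && data_len != NULL);
    if (off > len || len - off < 4) {
        return 0;
    }
    n = sketch_get_be32(buf + off);
    off += 4;
    if (len - off < n) {
        return 0;
    }
    *data = buf + off;
    *data_len = n;
    return off + n;
}
```

A real dumper would additionally read the 1-byte event id before each event and dispatch on it; the numeric values of the event ids are not given in the text above, so they are left out here.
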