qemu/include/exec
Emilio G. Cota 403f290c06 cputlb: read CPUTLBEntry.addr_write atomically
Updates can come from other threads, so readers that do not
take tlb_lock must use atomic_read to avoid undefined
behaviour (UB).

This completes the conversion to tlb_lock. This conversion results
on average in no performance loss, as the following experiments
(run on an Intel i7-6700K CPU @ 4.00GHz) show.

1. aarch64 bootup+shutdown test:

- Before:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7487.087786      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.12% )
    31,574,905,303      cycles                    #    4.217 GHz                      ( +-  0.12% )
    57,097,908,812      instructions              #    1.81  insns per cycle          ( +-  0.08% )
    10,255,415,367      branches                  # 1369.747 M/sec                    ( +-  0.08% )
       173,278,962      branch-misses             #    1.69% of all branches          ( +-  0.18% )

       7.504481349 seconds time elapsed                                          ( +-  0.14% )

- After:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7462.441328      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
    31,478,476,520      cycles                    #    4.218 GHz                      ( +-  0.07% )
    57,017,330,084      instructions              #    1.81  insns per cycle          ( +-  0.05% )
    10,251,929,667      branches                  # 1373.804 M/sec                    ( +-  0.05% )
       173,023,787      branch-misses             #    1.69% of all branches          ( +-  0.11% )

       7.474970463 seconds time elapsed                                          ( +-  0.07% )

2. SPEC06int:
                                              SPEC06int (test set)
                                           [Y axis: Speedup over master]
  1.15 +-+----+------+------+------+------+------+-------+------+------+------+------+------+------+----+-+
       |                                                                                                  |
   1.1 +-+.................................+++.............................+  tlb-lock-v2 (m+++x)       +-+
       |                                +++ |                   +++        tlb-lock-v3 (spinl|ck)         |
       |                    +++          |  |     +++    +++     |                           |            |
  1.05 +-+....+++...........####.........|####.+++.|......|.....###....+++...........+++....###.........+-+
       |      ###         ++#| #         |# |# ***### +++### +++#+#     |     +++     |     #|#    ###    |
     1 +-+++***+#++++####+++#++#++++++++++#++#+*+*++#++++#+#+****+#++++###++++###++++###++++#+#++++#+#+++-+
       |    *+* #    #++# ***  #   #### ***  # * *++# ****+# *| * # ****|#   |# #    #|#    #+#    # #    |
  0.95 +-+..*.*.#....#..#.*|*..#...#..#.*|*..#.*.*..#.*|.*.#.*++*.#.*++*+#.****.#....#+#....#.#..++#.#..+-+
       |    * * #    #  # *|*  #   #  # *|*  # * *  # *++* # *  * # *  * # * |* #  ++# #    # #  *** #    |
       |    * * #  ++#  # *+*  #   #  # *|*  # * *  # *  * # *  * # *  * # *++* # **** #  ++# #  * * #    |
   0.9 +-+..*.*.#...|#..#.*.*..#.++#..#.*|*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*.|*.#...|#.#..*.*.#..+-+
       |    * * #  ***  # * *  #  |#  # *+*  # * *  # *  * # *  * # *  * # *  * # *++* #   |# #  * * #    |
  0.85 +-+..*.*.#..*|*..#.*.*..#.***..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.****.#..*.*.#..+-+
       |    * * #  *+*  # * *  # *|*  # * *  # * *  # *  * # *  * # *  * # *  * # *  * # * |* #  * * #    |
       |    * * #  * *  # * *  # *+*  # * *  # * *  # *  * # *  * # *  * # *  * # *  * # * |* #  * * #    |
   0.8 +-+..*.*.#..*.*..#.*.*..#.*.*..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.*++*.#..*.*.#..+-+
       |    * * #  * *  # * *  # * *  # * *  # * *  # *  * # *  * # *  * # *  * # *  * # *  * #  * * #    |
  0.75 +-+--***##--***###-***###-***###-***###-***###-****##-****##-****##-****##-****##-****##--***##--+-+
 400.perlben401.bzip2403.gcc429.m445.gob456.hmme45462.libqua464.h26471.omnet473483.xalancbmkgeomean

  png: https://imgur.com/a/BHzpPTW

Notes:
- tlb-lock-v2 corresponds to an implementation with a mutex.
- tlb-lock-v3 corresponds to the current implementation, i.e.
  a spinlock and a single lock acquisition in tlb_set_page_with_attrs.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181016153840.25877-1-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 19:46:53 -07:00
..
user linux-user: Assert on bad type in thunk_type_align() and thunk_type_size() 2018-05-24 20:46:54 +02:00
address-spaces.h Clean up header guards that don't match their file name 2016-07-12 16:19:16 +02:00
cpu-all.h tcg: Define and use new tlb_hit() and tlb_hit_page() functions 2018-07-02 08:02:20 -07:00
cpu-common.h tcg: remove tb_lock 2018-06-15 08:18:48 -10:00
cpu-defs.h cputlb: serialize tlb updates with env->tlb_lock 2018-10-18 18:58:10 -07:00
cpu_ldst.h cputlb: read CPUTLBEntry.addr_write atomically 2018-10-18 19:46:53 -07:00
cpu_ldst_template.h cputlb: read CPUTLBEntry.addr_write atomically 2018-10-18 19:46:53 -07:00
cpu_ldst_useronly_template.h linux-user: fix 32bit g2h()/h2g() 2018-08-17 13:56:33 +02:00
cputlb.h cputlb: bring back tlb_flush_count under !TLB_DEBUG 2017-10-10 07:37:10 -07:00
exec-all.h exec: introduce tlb_init 2018-10-18 18:58:10 -07:00
gdbstub.h gdbstub: Clarify what gdb_handlesig() is doing 2018-05-25 10:10:55 +02:00
gen-icount.h tcg: Pass tb and index to tcg_gen_exit_tb separately 2018-06-01 15:15:27 -07:00
helper-gen.h tcg: Allow 6 arguments to TCG helpers 2017-12-29 12:43:40 -08:00
helper-head.h tcg: Fix helper function vs host abi for float16 2018-05-31 14:50:51 +01:00
helper-proto.h tcg: Allow 6 arguments to TCG helpers 2017-12-29 12:43:40 -08:00
helper-tcg.h tcg: Allow 6 arguments to TCG helpers 2017-12-29 12:43:40 -08:00
hwaddr.h hw: Clean up includes 2016-06-07 18:19:23 +03:00
ioport.h hw: clean up hw/hw.h includes 2016-05-19 16:42:30 +02:00
log.h disas: Remove unused flags arguments 2017-10-25 11:55:09 +02:00
memattrs.h memory.h: Move MemTxResult type to memattrs.h 2017-09-04 15:21:54 +01:00
memory-internal.h tcg: remove tb_lock 2018-06-15 08:18:48 -10:00
memory.h memory: Remove old_mmio accessors 2018-10-02 19:09:14 +02:00
memory_ldst.inc.h exec: move memory access declarations to a common header, inline *_phys functions 2018-05-09 00:13:38 +02:00
memory_ldst_cached.inc.h exec: reintroduce MemoryRegion caching 2018-05-09 00:13:38 +02:00
memory_ldst_phys.inc.h exec: move memory access declarations to a common header, inline *_phys functions 2018-05-09 00:13:38 +02:00
poison.h ppc: Remove deprecated ppcemb target 2018-08-28 11:31:23 +10:00
ram_addr.h hostmem-file: add the 'pmem' option 2018-08-10 13:29:39 +03:00
ramlist.h migration: Poison ramblock loops in migration 2018-06-15 14:40:56 +01:00
semihost.h semihosting: add --semihosting-config arg sub-argument 2015-06-19 14:17:45 +01:00
softmmu-semi.h Clean up decorations and whitespace around header guards 2016-07-12 16:20:46 +02:00
target_page.h migration: Make savevm.c target independent 2017-05-18 19:21:00 +02:00
tb-context.h tcg: remove tb_lock 2018-06-15 08:18:48 -10:00
tb-hash-xx.h tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK 2017-10-24 13:53:41 -07:00
tb-hash.h tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK 2017-10-24 13:53:41 -07:00
tb-lookup.h Clean up includes 2018-02-09 05:05:11 +01:00
translator.h translator: merge max_insns into DisasContextBase 2018-05-09 10:12:21 -07:00