Commit graph

39 commits

Author SHA1 Message Date
Alexander Graf
8fa11a4df3 target/i386: Fix legacy page table walk
Commit b56617bbcb ("target/i386: Walk NPT in guest real mode") added
logic to run the page table walker even in real mode if we are in NPT
mode.  That function then determined whether real mode or paging is
active based on whether the pg_mode variable was 0.

Unfortunately pg_mode is 0 in two situations:

  1) Paging is disabled (real mode)
  2) Paging is in 2-level paging mode (32bit without PAE)

That means the walker now assumed that 2-level paging mode was real
mode, breaking NetBSD as well as Windows XP.

To fix that, this patch adds a new PG flag to pg_mode which indicates
whether paging is active at all and uses that to determine whether we
are in real mode or not.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2654
Fixes: b56617bbcb ("target/i386: Walk NPT in guest real mode")
Signed-off-by: Alexander Graf <graf@amazon.com>
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Link: https://lore.kernel.org/r/20241106154329.67218-1-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-07 16:54:02 +01:00
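
A minimal sketch of the distinction drawn by 8fa11a4df3 above (the PG_MODE_* names and bit positions here are assumptions for illustration, not the QEMU definitions): an explicit paging-enabled bit lets the walker tell real mode apart from 2-level 32-bit paging, both of which previously left pg_mode at 0.

    #include <stdbool.h>

    /* Hypothetical flag layout, for illustration only. */
    #define PG_MODE_PG   (1 << 0)   /* CR0.PG: paging enabled */
    #define PG_MODE_PAE  (1 << 1)   /* CR4.PAE                */
    #define PG_MODE_LMA  (1 << 2)   /* long mode active       */

    static bool walker_sees_real_mode(int pg_mode)
    {
        /* Before the fix this was effectively "pg_mode == 0", which is
         * also true for 2-level paging (32-bit without PAE). */
        return !(pg_mode & PG_MODE_PG);
    }
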
Paolo Bonzini
e136648c5c target/i386/tcg: Use DPL-level accesses for interrupts and call gates
Stack accesses should be explicit and use the privilege level of the
target stack.  This ensures that SMAP is not applied when the target
stack is in ring 3.

This fixes a bug wherein i386/tcg assumed that an interrupt return, or a
far call using the CALL or JMP instruction, was always going from kernel
or user mode to kernel mode when using a call gate. This assumption is
violated if the call gate has a DPL that is greater than 0.

Analyzed-by: Robert R. Henry <rrh.henry@gmail.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-17 19:41:30 +02:00
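
A sketch of the rule e136648c5c applies, with hypothetical names (the real code selects an MMU index rather than a bare privilege level): the stack pushes for an interrupt or call-gate transfer are checked against the privilege level of the target stack, not the CPL at the time of the transfer, so SMAP no longer rejects a ring-3 target stack.

    /* Hypothetical helper, illustration only. */
    static int stack_push_privilege(int old_cpl, int target_dpl)
    {
        (void)old_cpl;      /* deliberately unused: the old CPL is not relevant */
        return target_dpl;  /* accesses are checked against the target stack    */
    }
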
Richard Henderson
ded1db48c9 target/i386: Fix tss access size in switch_tss_ra
The two limit_max variables represent size - 1, just like the
encoding in the GDT; thus the 'old' access was off by one.
Access the minimal size of the new tss: the complete tss contains
the iopb, which may be a larger block than the access api expects,
and is irrelevant because the iopb is not accessed during the
switch itself.

Fixes: 8b13106508 ("target/i386/tcg: use X86Access for TSS access")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2511
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240819074052.207783-1-richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
2024-08-21 09:11:26 +10:00
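
The arithmetic behind ded1db48c9, as an illustration (the TSS sizes are architectural, the helper names are hypothetical): a descriptor limit encodes size - 1, and only the architectural part of the new TSS needs to be mapped, not the I/O permission bitmap that may follow it.

    #include <stdbool.h>
    #include <stdint.h>

    #define TSS32_ARCH_SIZE 0x68u   /* bytes 0x00-0x67 of a 32-bit TSS  */
    #define TSS16_ARCH_SIZE 0x2cu   /* a 16-bit (80286) TSS is 44 bytes */

    /* limit_max holds "size - 1", exactly as encoded in the descriptor. */
    static uint32_t old_tss_access_size(uint32_t old_limit_max)
    {
        return old_limit_max + 1;   /* the previous access was off by one */
    }

    static uint32_t new_tss_access_size(bool new_tss_is_32bit)
    {
        /* The iopb is not touched during the switch, so it need not be
         * part of the mapped range. */
        return new_tss_is_32bit ? TSS32_ARCH_SIZE : TSS16_ARCH_SIZE;
    }
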
Peter Maydell
bde8adb808 target/i386: Remove dead assignment to ss in do_interrupt64()
Coverity points out that in do_interrupt64() in the "to inner
privilege" codepath we set "ss = 0", but because we also set
"new_stack = 1" there, later in the function we will always override
that value of ss with "ss = 0 | dpl".

Remove the unnecessary initialization of ss, which allows us to
reduce the scope of the variable to only where it is used.  Borrow a
comment from helper_lcall_protected() that explains what "0 | dpl"
means here.

Resolves: Coverity CID 1527395
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240723162525.1585743-1-peter.maydell@linaro.org
2024-07-29 16:59:44 +01:00
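
For readers unfamiliar with the "0 | dpl" idiom referenced in bde8adb808, a short illustrative note (the helper name is hypothetical): in the inner-privilege path, SS is loaded with a null selector whose RPL field carries the new privilege level.

    #include <stdint.h>

    /* Selector layout: bits 1:0 = RPL, bit 2 = TI, bits 15:3 = index.
     * "0 | dpl" is the null selector (index 0, TI 0) with its RPL set
     * to the new privilege level. */
    static uint16_t null_ss_for_dpl(unsigned dpl)
    {
        return (uint16_t)(0 | dpl);
    }
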
Paolo Bonzini
6a079f2e68 target/i386/tcg: save current task state before loading new one
This is how the steps are ordered in the manual.  EFLAGS.NT is
overwritten after the fact in the saved image.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-16 18:18:25 +02:00
Paolo Bonzini
8b13106508 target/i386/tcg: use X86Access for TSS access
This takes care of probing the vaddr range in advance, and is also faster
because it avoids repeated TLB lookups.  It also matches the Intel manual
better, as it says "Checks that the current (old) TSS, new TSS, and all
segment descriptors used in the task switch are paged into system memory";
note however that it's not clear how the processor checks for segment
descriptors, and this check is not included in the AMD manual.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-16 18:18:25 +02:00
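
A generic sketch of the probe-then-access pattern adopted by 8b13106508 (all names are hypothetical, not the X86Access API): the whole range is validated once up front, so any guest fault is raised before the task switch starts, and later accesses are plain host loads with no repeated TLB lookup.

    #include <stdint.h>
    #include <string.h>

    /* Filled in by a single up-front translation/permission check over
     * the whole guest range (not shown); illustration only. */
    typedef struct {
        void *host_base;     /* host pointer for the validated range */
        unsigned size;
    } RangeAccess;

    static uint32_t range_ldl(const RangeAccess *ac, unsigned offset)
    {
        uint32_t val;
        memcpy(&val, (const char *)ac->host_base + offset, sizeof(val));
        return val;
    }
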
Paolo Bonzini
05d41bbcb3 target/i386/tcg: check for correct busy state before switching to a new task
This step is listed in the Intel manual: "Checks that the new task is available
(call, jump, exception, or interrupt) or busy (IRET return)".

The AMD manual lists the same operation under the "Preventing recursion"
paragraph of "12.3.4 Nesting Tasks", though it is not clear if the processor
checks the busy bit in the IRET case.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-16 18:18:24 +02:00
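
A sketch of the check added by 05d41bbcb3 (the descriptor type encoding is architectural, the helper name is hypothetical): bit 1 of a TSS descriptor type is the busy bit, and the expected state depends on whether the switch comes from IRET or from a call, jump, interrupt or exception.

    #include <stdbool.h>

    /* TSS descriptor types: 0x1/0x3 = 16-bit available/busy,
     *                       0x9/0xB = 32-bit available/busy; bit 1 = busy. */
    static bool tss_busy_state_ok(unsigned desc_type, bool is_iret)
    {
        bool busy = (desc_type & 2) != 0;
        /* IRET must return to a busy task; every other source requires
         * an available (non-busy) one. */
        return is_iret ? busy : !busy;
    }
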
Paolo Bonzini
8053862af9 target/i386/tcg: Compute MMU index once
Add the MMU index to the StackAccess struct, so that it can be cached
or (in the next patch) computed from information that is not in
CPUX86State.

Co-developed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-16 18:18:24 +02:00
Richard Henderson
059368bcf5 target/i386/tcg: Reorg push/pop within seg_helper.c
Interrupts and call gates should use accesses with the DPL as
the privilege level.  While computing the applicable MMU index
is easy, the harder thing is how to plumb it in the code.

One possibility could be to add a single argument to the PUSH* macros
for the privilege level, but this is repetitive and risks confusion
between the involved privilege levels.

Another possibility is to pass both CPL and DPL, and to adjust both
PUSH* and POP* to use specific privilege levels (instead of using
cpu_{ld,st}*_data). This makes the code more symmetric.

However, a more complicated but much nicer approach is to use a structure
to contain the stack parameters, env, unwind return address, and rewrite
the macros into functions.  The struct provides an easy home for the MMU
index as well.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240617161210.4639-4-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-16 18:18:24 +02:00
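
A sketch of the structure-based approach described in 059368bcf5 and 8053862af9 above (field and function names are illustrative, not the exact QEMU ones): the stack parameters and the MMU index travel together, and the former PUSH* macros become small functions operating on the struct.

    #include <stdint.h>

    typedef struct StackAccessSketch {
        uint32_t ss_base;    /* base of the stack segment                 */
        uint32_t sp;         /* current stack pointer                     */
        uint32_t sp_mask;    /* 0xffff for 16-bit stacks, else 0xffffffff */
        int mmu_index;       /* privilege level used for the accesses     */
    } StackAccessSketch;

    /* store32() stands in for a store routine that takes an explicit
     * MMU index instead of using cpu_st*_data(). */
    static void pushl(StackAccessSketch *sa,
                      void (*store32)(uint32_t addr, uint32_t val, int mmu_index),
                      uint32_t val)
    {
        sa->sp -= 4;
        store32(sa->ss_base + (sa->sp & sa->sp_mask), val, sa->mmu_index);
    }
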
Paolo Bonzini
312ef3243e target/i386/tcg: use PUSHL/PUSHW for error code
Do not pre-decrement esp, let the macros subtract the appropriate
operand size.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-16 18:18:24 +02:00
Paolo Bonzini
0bd385e7e3 target/i386/tcg: Allow IRET from user mode to user mode with SMAP
This fixes a bug wherein i386/tcg assumed an interrupt return using
the IRET instruction was always returning from kernel mode to either
kernel mode or user mode. This assumption is violated when IRET is used
as a clever way to restore thread state, as for example in the dotnet
runtime. There, IRET returns from user mode to user mode.

The bug is that stack accesses from IRET and RETF, as well as accesses
to the parameters in a call gate, are normal data accesses using the
current CPL.  This manifested itself as a page fault in the guest Linux
kernel due to SMAP preventing the access.

This bug appears to have been in QEMU since the beginning.

Analyzed-by: Robert R. Henry <rrh.henry@gmail.com>
Co-developed-by: Robert R. Henry <rrh.henry@gmail.com>
Signed-off-by: Robert R. Henry <rrh.henry@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-16 18:18:24 +02:00
Richard Henderson
a7cf494993 target/i386/tcg: Remove SEG_ADDL
This truncation is now handled by MMU_*32_IDX.  The introduction of
MMU_*32_IDX in fact applied correct 32-bit wraparound to 16-bit accesses
with a high segment base (e.g.  big real mode or vm86 mode), which did
not use SEG_ADDL.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240617161210.4639-3-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-16 18:18:24 +02:00
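
A worked example of the wraparound mentioned in a7cf494993 (values chosen for illustration): with a segment base near the top of the 32-bit address space, as in big real mode or vm86 mode, the linear address of a 16-bit access must wrap modulo 2^32, which is exactly the truncation the 32-bit MMU indexes now apply.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t seg_base = 0xffffff00u;   /* high segment base */
        uint16_t offset   = 0x0180;

        uint64_t unwrapped = (uint64_t)seg_base + offset;  /* 0x100000080 */
        uint32_t wrapped   = seg_base + offset;            /* 0x00000080  */
        printf("unwrapped 0x%llx, wrapped 0x%08x\n",
               (unsigned long long)unwrapped, wrapped);
        return 0;
    }
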
Paolo Bonzini
ae541c0eb4 target/i386: convert non-grouped, helper-based 2-byte opcodes
These have very simple generators and no need for complex group
decoding.  Apart from LAR/LSL which are simplified to use
gen_op_deposit_reg_v and movcond, the code is generally lifted
from translate.c into the generators.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-17 09:47:39 +02:00
Paolo Bonzini
69cb498c56 target/i386: fix pushed value of EFLAGS.RF
When preparing an exception stack frame for a fault exception, the value
pushed for RF is 1.  Take that into account.  The same should be true
of interrupts for repeated string instructions, but the situation there
is complicated.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-08 10:33:37 +02:00
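
A tiny sketch of the rule in 69cb498c56 (the RF bit position is architectural, the helper name is hypothetical): the EFLAGS image pushed for a fault exception has RF set, even though the live RF bit stays clear.

    #include <stdint.h>

    #define EFLAGS_RF (1u << 16)   /* resume flag */

    static uint32_t eflags_image_for_fault(uint32_t live_eflags)
    {
        return live_eflags | EFLAGS_RF;   /* RF reads as 1 in the pushed image */
    }
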
Paolo Bonzini
abdcc5c8ef target/i386: set CC_OP in helpers if they want CC_OP_EFLAGS
Mark cc_op as clean and do not spill it at the end of the translation block.
Technically this is a tiny bit less efficient, but:

* it results in translations that are a tiny bit smaller

* for most of these instructions, it is not unlikely that they are close to
the end of the basic block, in which case cc_op would not be overwritten

* anyway the cost is probably dwarfed by that of computing flags.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25 13:28:01 +02:00
Paolo Bonzini
2455e9cf5a target/i386: clean up cpu_cc_compute_all
cpu_cc_compute_all() has an argument that is always equal to CC_OP for historical
reasons (dating back to commit a7812ae412, "TCG variable type checking.", 2008-11-17,
which added the argument to helper_cc_compute_all).  It does not make sense for the
argument to have any other value, so remove it and clean up some lines that are not
too long anymore.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:02 +01:00
Paolo Bonzini
c35b2fb1fd target/i386: fix shadowed variable pasto
Commit a908985971 ("target/i386/seg_helper: introduce tss_set_busy",
2023-09-26) failed to use the tss_selector argument of the new function,
which was therefore unused.

This shows up as a #GP fault when booting old versions of 32-bit
Linux.

Fixes: a908985971 ("target/i386/seg_helper: introduce tss_set_busy", 2023-09-26)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20231011135350.438492-1-pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2023-10-12 16:37:31 +02:00
Paolo Bonzini
49958057a2 target/i386/seg_helper: remove shadowed variable
Return the width of the new task directly from switch_tss_ra.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-26 16:41:49 +02:00
Paolo Bonzini
a908985971 target/i386/seg_helper: introduce tss_set_busy
Eliminate a shadowed local variable in the process.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-26 16:41:49 +02:00
Paolo Bonzini
63fd8ef080 target/i386: implement SYSCALL/SYSRET in 32-bit emulators
AMD supports both 32-bit and 64-bit SYSCALL/SYSRET, but TCG only
exposes them for 64-bit targets.  For system emulation just reuse the
helper; for user-mode emulation the ABI is the same as "int $0x80".

The BSDs do not support any fast system call mechanism in 32-bit
mode, so add to bsd-user the same stub that FreeBSD has for 64-bit
compatibility mode.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-26 10:23:56 +02:00
Joe Richey
b585edca34 i386: Emit correct error code for 64-bit IDT entry
When in 64-bit mode, IDT entries are 16 bytes, so `intno * 16` is used
for base/limit/offset calculations. However, even in 64-bit mode, the
exception error code still uses bits [3,16) for the invalid interrupt
index.

This means the error code should still be `intno * 8 + 2` even in 64-bit
mode.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1382
Signed-off-by: Joe Richey <joerichey@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 09:59:38 +01:00
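
The arithmetic from b585edca34 spelled out as a sketch (function names are hypothetical): descriptor lookup scales the vector by the real entry size, while the error code always encodes the vector in the selector-index field (bits 3 and up) with the IDT bit set.

    #include <stdbool.h>
    #include <stdint.h>

    static uint64_t idt_entry_offset(unsigned intno, bool long_mode)
    {
        return (uint64_t)intno * (long_mode ? 16 : 8);   /* 16-byte gates in 64-bit mode */
    }

    static uint32_t idt_error_code(unsigned intno)
    {
        /* bit 0 = EXT, bit 1 = IDT, bit 2 = TI, bits 3.. = index */
        return intno * 8 + 2;   /* identical in 32-bit and 64-bit mode */
    }
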
Richard Henderson
8c03ab9f74 target/i386: Truncate values for lcall_real to i32
Use i32 not int or tl for eip and cs arguments.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221001140935.465607-15-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-11 09:36:01 +02:00
Gareth Webb
50fcc7cbb6 target/i386: Throw a #SS when loading a non-canonical IST
Loading a non-canonical address into rsp when handling an interrupt or
performing a far call should raise a #SS not a #GP.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/870
Signed-off-by: Gareth Webb <gareth.webb@umbralsoftware.co.uk>
Message-Id: <164529651121.25406.15337137068584246397-0@git.sr.ht>
[Move get_pg_mode to seg_helper.c for user-mode emulators. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:50:15 +01:00
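
A sketch of the check behind 50fcc7cbb6 and the exception choice (a 48-bit virtual address width is assumed; the helper names are hypothetical): a non-canonical stack pointer loaded during interrupt or far-call delivery raises #SS rather than #GP.

    #include <stdbool.h>
    #include <stdint.h>

    static bool is_canonical_48(uint64_t addr)
    {
        uint64_t top = addr >> 47;            /* bits 63:47            */
        return top == 0 || top == 0x1ffff;    /* all zeros or all ones */
    }

    static int exception_for_new_rsp(uint64_t new_rsp)
    {
        return is_canonical_48(new_rsp) ? -1 /* none */ : 12 /* #SS */;
    }
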
Philippe Mathieu-Daudé
0792e6c88d target/i386: Move x86_cpu_exec_interrupt() under sysemu/ folder
Following the logic of commit 30493a030f ("i386: split seg_helper
into user-only and sysemu parts"), move x86_cpu_exec_interrupt()
under sysemu/seg_helper.c.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-By: Warner Losh <imp@bsdimp.com>
Message-Id: <20210911165434.531552-12-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14 12:00:21 -07:00
Philippe Mathieu-Daudé
604664726f target/i386: Restrict cpu_exec_interrupt() handler to sysemu
Restrict cpu_exec_interrupt() and its callees to sysemu.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210911165434.531552-11-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14 12:00:21 -07:00
Philippe Mathieu-Daudé
30ca39244b target/i386: Simplify TARGET_X86_64 #ifdef'ry
Merge two TARGET_X86_64 consecutive blocks.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210911165434.531552-4-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14 12:00:20 -07:00
Lara Lazier
e3126a5c92 target/i386: Moved int_ctl into CPUX86State structure
Moved int_ctl into the CPUX86State structure.  It removes some
unnecessary stores and loads, and prepares for tracking the vIRQ
state even when it is masked due to vGIF.

Signed-off-by: Lara Lazier <laramglazier@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-13 13:56:26 +02:00
Paolo Bonzini
1b627f389f target/i386: tcg: fix switching from 16-bit to 32-bit tasks or vice versa
The format of the task state segment is governed by bit 3 in the
descriptor type field.  On a task switch, the format for saving
is given by the current value of TR's type field, while the
format for loading is given by the new descriptor.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-04 13:47:08 +02:00
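
A sketch of the format selection from 1b627f389f (the type encoding is architectural, the names are illustrative): bit 3 of the system-descriptor type distinguishes a 16-bit TSS from a 32-bit one, and the save and load sides consult different type fields.

    #include <stdbool.h>

    /* Types 0x1/0x3 are 16-bit TSS (available/busy),
     * types 0x9/0xB are 32-bit TSS; bit 3 selects the format. */
    static bool tss_is_32bit(unsigned desc_type)
    {
        return (desc_type & 8) != 0;
    }

    struct tss_formats { bool save_32bit, load_32bit; };

    static struct tss_formats task_switch_formats(unsigned tr_type, unsigned new_type)
    {
        /* Save in the current TR's format, load in the new descriptor's. */
        struct tss_formats f = { tss_is_32bit(tr_type), tss_is_32bit(new_type) };
        return f;
    }
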
Paolo Bonzini
a5505f6b5b target/i386: tcg: fix loading of registers from 16-bit TSS
According to the manual, the high 16 bits of the registers are preserved
when switching to a 16-bit task.  Implement this in switch_tss_ra.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-04 13:47:08 +02:00
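
A sketch of the register load described in a5505f6b5b (helper name hypothetical): only the low 16 bits come from the 16-bit TSS image; the upper half of each 32-bit register is left untouched.

    #include <stdint.h>

    static uint32_t load_reg_from_tss16(uint32_t current, uint16_t tss_value)
    {
        /* Deposit the 16-bit value into the low half, preserving bits 31:16. */
        return (current & 0xffff0000u) | tss_value;
    }
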
Paolo Bonzini
28f6aa1178 target/i386: tcg: fix segment register offsets for 16-bit TSS
The TSS offsets in the manuals have only 2-byte slots for the
segment registers.  QEMU incorrectly uses 4-byte slots, so
that SS overlaps the LDT selector.

Resolves: #382
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-04 13:47:08 +02:00
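
An illustration of the layout issue fixed by 28f6aa1178 (the offsets below follow the 80286 TSS layout as commonly documented and should be checked against the manual): the segment selectors occupy consecutive 2-byte slots, so stepping by 4 bytes makes the SS read land on the LDT selector slot.

    #include <stdint.h>

    /* 16-bit TSS: ES at 0x22, CS 0x24, SS 0x26, DS 0x28, LDT 0x2a. */
    static uint32_t tss16_seg_offset_fixed(unsigned seg_index)   /* 0=ES .. 3=DS */
    {
        return 0x22 + seg_index * 2;
    }

    static uint32_t tss16_seg_offset_buggy(unsigned seg_index)
    {
        return 0x22 + seg_index * 4;   /* index 2 (SS) hits 0x2a, the LDT slot */
    }
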
Richard Henderson
d76b9c6f07 target/i386: Move helper_check_io to sysemu
We never allow I/O from user-only, and the TSS check
that helper_check_io does will always fail.  Use an ifdef
within gen_check_io and return false, indicating that an
exception is known to be raised.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210514151342.384376-50-richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-05-19 12:17:11 -05:00
Richard Henderson
e497803556 target/i386: Create helper_check_io
Drop helper_check_io[bwl] and expose their common
subroutine to tcg directly.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210514151342.384376-49-richard.henderson@linaro.org>
2021-05-19 12:17:11 -05:00
Claudio Fontana
30493a030f i386: split seg_helper into user-only and sysemu parts
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

[claudio]:
Rebased on commit 68775856 ("target/i386: svm: do not discard high 32 bits")

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210322132800.7470-18-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-10 15:41:52 -04:00
Claudio Fontana
a93b55ec22 i386: split smm helper (sysemu)
SMM is only really useful for sysemu, so split the helper into two
modules around CONFIG_USER_ONLY, in order to remove the ifdef
and use the build system instead.

Add cpu_abort() when detecting attempts to enter SMM via an SMI
interrupt in user mode, and assert that the CPU is not in SMM
while translating RSM instructions.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210322132800.7470-12-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-10 15:41:51 -04:00
Paolo Bonzini
687758565a target/i386: svm: do not discard high 32 bits of EXITINFO1
env->error_code is only 32 bits wide, so the high 32 bits of EXITINFO1
are being lost.  However, even though saving guest state and restoring
host state must be delayed to do_vmexit, because they might take tb_lock,
it is always possible to write to the VMCB.  So do this for the exit
code and EXITINFO1, just like it is already being done for EXITINFO2.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-19 08:48:18 -04:00
Philippe Mathieu-Daudé
c117e5b11a target/i386: Use X86Seg enum for segment registers
Use the dedicated X86Seg enum type for segment registers.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210109233427.749748-1-f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-12 17:05:10 +01:00
Peter Maydell
3df1a3d070 target/i386: Check privilege level for protected mode 'int N' task gate
When the 'int N' instruction is executed in protected mode, the
pseudocode in the architecture manual specifies that we need to check:

 * vector number within IDT limits
 * selected IDT descriptor is a valid type (interrupt, trap or task gate)
 * if this was a software interrupt, that gate DPL >= CPL

The way we had structured the code meant that the privilege check for
software interrupts ended up not in the code path taken for task gate
handling, because all of the task gate handling code was in the 'case 5'
of the switch which was checking "is this descriptor a valid type".

Move the task gate handling code out of that switch (so that it is now
purely doing the "valid type?" check) and below the software interrupt
privilege check.

The effect of this missing check was that a guest userspace binary
executing 'int 8' would cause a guest kernel panic rather than the
userspace binary being handed a SEGV.

This is essentially the same bug fixed in VirtualBox in 2012:
https://www.halfdog.net/Security/2012/VirtualBoxSoftwareInterrupt0x8GuestCrash/

Note that for QEMU this is not a security issue because it is only
present when using TCG.

Fixes: https://bugs.launchpad.net/qemu/+bug/1813201
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20201121224445.16236-1-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-02 21:03:09 +01:00
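
A condensed sketch of the reordered checks in 3df1a3d070 (the gate-type values are architectural; everything else is illustrative): the software-interrupt privilege check now runs before the dispatch on gate kind, so it also covers task gates.

    #include <stdbool.h>

    enum { GATE_TASK = 5, GATE_INTR_16 = 6, GATE_TRAP_16 = 7,
           GATE_INTR_32 = 14, GATE_TRAP_32 = 15 };

    static bool int_n_allowed(unsigned gate_type, unsigned gate_dpl,
                              unsigned cpl, bool is_software_int)
    {
        switch (gate_type) {            /* "is this descriptor a valid type?" */
        case GATE_TASK:
        case GATE_INTR_16: case GATE_TRAP_16:
        case GATE_INTR_32: case GATE_TRAP_32:
            break;
        default:
            return false;               /* #GP: bad gate type */
        }
        /* Privilege check for INT n/INT3/INTO, now applied to every gate
         * kind, including task gates. */
        if (is_software_int && gate_dpl < cpl) {
            return false;               /* #GP: previously skipped for task gates */
        }
        return true;
    }
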
Claudio Fontana
ed69e8314d i386: move TCG cpu class initialization to tcg/
to do this, we need to take code out of cpu.c and helper.c,
and also move some prototypes from cpu.h, for code that is
needed in tcg/xxx_helper.c, and which in turn is part of the
callbacks registered by the class initialization.

Therefore, do some shuffling of the parts of cpu.h that
are only relevant for tcg/, and put them in tcg/helper-tcg.h

For FT0 and similar macros, put them in tcg/fpu-helper.c
since they are used only there.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201212155530.23098-8-cfontana@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-16 15:50:33 -05:00
Claudio Fontana
1b248f147e i386: move TCG accel files into tcg/
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

[claudio: moved cc_helper_template.h to tcg/ too]

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20201212155530.23098-6-cfontana@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-16 14:06:53 -05:00
Renamed from target/i386/seg_helper.c