target-i386: add SSE4a instruction support

This adds support for the AMD Phenom/Barcelona's SSE4a instructions.
Those include insertq and extrq, which are doing shift and mask on
XMM registers, in two versions (immediate shift/length values and
stored in another XMM register).
Additionally it implements movntss, movntsd, which are scalar
non-temporal stores (avoiding cache trashing). These are implemented
as normal stores, though.
SSE4a is guarded by the SSE4A CPUID bit (Fn8000_0001:ECX[6]).

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This commit is contained in:
Andre Przywara 2009-09-19 00:30:48 +02:00 committed by Aurelien Jarno
parent ccd59d09a9
commit d9f4bb27db
3 changed files with 85 additions and 2 deletions

View file

@ -187,6 +187,10 @@ DEF_HELPER_2(rsqrtps, void, XMMReg, XMMReg)
DEF_HELPER_2(rsqrtss, void, XMMReg, XMMReg)
DEF_HELPER_2(rcpps, void, XMMReg, XMMReg)
DEF_HELPER_2(rcpss, void, XMMReg, XMMReg)
DEF_HELPER_2(extrq_r, void, XMMReg, XMMReg)
DEF_HELPER_3(extrq_i, void, XMMReg, int, int)
DEF_HELPER_2(insertq_r, void, XMMReg, XMMReg)
DEF_HELPER_3(insertq_i, void, XMMReg, int, int)
DEF_HELPER_2(haddps, void, XMMReg, XMMReg)
DEF_HELPER_2(haddpd, void, XMMReg, XMMReg)
DEF_HELPER_2(hsubps, void, XMMReg, XMMReg)