target/ppc: Moved VMLADDUHM to decodetree and use gvec

This patch moves VMLADDUHM to decodetree a creates a gvec implementation
using mul_vec and add_vec.

rept    loop    master             patch
8       12500   0,01810500         0,00903100 (-50.1%)
25      4000    0,01739400         0,00747700 (-57.0%)
100     1000    0,01843600         0,00901400 (-51.1%)
500     200     0,02574600         0,01971000 (-23.4%)
2500    40      0,05921600         0,07121800 (+20.3%)
8000    12      0,15326700         0,21725200 (+41.7%)

The significant difference in performance when REPT is low and LOOP is
high I think is due to the fact that the new implementation has a higher
translation time, as when using a helper only 5 TCGop are used but with
the patch a total of 10 TCGop are needed (Power lacks a direct mul_vec
equivalent so this instruction is implemented with the help of 5 others,
vmuleu, vmulou, vmrgh, vmrgl and vpkum).

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-2-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This commit is contained in:
Lucas Mateus Castro (alqotel) 2022-10-19 09:50:29 -03:00 committed by Daniel Henrique Barboza
parent b35bf5f2d7
commit dc46167a22
5 changed files with 35 additions and 21 deletions

View file

@ -264,7 +264,7 @@ DEF_HELPER_FLAGS_4(VMSUMUHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr)
DEF_HELPER_5(VMSUMUHS, void, env, avr, avr, avr, avr)
DEF_HELPER_FLAGS_4(VMSUMSHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr)
DEF_HELPER_5(VMSUMSHS, void, env, avr, avr, avr, avr)
DEF_HELPER_FLAGS_4(vmladduhm, TCG_CALL_NO_RWG, void, avr, avr, avr, avr)
DEF_HELPER_FLAGS_5(VMLADDUHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
DEF_HELPER_FLAGS_2(mtvscr, TCG_CALL_NO_RWG, void, env, i32)
DEF_HELPER_FLAGS_1(mfvscr, TCG_CALL_NO_RWG, i32, env)
DEF_HELPER_3(lvebx, void, env, avr, tl)