I have two SSE3 emulators: VoodooSSE3 by mercurysquad, and a previous one (I don't know who the author is: Mifki, Semtex, or Netkas) released in 581_AMD_Intel_SSE2_SSE3_Kernel_Patcher on 27.10.2007. The latter is available as NASM sources, so it can be modified and recompiled. VoodooSSE3 I have only in hex form.
581SSE3 is twice as large and about three times slower, but... it works better.
See how this forum page renders under two kernels that differ only in their SSE3 code
(both kernels are available here):
VoodooSSE3
[Screenshot: Picture_2.png, this forum page rendered under the VoodooSSE3 kernel]
581SSE3
[Screenshot: Pict_1.png, this forum page rendered under the 581SSE3 kernel]
As both emulators exist only as hex code, I can't analyze them and make corrections. But, as you can see, corrections are needed.
Is there any chance of getting the sources in a form that gcc 4.2 can compile?
Speed, measured with mercurysquad's tester:
VoodooSSE3
CODE
_________________________________________
Loop length = 1000
_______________________________
movdqa mem 31 reg 24
movdqu mem 31 reg 24
fisttpl mem 2176
fisttps mem 2173
fisttpq mem 2522
addsubps mem 1632 reg 1639
addsubpd mem 1664 reg 1615
hsubpd mem 1637 reg 1620
haddpd mem 1632 reg 1634
haddps mem 1626 reg 1507
hsubps mem 1627 reg 1551
movsldup mem 1603 reg 1508
movddup mem 1527 reg 1494
movshdup mem 1507 reg 1491
_______________________________
581SSE3
CODE
_________________________________________
Loop length = 1000
_______________________________
movdqa mem 31 reg 24
movdqu mem 31 reg 24
fisttpl mem 3974
fisttps mem 3962
fisttpq mem 4113
addsubps mem 58725 reg 4073
addsubpd mem 4401 reg 4628
hsubpd mem 4271 reg 4229
haddpd mem 4714 reg 4648
haddps mem 5328 reg 4993
hsubps mem 4975 reg 4128
movsldup mem 4935 reg 4151
movddup mem 14923 reg 4111
movshdup mem 4181 reg 4035
And an illegal instruction without arguments (a simple trap, 0xff).
What about Supplemental SSE3 (SSSE3) emulation?





Aug 31 2010, 08:12 PM

