see http://forums.ps2dev.org/viewtopic.php?t=7539
Do we have a similar lib on the PS2?
Too bad the PS2 and PSP SDKs aren't closer (seeing how much more active the PSP scene is compared to the PS2 one).
VFPU math lib
The toolchain script is almost the same, and apart from the fact that you need a way to run unsigned code on the PS2, it's not difficult!
Anyway, as soon as I get my PS2 back (it is still travelling across Europe), I'll have a closer look at it. I have a few things that would benefit from such optimizations.
evilo.
You can also look for inspiration in some of the VU0 macro code in libito:
http://svn.ps2dev.org/filedetails.php?r ... rev=0&sc=0
http://svn.ps2dev.org/filedetails.php?r ... rev=0&sc=0
http://svn.ps2dev.org/filedetails.php?r ... rev=0&sc=0
http://svn.ps2dev.org/filedetails.php?r ... rev=0&sc=0
If you really want to do math calculations efficiently on the PS2, you need to use vu0 micromode.
Sure, using vu0 in macromode (similar to the VFPU on the PSP) gives you better performance than not using vu0 at all.
But in fact it's even faster to use vu0 micromode, even if you block on it directly afterwards, because you don't need to fetch any extra instructions into the I-cache and vu0 simply runs more efficiently in micromode.
What that means is that you do something like this (a basic mul example):
Code:
Vu0 code (or something like this)
mul vf01, vf02, vf03    nop
nop[e]                  nop
And then the ee code:
Code:
..upload MyMathFunction to Vu0 goes here..
Then:
// set some input registers (lqc2/sqc2 need 16-byte aligned addresses)
__asm__ volatile ("lqc2 vf02,0x00(%0)\n" : : "r" (&myValue1) : "memory");
__asm__ volatile ("lqc2 vf03,0x00(%0)\n" : : "r" (&myValue2) : "memory");
// call the vu0 program (starts from address 0 in vu0)
__asm__ volatile ("vcallms 0\n");
// do something on the ee while vu0 is calculating...
// ...
// Get the result from vu0 (3 vnops to make sure the calculation has finished if it takes 4 cycles)
__asm__ volatile ("vnop\n");
__asm__ volatile ("vnop\n");
__asm__ volatile ("vnop\n");
__asm__ volatile ("sqc2 vf01,0x00(%0)\n" : : "r" (&myResult) : "memory");
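If you want to sanity-check what comes back in myResult, a plain C reference for the same operation can help. This is only a minimal sketch, assuming myValue1, myValue2 and myResult are four-float vectors: the names vec4, vec4_mul_ref and is_qword_aligned are made up for illustration and are not part of any PS2 SDK. It shows the per-component product that the mul above computes, plus the 16-byte alignment that lqc2 and sqc2 expect.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-float vector type (not from any SDK).
 * The 16-byte alignment matches what lqc2/sqc2 require. */
typedef struct {
    float x, y, z, w;
} __attribute__((aligned(16))) vec4;

/* Returns 1 if p sits on a 16-byte (quadword) boundary, else 0. */
static int is_qword_aligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Plain C reference for "mul vf01, vf02, vf03":
 * each component of the result is the product of the
 * matching components of the two inputs. */
static vec4 vec4_mul_ref(const vec4 *a, const vec4 *b)
{
    vec4 r;

    /* Debug check only: misaligned operands would break lqc2/sqc2. */
    assert(is_qword_aligned(a) && is_qword_aligned(b));

    r.x = a->x * b->x;
    r.y = a->y * b->y;
    r.z = a->z * b->z;
    r.w = a->w * b->w;
    return r;
}
```

Calling vec4_mul_ref(&myValue1, &myValue2) on the ee and comparing against myResult should be enough for a quick test. Keep in mind that vu0 float arithmetic is not fully IEEE 754 compliant, so it is safer to compare with a small tolerance than to expect bit-identical results.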