hi - i'm looking for some pointers on how to optimize code written for the PSP. while i am a fairly experienced programmer, i haven't done much work with gcc or any work with the MIPS core. specifically:
1. are there any gcc build switches i really should/shouldn't use? right now i'm using '-O3 -finline-functions-called-once -finline-functions -floop-optimize2'. this gives about a 50% increase in performance over not using any of these switches. most of the gains seem to come from -O3. what am i missing?
2. are there any examples of PSP programs that mix C and assembly? where would i find doc on the C run time model? any recommended tutorials on R4000 asm? i'd really like to port the linear interpolator to asm but i don't know how to progress down that path.
3. i've found that when i cross some unknown barrier in code/data size, performance degrades significantly. i'm guessing that there are some caching issues to work around. does anyone know how to lay out code & data in memory for optimal performance? are there fast memories where i should put frequently accessed buffers & modules? if so, what's the best way of getting the linker to place the modules there?
4. any general thoughts on fixed vs floating point? most of my code is floating point, but i do have a lot of fixed point experience so if porting down will greatly increase performance i'd be willing to give it a shot. also, is 32 bit fixed point the same performance as 16 bit?
5. what's the best place to read up about the hardware/software architecture of the PSP (memory map, latencies, etc)?
oh as for me i'm working on an audio synthesizer/sequencer ala fruityloops. based on my previous experience with these algorithms, i think i'm 2-3x slower than a fully optimized solution. nearly all the processing power goes into the audio synthesis code - it's a reasonable mixture of math and logical/branching operations. so the more you help me, the more channels of audio and crazy fx you'll have to play with when it's done. :)
thanks everyone!
ethan
PSP assembly coding and compiler optimizations
Re: PSP assembly coding and compiler optimizations
plankton wrote: 1. are there any gcc build switches i really should/shouldn't use? right now i'm using '-O3 -finline-functions-called-once -finline-functions -floop-optimize2'. this gives about a 50% increase in performance over not using any of these switches. most of the gains seem to come from -O3. what am i missing?
You might want to try -Os instead. It does save on code size (= fewer icache misses), and is still pretty optimised. Try it and see. Also try -fsingle-precision-constant, since doubles have no hardware support, and an unsuffixed floating-point literal in C has type double (which promotes the whole expression to double).
plankton wrote: 2. are there any examples of PSP programs that mix C and assembly? where would i find doc on the C run time model? any recommended tutorials on R4000 asm? i'd really like to port the linear interpolator to asm but i don't know how to progress down that path.
Not too many, but you have a lot of options for your app. The most important one is that there's a whole second CPU dedicated to DSP stuff, which seems to have more DSP-oriented instruction extensions. But I don't really know anything about it. Have a look around here for "media engine" discussions.
plankton wrote: 3. i've found that when i cross some unknown barrier in code/data size, performance degrades significantly. i'm guessing that there are some caching issues to work around. does anyone know how to lay out code & data in memory for optimal performance? are there fast memories where i should put frequently accessed buffers & modules? if so, what's the best way of getting the linker to place the modules there?
The wiki has a memory map description, but I don't see any reason why there would be a cliff beyond a certain code/data size, unless your cache misses are going way up.
plankton wrote: 4. any general thoughts on fixed vs floating point? most of my code is floating point, but i do have a lot of fixed point experience so if porting down will greatly increase performance i'd be willing to give it a shot. also, is 32 bit fixed point the same performance as 16 bit?
Single precision FP seems pretty quick, and the VFPU extensions can do matrix ops very efficiently as well.
plankton wrote: 5. what's the best place to read up about the hardware/software architecture of the PSP (memory map, latencies, etc)?
The wiki: http://wiki.ps2dev.org/
plankton wrote: oh as for me i'm working on an audio synthesizer/sequencer ala fruityloops. based on my previous experience with these algorithms, i think i'm 2-3x slower than a fully optimized solution. nearly all the processing power goes into the audio synthesis code - it's a reasonable mixture of math and logical/branching operations. so the more you help me, the more channels of audio and crazy fx you'll have to play with when it's done. :)
Cool!
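To make the -fsingle-precision-constant point concrete, here is a minimal C sketch of how an unsuffixed literal changes the type of an expression. The function names are hypothetical and exist only for illustration; this shows the language rule, and says nothing about how much time the promotion costs in any particular inner loop.

```c
#include <stddef.h>

/* Hypothetical helpers, named here only for illustration. */

/* 0.5 has type double, so x is promoted and the multiply is done in
   double precision before the result is narrowed back to float. */
float half_gain_promoted(float x) { return x * 0.5; }

/* 0.5f has type float, so the multiply stays in single precision. */
float half_gain_single(float x) { return x * 0.5f; }

/* The type of each product, observed through sizeof
   (the operand of sizeof is not evaluated). */
size_t promoted_product_size(void) { float x = 1.0f; return sizeof(x * 0.5); }
size_t single_product_size(void)   { float x = 1.0f; return sizeof(x * 0.5f); }
```

Both gain functions return the same value for simple inputs; the difference is only in the precision the arithmetic is carried out in. Adding the f suffix by hand gets the single-precision behaviour for that one literal without relying on the compiler switch.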
Re: PSP assembly coding and compiler optimizations
Quote: You might want to try -Os instead. It does save on code size (= fewer icache misses), and is still pretty optimised. Try it and see. Also try -fsingle-precision-constant, since doubles have no hardware support, and an unsuffixed floating-point literal in C has type double (which promotes the whole expression to double).
ah good points. i'm used to programming systems which have small pipelines and fast memory access. i'll try both of these out. i imagine that switching to SP floats will be huge. i'm also going to look into some general architectural changes to the code; there's got to be some low hanging fruit in there that i can implement in C.
oh are there any pragmas in gcc for branch prediction? are they working on the PSP?
Quote: Not too many, but you have a lot of options for your app. The most important one is that there's a whole second CPU dedicated to DSP stuff, which seems to have more DSP-oriented instruction extensions. But I don't really know anything about it. Have a look around here for "media engine" discussions.
hmmm, i think i checked this out in the past and didn't think it was too useful, but i will take another look.
Quote: The wiki has a memory map description, but I don't see any reason why there would be a cliff beyond a certain code/data size, unless your cache misses are going way up.
i'm pretty sure it's just a cache miss/data layout issue. however, it's annoying when everything works fine & you add in another struct and the GUI goes totally unresponsive. :\ i'm used to projects where you specify exactly where you want your code and data segments to lie, which can help a lot in situations like this.
and thanks for pointing out the wiki - i read it really early on and i don't think i realized the implications of everything i saw. i do a ton of mallocs; those will be migrating to memalign quite soon!
cheers!
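For the malloc-to-memalign migration mentioned above, a minimal sketch could look like the following. The 64-byte line size is an assumption that should be checked against the hardware documentation, and alloc_audio_buffer and is_line_aligned are hypothetical helper names, not part of any SDK.

```c
#include <malloc.h>   /* memalign() */
#include <stdint.h>   /* uintptr_t */
#include <stdlib.h>   /* free(), size_t */

/* Assumed cache-line size in bytes; verify against the hardware docs. */
#define CACHE_LINE 64

/* Hypothetical helper: allocate a float buffer that starts on a
   cache-line boundary and whose size is rounded up to whole lines,
   so it never shares a line with a neighbouring allocation. */
float *alloc_audio_buffer(size_t samples)
{
    size_t bytes = samples * sizeof(float);
    bytes = (bytes + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);
    return (float *)memalign(CACHE_LINE, bytes);
}

/* Returns 1 if p sits on a cache-line boundary, 0 otherwise. */
int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

A buffer obtained this way is released with free() as usual. Whether alignment alone removes the slowdown described in this thread is not established; it only rules out one source of avoidable cache-line sharing.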
Re: PSP assembly coding and compiler optimizations
plankton wrote: oh are there any pragmas in gcc for branch prediction? are they working on the PSP?
There's __builtin_expect(). I just tried scattering some around in PSPGL, but I didn't get the results I was hoping for. I was hoping it would put the unlikely branches out of line to make sure the hot-path is in icache, but it didn't seem to do that. I haven't looked into it in detail yet.
plankton wrote: hmmm, i think i checked this out in the past and didn't think it was too useful, but i will take another look.
If nothing else, it's still a whole other CPU for running MIPS instructions.
plankton wrote: i'm pretty sure it's just a cache miss/data layout issue. however, it's annoying when everything works fine & you add in another struct and the GUI goes totally unresponsive. :\ i'm used to projects where you specify exactly where you want your code and data segments to lie, which can help a lot in situations like this.
That sounds very strange. The PSP doesn't seem to have any major precipices like that. I wouldn't be surprised about a 10-20% decline, but it sounds like you're seeing much larger slowdowns. How big is your code/data?
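For reference, the usual way to apply __builtin_expect() is through a pair of wrapper macros. The sketch below is a minimal example under stated assumptions: the LIKELY/UNLIKELY names are a common idiom rather than part of GCC, and mix_clamped is a hypothetical function. The hint never changes what the code computes; whether it changes the generated layout depends on the compiler version and flags, as noted above.

```c
/* Conventional wrapper macros; the names are an idiom, not part of GCC. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical mixer: accumulates n samples and clamps the running sum
   to limit. Clamping is marked as the rare case. */
float mix_clamped(const float *in, int n, float limit)
{
    float acc = 0.0f;
    int i;
    for (i = 0; i < n; i++) {
        acc += in[i];
        if (UNLIKELY(acc > limit))   /* hint only: result is unaffected */
            acc = limit;
    }
    return acc;
}
```

Comparing the assembly output (gcc -S) with and without the macros is the most direct way to see whether the hint did anything for a given function.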
Re: PSP assembly coding and compiler optimizations
Quote: That sounds very strange. The PSP doesn't seem to have any major precipices like that. I wouldn't be surprised about a 10-20% decline, but it sounds like you're seeing much larger slowdowns. How big is your code/data?
not very big, only a couple hundred kbytes. i first ran into this when i tried to add in a ~25k structure; however i've also now seen it when i've added in small innocuous functions that aren't even executed. this lack of control over performance is quite disconcerting.
Re: PSP assembly coding and compiler optimizations
Quote: You might want to try -Os instead. It does save on code size (= fewer icache misses), and is still pretty optimised. Try it and see. Also try -fsingle-precision-constant, since doubles have no hardware support, and an unsuffixed floating-point literal in C has type double (which promotes the whole expression to double).
ok i just tried these and it appears i got a small improvement from -Os but nothing from -fsingle-precision-constant. which is surprising because i do a lot of FP calculations in inner loops. ah well. going to look into the aligned memory allocations, see if that helps out more.
if you have any other ideas, i'm happy to listen! :)
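On the fixed vs floating point question, one way to settle it is to benchmark a fixed-point version of the linear interpolator against the float one. The sketch below is a minimal C example under stated assumptions: a 16-bit sample table, a Q16.16 phase, and a hypothetical function name lerp_q16. It is not the interpolator from the synthesizer discussed here, and nothing in this thread shows that it would beat single-precision float on the PSP.

```c
#include <stdint.h>

/* Q16.16 phase: the upper 16 bits index the table, the lower 16 bits
   are the fractional position between two samples. */
#define FRAC_BITS 16
#define FRAC_MASK ((1u << FRAC_BITS) - 1u)

/* Hypothetical helper: linearly interpolate a 16-bit table at a Q16.16
   position. The caller must guarantee that index + 1 is inside the table. */
int16_t lerp_q16(const int16_t *table, uint32_t phase)
{
    uint32_t idx  = phase >> FRAC_BITS;
    /* Drop one fraction bit so (b - a) * frac fits in 32 bits:
       |b - a| <= 65535 and frac <= 32767. */
    int32_t  frac = (int32_t)((phase & FRAC_MASK) >> 1);
    int32_t  a    = table[idx];
    int32_t  b    = table[idx + 1];
    /* Right-shifting a negative value is implementation-defined in C;
       GCC performs an arithmetic shift, which is assumed here. */
    return (int16_t)(a + (((b - a) * frac) >> (FRAC_BITS - 1)));
}
```

The whole computation stays in 32-bit integers, which also bears on the 16 vs 32 bit question: the samples are 16 bit, but the intermediate product needs the wider type.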