GU/GE/GUM Measuring perfs & GU good programming

Discuss the development of new homebrew software, tools and libraries.

Moderators: cheriff, TyRaNiD

Shazz
Posts: 244
Joined: Tue Aug 31, 2004 11:42 pm
Location: Somewhere over the rainbow


Post by Shazz »

Hello,

I'd like to know the easiest way to measure the CPU (and GPU?) usage of graphics routines (not using PSPGL, only the PSPSDK GU/GUM libs).
Is it still possible, like in the old days, to measure roughly how much time each routine takes by changing the background color, that kind of thing?

Thanks !

(And if you've got some tips on "efficient programming for the PSP GU", I'm interested too :D My code definitely seems to bring the GPU to its knees.)
Last edited by Shazz on Mon Nov 28, 2005 7:29 pm, edited 1 time in total.
- TiTAN Art Division -
http://www.titandemo.org
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

What you could do is, once rendering has completed, switch the render target to the frontbuffer and render a small quad in the corner of the screen in a different color. The quad will start showing up on the scanline at which rendering of the backbuffer completed.

Using sceGuDrawBufferList() to switch rendering to the frontbuffer is preferred, since it does not store any internal state, so sceGuSwapBuffers() will still work properly afterwards.
GE Dominator
Shazz
Posts: 244
Joined: Tue Aug 31, 2004 11:42 pm
Location: Somewhere over the rainbow

Post by Shazz »

Hmm, that's an idea... I was thinking more of measuring the time taken by my different routines than the time the GE/GU takes to process the display lists... but that's a very fair point... thx!

So while we're on this kind of stuff, maybe you'll have some answers to a list of dumb questions:
1. What's better: calling sceGumDrawArray many times with a small list, or once with a bigger list? Is there an optimal size? (DMA transfers?)

2. When allocating space for a vertex list, what's better: reserving a huge table as done in the example, or asking the GU each time to reserve space in GU memory using sceGuGetMemory? (And is it better to call sceGuGetMemory each time, or once in main for example? How do you free it?)

3. What's the optimal PSP VRAM memory mode/bpp for performance? 16 bits (5551, 565)?

4. When using the GU_SPRITES primitive, are only UV coordinates usable (0-textureSize) and not STQ (0-1.0f)?

5. Swizzling textures... as I understand it, that's only worthwhile for 32-bit textures, so not for 5551 for example? (It's not clear to me...) What's the texture page size?

6. What do the vertex declaration flags
GU_TRANSFORM_3D
GU_TRANSFORM_2D
GU_TRANSFORM_BITS
really do & imply?

7. At what point in the rendering process is it best to flush the cache? I usually flush it before each sceGumDrawArray call.

8. Very dumb question... in my code, I send the vertex coords, the normalized normals and the ST coords using
GU_TRIANGLES,GU_NORMAL_32BITF|GU_TEXTURE_32BITF|GU_COLOR_8888|GU_VERTEX_32BITF|GU_TRANSFORM_3D
and it seems that my normals are not transformed (rotated+translated) the way the vertices are... is there something I have to do explicitly?

Ouf :D that's all for now :D
starman2049
Posts: 75
Joined: Mon Sep 19, 2005 5:41 am

Post by starman2049 »

To answer some of your questions (from my experience):

1) My calls to DrawArray() are tied to texture loads: since there isn't much memory on the PSP, I need to upload textures on the fly, and that seems to give a nice batch size (it works much the same way on PS2).

2) I keep display list memory in local memory, not GU memory. I didn't have enough GU memory to support #1 above.

3) I use 32-bit frame buffers and a 16-bit zbuffer.

4) I noticed the same thing, but I think you can set a texture scale factor if you want to.

5) I don't use 16-bit so I don't know for sure, but since it works for CLUTs I would not be surprised if it worked for 16-bit too: it's just data to the swizzler. Off the top of my head, the texture cache is 8k.

6) I use GU_TRANSFORM_3D for terrain, etc., and GU_TRANSFORM_2D for the HUD, etc. Don't know about GU_TRANSFORM_BITS.

7) You seem to have to flush the cache just before any DrawArray call to make sure the data has been written to memory.

8) don't know about this one...
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

Shazz wrote: 2. When allocating space for a vertex list, what's better: reserving a huge table as done in the example, or asking the GU each time to reserve space in GU memory using sceGuGetMemory? (And is it better to call sceGuGetMemory each time, or once in main for example? How do you free it?)
You do not free memory allocated with sceGuGetMemory(); the memory is only valid for as long as the current display list is being executed.
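
One way to picture why there is nothing to free: the allocation behaves like a scratch area that is simply started over with each new display list. The sketch below is a plain C illustration of that pattern, not the PSPSDK implementation; ExArena, ex_arena_begin() and ex_arena_get() are made-up names, and the 4-byte alignment is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scratch area that lives only as long as one "list". */
typedef struct {
    uint8_t *base;  /* start of the backing buffer */
    size_t   size;  /* total bytes available */
    size_t   used;  /* bytes handed out so far */
} ExArena;

/* Start a new list: everything handed out before is implicitly gone. */
void ex_arena_begin(ExArena *a, void *buffer, size_t size)
{
    assert(a != NULL && buffer != NULL);
    a->base = (uint8_t *)buffer;
    a->size = size;
    a->used = 0;
}

/* Hand out 'bytes' bytes (rounded up to 4); NULL if the buffer is full. */
void *ex_arena_get(ExArena *a, size_t bytes)
{
    size_t aligned = (bytes + 3u) & ~(size_t)3u;
    void *p;

    assert(a != NULL);
    if (aligned < bytes || aligned > a->size - a->used)
        return NULL;
    p = a->base + a->used;
    a->used += aligned;
    return p;
}
```

In this sketch a request larger than what is left in the backing buffer fails, so the limit is the size of that buffer and not the amount of free memory elsewhere.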
Shazz wrote: 4. When using the GU_SPRITES primitive, are only UV coordinates usable (0-textureSize) and not STQ (0-1.0f)?
No, it depends on whether you let the transform pipe run on the sprites or not.
Shazz wrote: 5. Swizzling textures... as I understand it, that's only worthwhile for 32-bit textures, so not for 5551 for example? (It's not clear to me...) What's the texture page size?
Swizzling is a must for any bit depth. Swizzling itself does not care about the depth of the texture; it just grabs chunks of 16x8 bytes when it's refilling the page.

Texture page size seems to be 8kB, as on PS2.
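
To illustrate the "16x8 bytes" point: a swizzle routine can work purely on bytes and never needs to know the pixel format. This is a minimal sketch, assuming the swizzled layout is a sequence of blocks 16 bytes wide and 8 rows high, stored one block after another; ex_swizzle() is a hypothetical name, and this is a simple memcpy version rather than an optimized one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reorder a linear image into 16-byte x 8-row blocks.
   width_bytes is the row length in BYTES (pixels * bytes per pixel),
   so the same code serves 16-bit, 32-bit or CLUT data. */
void ex_swizzle(uint8_t *dst, const uint8_t *src,
                size_t width_bytes, size_t height)
{
    size_t bx, by, row;

    assert(width_bytes % 16 == 0);  /* whole blocks horizontally */
    assert(height % 8 == 0);        /* whole blocks vertically */

    for (by = 0; by < height / 8; by++) {
        for (bx = 0; bx < width_bytes / 16; bx++) {
            for (row = 0; row < 8; row++) {
                /* Copy one 16-byte row of the current block. */
                memcpy(dst,
                       src + (by * 8 + row) * width_bytes + bx * 16,
                       16);
                dst += 16;
            }
        }
    }
}
```

For a 64x64 texture, width_bytes would be 128 at 16 bits per pixel and 256 at 32 bits per pixel; nothing else changes.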
Shazz wrote: 6. What do the vertex declaration flags
GU_TRANSFORM_3D
GU_TRANSFORM_2D
GU_TRANSFORM_BITS
really do & imply?
GU_TRANSFORM_3D = Runs the vertices through the T&L pipe before they go to the rasterizer.
GU_TRANSFORM_2D = Pass the vertices directly to the rasterizer.
GU_TRANSFORM_BITS is just a bitmask to filter out the GU_TRANSFORM_* bits for simplicity.
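
A small sketch of how such a mask is typically used to test which path a vertex type selects. The EX_* constants are stand-ins defined locally for illustration (their numeric values are an assumption); real code should use the GU_TRANSFORM_* definitions from pspgu.h, and ex_is_transform_2d() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values for illustration only; use the real GU_TRANSFORM_*
   definitions from pspgu.h in actual code. */
#define EX_TRANSFORM_3D   (0u << 23)
#define EX_TRANSFORM_2D   (1u << 23)
#define EX_TRANSFORM_BITS (1u << 23)

/* Returns 1 if the vertex type asks for the 2D (untransformed) path,
   0 if it asks for the 3D (T&L) path. */
int ex_is_transform_2d(uint32_t vtype)
{
    /* Sanity check: the mask must cover the 2D value. */
    assert((EX_TRANSFORM_2D & ~EX_TRANSFORM_BITS) == 0);

    /* Mask away every other vertex-format bit, then compare. */
    return (vtype & EX_TRANSFORM_BITS) == EX_TRANSFORM_2D;
}
```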
Shazz wrote: 7. At what point in the rendering process is it best to flush the cache? I usually flush it before each sceGumDrawArray call.
The best approach is never to flush it at all. Just make sure you write to (and don't read from!) uncached memory when you do anything temporary. sceGuGetMemory() returns uncached memory, so if you use that one you're safe.
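
For buffers that do not come from sceGuGetMemory(), a commonly used trick is to address them through their uncached alias. The sketch below only shows the address arithmetic on plain integers, under the assumption that setting bit 0x40000000 in an address selects the uncached view; EX_UNCACHED_BIT and both helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: this bit selects the uncached alias of an address. */
#define EX_UNCACHED_BIT 0x40000000u

/* Returns 1 if the address already refers to the uncached alias. */
int ex_is_uncached(uint32_t addr)
{
    return (addr & EX_UNCACHED_BIT) != 0;
}

/* Returns the uncached alias of a (cached or uncached) address. */
uint32_t ex_uncached_addr(uint32_t addr)
{
    uint32_t result = addr | EX_UNCACHED_BIT;

    /* Postcondition: the alias bit is set. */
    assert(ex_is_uncached(result));
    return result;
}
```

Data written through such an alias bypasses the CPU data cache, so there is nothing left to flush before the GE reads it.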
Shazz
Posts: 244
Joined: Tue Aug 31, 2004 11:42 pm
Location: Somewhere over the rainbow

Post by Shazz »

Thanks to both of you for the answers... it really helps.

Maybe I can add some new questions?

1. Swizzling textures :

So it's better whatever the format, but given the time needed to swizzle a texture, should we swizzle only static textures (images...) or textures generated in real time too (render targets, ...)?
And the code snippet (http://wiki.ps2dev.org/psp:ge_faq) only works for 32-bit textures (as it uses u32), no?

2. Display lists

I still don't see whether it's better to send one big list with sceGumDrawArray or many small ones. And can sceGumDrawArrayN improve performance?

3. Temporary memory allocation in VRAM

What is the upper limit of sceGuGetMemory()? I ran into issues when I asked for a big vertex list (though smaller than the free VRAM).

That's all :D
Shazz
Posts: 244
Joined: Tue Aug 31, 2004 11:42 pm
Location: Somewhere over the rainbow

Post by Shazz »

Of course, rather than only asking questions, I wrote some tests...

And despite maybe some wrong conclusions, here is some stuff I found that maybe some of you could confirm or not...

1. It seems better to call sceGuGetMemory() once with a bigger vertex list than in a loop (with smaller vertex lists and multiple sceGumDrawArray() calls).
The overhead of reserving a safe memory region does not seem to be small.

2. While coding some common blur effects based on an offscreen buffer textured multiple times onto the frontbuffer, it seems that the PSP GPU cannot really display more than 7-8 fullscreen textures per frame.
I tried using a 16-bit 64x64 texture, and after 7 of them I need 2 VBLs...
I used the GU_SPRITES primitive both in one shot (480x272, 2 vertices) and by splitting the screen into 64x64 sprites... Same result (and one sprite seems faster than multiple small ones).

Obviously, as I'm talking about a realtime effect, I did not swizzle the texture (as it changes every frame; and as this texture is located in VRAM, maybe it is already swizzled?).
I did not test with the triangle primitives, as I have issues using triangles without 3D transformations (with screen coords, not world space).
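
To put the "7-8 fullscreen textures per frame" figure in perspective, here is a quick back-of-the-envelope calculation. It assumes a 60 Hz display (not stated above) and only counts pixels written, ignoring texture reads and blending; both function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Pixels written per second for 'layers' fullscreen passes per frame. */
uint64_t ex_pixels_per_second(uint32_t width, uint32_t height,
                              uint32_t layers, uint32_t fps)
{
    return (uint64_t)width * height * layers * fps;
}

/* Size in bytes of an uncompressed texture (whole-byte formats only). */
uint32_t ex_texture_bytes(uint32_t width, uint32_t height,
                          uint32_t bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);  /* 4-bit CLUT is not handled here */
    return width * height * (bits_per_pixel / 8);
}
```

With these assumptions, ex_pixels_per_second(480, 272, 8, 60) gives 62,668,800 pixels per second, and ex_texture_bytes(64, 64, 16) gives 8192 bytes, which is exactly the 8 kB texture page size quoted earlier in the thread.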

So if I'm wrong.... tell me :D
urchin
Posts: 121
Joined: Thu Jun 02, 2005 5:41 pm

Post by urchin »

Using many 64x272 sprites seems faster to me than a single 480x272. I haven't timed it, but I get flickering when I try to overlay a bitmap with transparency on top of the single sprite blit. All is OK when I use the multiple sprite approach. I'm using 32 bit mode and everything is unswizzled btw.
Shazz
Posts: 244
Joined: Tue Aug 31, 2004 11:42 pm
Location: Somewhere over the rainbow

Post by Shazz »

Oh, that's interesting... I did not try a "column" approach... I'll try that too!