sceGuTexImage is slow!

Brunni
Posts: 186
Joined: Sat Oct 08, 2005 10:27 pm

Post by Brunni »

Hello.

I'm running some tests for 2D sprite drawing. The sceGuTexImage call (made before drawing the vertices) seems very slow; it looks as if it copies the texture to another place in VRAM, even though my image is already in VRAM.

I shouldn't need to copy my sprites every frame, but since I can't choose where the texture is copied, I have to make the call for each sprite, unless I draw the same sprite several times in a row (which never happens in practice, only in benchmarks).

With this call, I can only draw about 13'000 64x64 sprites per second (~100 MB/sec). Without it, I can draw about 45'000 of those sprites per second (~350 MB/sec), but then it's always the same image (and even 45'000 sprites/sec is still slow). If my image is in RAM, it's even slower.
Here is the code. Do you know what is so slow? Or should I use another method?

Code:

unsigned short *vmemptr = (void*)(0x04100000);

typedef struct		{
	int larg, haut;
	unsigned short *sprite;
} IMAGE;

#define IMAGE_SIZE(img)		((img)->larg*(img)->haut*2)

IMAGE CreateNewImage(int larg, int haut, unsigned short *content)		{
	IMAGE img;
	img.larg = larg;
	img.haut = haut;
	img.sprite = vmemptr;
	if (content != NULL)
		memcpy(img.sprite, content, IMAGE_SIZE(&img));
	vmemptr += img.larg * img.haut;	// vmemptr is an unsigned short*, so advance by pixels, not bytes
	return img;
}

void DrawImage(IMAGE *img, int x, int y)				{
		struct Vertex* vertices;

		sceGuTexImage(0,img->larg,img->haut,img->larg,img->sprite);
		sceGuTexScale(1.0f/512.0f,1.0f/512.0f); // scale UVs to 0..1
		sceGuTexOffset(0.0f, 0.0f);

		vertices = (struct Vertex*)sceGuGetMemory(2 * sizeof(struct Vertex));

		vertices[0].u = 0;
		vertices[0].v = 0;
		vertices[0].color = 0;
		vertices[0].x = x;
		vertices[0].y = y;
		vertices[0].z = 0;
		vertices[1].u = img->larg;
		vertices[1].v = img->haut;
		vertices[1].color = 0;
		vertices[1].x = x + img->larg;
		vertices[1].y = y + img->haut;
		vertices[1].z = 0;

		sceGuDrawArray(GU_SPRITES,GU_TEXTURE_16BIT|GU_COLOR_4444|GU_VERTEX_16BIT|GU_TRANSFORM_2D,2,0,vertices);
}

void StartDrawing()		{
	sceGuStart(GU_DIRECT,list);
	sceGuTexMode(GU_PSM_4444,0,0,0);
	sceGuTexFunc(GU_TFX_REPLACE,GU_TCC_RGB);
	sceGuTexFilter(GU_NEAREST,GU_NEAREST);
	sceGuAmbientColor(0xffffffff);
}

void EndDrawing()		{
	sceGuFinish();
	sceGuSync(0,0);
}

int main(void)
{
	unsigned int x,y;
	int val = 0;	// horizontal position counter (its declaration was missing from this excerpt)
	IMAGE myImage;

	myImage = CreateNewImage(64, 64, NULL);
	for (y=0;y<64;y++)
		for (x=0;x<64;x++)
			myImage.sprite[y*myImage.larg+x]=x*y;
	sceKernelDcacheWritebackAll();
	while(1)
	{
		StartDrawing();
		for (x=0;x<1000;x++)
			DrawImage(&myImage, val%200, 0);
		EndDrawing();
		sceGuSwapBuffers();
		val++;
	}
}
Thank you in advance
Last edited by Brunni on Sun Nov 27, 2005 11:25 pm, edited 2 times in total.
Sorry for my bad english
Oldschool library for PSP - PC version released
Shine
Posts: 728
Joined: Fri Dec 03, 2004 12:10 pm
Location: Germany

Re: sceGuTexImage is slow!

Post by Shine »

I don't think sceGuTexImage copies the texture to another place, but calling this function might clear the texture cache. It should be faster if you call it only when the texture has changed.
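One way to follow this advice is to remember which texture was bound last and skip the call when nothing changed. The sketch below is only an illustration: `texture_needs_bind` and `last_bound_texture` are hypothetical names, not part of the GU library, and it assumes a texture is identified by its pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper state: the texture pointer that was bound last.
   NULL means "nothing bound yet". */
static const void *last_bound_texture = NULL;

/* Returns 1 when the caller should rebind (and records the new texture),
   0 when the texture is already current and the bind can be skipped. */
int texture_needs_bind(const void *texture)
{
	assert(texture != NULL);
	if (texture == last_bound_texture)
		return 0;
	last_bound_texture = texture;
	return 1;
}
```

In DrawImage this would wrap the existing call, e.g. `if (texture_needs_bind(img->sprite)) sceGuTexImage(0, img->larg, img->haut, img->larg, img->sprite);`. If the pixels behind the pointer are rewritten, the remembered pointer would have to be reset so the next draw binds again.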
Brunni
Posts: 186
Joined: Sat Oct 08, 2005 10:27 pm

Post by Brunni »

Thanks. In fact, I figured out that I should copy the texture (containing all the different sprites) just once and blit only parts of it by setting the vertices' u and v coordinates.
But it's still unbelievably slow (about 50'000 unscaled 64x64 sprites per second).
I'm a total beginner, but where are the 33 million polygons per second that the PSP can theoretically handle? That should give 16 million sprites/sec rather than 50'000, no?
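The "one big texture, many sprites" idea comes down to computing a u/v rectangle per sprite. A minimal sketch, assuming the sprites sit in equally sized cells laid out left to right, top to bottom (`SpriteRect` and `atlas_rect` are made-up names for illustration):

```c
#include <assert.h>

/* Texel rectangle of one sprite inside a larger atlas texture. */
typedef struct {
	unsigned short u0, v0;	/* top-left texel */
	unsigned short u1, v1;	/* bottom-right texel (exclusive) */
} SpriteRect;

/* Assumes cells of cell_w x cell_h texels, numbered row by row from 0. */
SpriteRect atlas_rect(int atlas_w, int cell_w, int cell_h, int index)
{
	int per_row = atlas_w / cell_w;
	SpriteRect r;
	assert(per_row > 0 && index >= 0);
	r.u0 = (unsigned short)((index % per_row) * cell_w);
	r.v0 = (unsigned short)((index / per_row) * cell_h);
	r.u1 = (unsigned short)(r.u0 + cell_w);
	r.v1 = (unsigned short)(r.v0 + cell_h);
	return r;
}
```

The four values would go into `vertices[0].u/.v` and `vertices[1].u/.v` in place of `0` and `img->larg` / `img->haut`, while sceGuTexImage is called once for the whole atlas.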
Shine
Posts: 728
Joined: Fri Dec 03, 2004 12:10 pm
Location: Germany

Post by Shine »

Brunni wrote: I'm a total beginner, but where are the 33 million polygons per second the PSP could theoretically handle? This should make 16 million sprites/sec rather than 50'000... no?
http://en.wikipedia.org/wiki/PlayStation_Portable says:
"Specifications state that the PSP is capable of rendering 33 million flat-shaded polygons per second, with a 664 million pixel per second fill rate"
This means about 33 million polygons, each with a size of e.g. 4x5 pixels (20 pixels * 33 million polygons = 660 million, roughly the quoted 664 million pixels per second), with no textures.
Brunni
Posts: 186
Joined: Sat Oct 08, 2005 10:27 pm

Post by Brunni »

Okay, thanks.
But anyway, doing the math:
Fill rate: 664 Mpixels/sec -> my texture is 64x64 = 4096 pixels (8 kB) -> 162 thousand sprites per second (or 81 thousand if a pixel is taken to be 8 bits).
VRAM bandwidth: 111 MHz * 512 bits = 6.6 GB/s (documented: 5.3 GB/s) -> 640 thousand sprites per second
Total: 129'630 sprites/sec, or 72'030 sprites/sec if 664 Mpixels means 8-bit pixels.
And I can only draw 45'000... That's quite far from what I would expect, so there must be a problem in my code.
I can't upload it, so I'm posting it here (you can compile it directly, no additional file needed). Is there a faster way to do this? sceGuTexImage is only called once now.

[Edit]
Now I'm at 66'000 sprites per second, which is about right assuming that the 664 Mpixels/sec figure refers to 8-bit pixels (and is halved for 16-bit and quartered for 32-bit). Can anyone confirm this?
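The estimate above can be reproduced with a few lines of arithmetic. The "Total" figures match what you get by adding the time each limit costs per sprite; that reading is inferred from the numbers, not stated in the post. The function names below are made up for this sketch.

```c
#include <assert.h>

/* Sprites per second if a single resource (fill rate in pixels/sec, or
   bandwidth in bytes/sec) were the only limit. */
double sprites_per_sec(double units_per_sec, double units_per_sprite)
{
	assert(units_per_sprite > 0.0);
	return units_per_sec / units_per_sprite;
}

/* Combine two limits by adding the time each one costs per sprite. */
double combined_rate(double rate_a, double rate_b)
{
	assert(rate_a > 0.0 && rate_b > 0.0);
	return 1.0 / (1.0 / rate_a + 1.0 / rate_b);
}
```

With 664e6 pixels/sec over 64*64 pixels and 5.3e9 bytes/sec over 64*64*2 bytes, `combined_rate` lands close to the 129'630 quoted above; halving the fill-rate term gives roughly 72'030.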


Thank you in advance ^^

Code:

#include <pspkernel.h>
#include <pspdisplay.h>
#include <pspdebug.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <time.h>
#include <pspctrl.h>

#include <pspgu.h>

PSP_MODULE_INFO("Blit Sample", 0, 1, 1);
PSP_MAIN_THREAD_ATTR(THREAD_ATTR_USER);

static unsigned int __attribute__((aligned(16))) list[262144];

int done = 0;

/* Exit callback */
int exit_callback(int arg1, int arg2, void *common)
{
	done = 1;
	return 0;
}

/* Callback thread */
int CallbackThread(SceSize args, void *argp)
{
	int cbid;

	cbid = sceKernelCreateCallback("Exit Callback", exit_callback, NULL);
	sceKernelRegisterExitCallback(cbid);

	sceKernelSleepThreadCB();

	return 0;
}

/* Sets up the callback thread and returns its thread id */
int SetupCallbacks(void)
{
	int thid = 0;

	thid = sceKernelCreateThread("update_thread", CallbackThread, 0x11, 0xFA0, 0, 0);
	if(thid >= 0)
	{
		sceKernelStartThread(thid, 0, 0);
	}

	return thid;
}

struct Vertex
{
	unsigned short u, v;
	unsigned short color;
	short x, y, z;
};

/*
	SPECIAL
*/

unsigned short *vmemptr = (void*)(0x04100000);

typedef struct		{
	int sizeX, sizeY;
	float zoom;
	int offsetX, offsetY;
	unsigned short *sprite;
} IMAGE;

#define IMAGE_SIZE(img)		((img)->sizeX*(img)->sizeY*2)

IMAGE CreateNewImage(int larg, int haut, unsigned short *content)		{
	IMAGE img;
	memset(&img, 0, sizeof(img));
	img.sizeX = larg;
	img.sizeY = haut;
	img.sprite = vmemptr;
	img.zoom = 1;
	if (content != NULL)
		memcpy(img.sprite, content, IMAGE_SIZE(&img));
	vmemptr += img.sizeX * img.sizeY;	// vmemptr is an unsigned short*, so advance by pixels, not bytes
	return img;
}

void SimpleDrawImage(IMAGE *img, int x, int y)				{
		struct Vertex* vertices;

		// setup the source buffer as a texture
//		sceGuTexImage(0,img->sizeX,img->sizeY,img->sizeX,img->sprite);
		sceGuTexScale(1.0f/512.0f,1.0f/512.0f);		// scale UVs to 0..1
		sceGuTexOffset(0.0f, 0.0f);

		vertices = (struct Vertex*)sceGuGetMemory(2 * sizeof(struct Vertex));

		vertices[0].u = 0;
		vertices[0].v = 0;
		vertices[0].color = 0;
		vertices[0].x = x;
		vertices[0].y = y;
		vertices[0].z = 0;
		vertices[1].u = img->sizeX;
		vertices[1].v = img->sizeY;
		vertices[1].color = 0;
		vertices[1].x = x + img->sizeX;
		vertices[1].y = y + img->sizeY;
		vertices[1].z = 0;

/*		int i;
		for (i=0;i<1000;i++)*/
			sceGuDrawArray(GU_SPRITES,GU_TEXTURE_16BIT|GU_COLOR_4444|GU_VERTEX_16BIT|GU_TRANSFORM_2D,2,0,vertices);
}

void SetTexture(IMAGE *img)		{
	sceGuTexImage(0, img->sizeX, img->sizeY, img->sizeX, img->sprite);
}

void StartDrawing()		{
	sceGuStart(GU_DIRECT,list);
	sceGuTexMode(GU_PSM_4444,0,0,0);
	sceGuTexFunc(GU_TFX_REPLACE,GU_TCC_RGB);
	sceGuTexFilter(GU_NEAREST,GU_NEAREST);
	sceGuAmbientColor(0xffffffff);
}

void EndDrawing()		{
	sceGuFinish();
	sceGuSync(0,0);
}

/*
	END SPECIAL
*/


int main(int argc, char* argv[])
{
	unsigned int x,y;
	SceCtrlData ctl;
	IMAGE myImage;

	pspDebugScreenInit();
	SetupCallbacks();

	sceGuInit();

	// setup
	sceGuStart(GU_DIRECT,list);
	sceGuDrawBuffer(GU_PSM_4444,(void*)0,512);
	sceGuDispBuffer(480,272,(void*)0x88000,512);
	sceGuDepthBuffer((void*)0x110000,512);
	sceGuOffset(2048 - (480/2),2048 - (272/2));
	sceGuViewport(2048,2048,480,272);
	sceGuDepthRange(0xc350,0x2710);
	sceGuScissor(0,0,480,272);
	sceGuEnable(GU_SCISSOR_TEST);
	sceGuFrontFace(GU_CW);
	sceGuEnable(GU_TEXTURE_2D);
	sceGuClear(GU_COLOR_BUFFER_BIT|GU_DEPTH_BUFFER_BIT);
	sceGuFinish();
	sceGuSync(0,0);

	sceDisplayWaitVblankStart();
	sceGuDisplay(1);

	int val = 0;

	// generate dummy image to blit
	myImage = CreateNewImage(64, 64, NULL);
	for (y=0;y<64;y++)
		for (x=0;x<64;x++)
			myImage.sprite[y*myImage.sizeX+x]=x*y;
	sceKernelDcacheWritebackAll();

//	memcpy((void*)(0x04100000), pixels, sizeof(pixels));
//	sceGuCopyImage(GU_PSM_4444,0,0,480,272,512,pixels,0,0,512,(void*)(0x04100000));

	float curr_ms = 1.0f;
	struct timeval time_slices[16];

	while (!done)
	{
		StartDrawing();
		SetTexture(&myImage);
//		myImage.zoom = 1.0f/((float)(val%200+1)/50.f);
		for (x=0;x<1000;x++)
			SimpleDrawImage(&myImage, val%200, 0);
		EndDrawing();
//		sceDisplayWaitVblankStart();
		sceGuSwapBuffers();
		val++;
	}

	sceGuTerm();

	sceKernelExitGame();
	return 0;
}
ector
Posts: 195
Joined: Thu May 12, 2005 10:22 pm

Post by ector »

The sceGuDrawArray has some overhead, it might be faster if you fill larger arrays of coordinates and draw multiple sprites per call.
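A sketch of what batching could look like: fill one array with two vertices per sprite, then issue a single draw call for all of them. It assumes that GU_SPRITES consumes vertices in pairs (as the two-vertex calls in this thread suggest) and reuses the thread's Vertex layout; `fill_sprite_batch` is a hypothetical helper name.

```c
#include <assert.h>

/* Same layout as the Vertex struct used in this thread. */
struct Vertex {
	unsigned short u, v;
	unsigned short color;
	short x, y, z;
};

/* Writes two vertices (top-left, bottom-right) per sprite into out[] and
   returns the vertex count to hand to a single draw call. */
int fill_sprite_batch(struct Vertex *out, const short *xs, const short *ys,
                      int count, int w, int h)
{
	int i;
	assert(out != 0 && count >= 0);
	for (i = 0; i < count; i++) {
		struct Vertex *a = &out[2 * i];
		struct Vertex *b = &out[2 * i + 1];
		a->u = 0; a->v = 0; a->color = 0;
		a->x = xs[i]; a->y = ys[i]; a->z = 0;
		b->u = (unsigned short)w; b->v = (unsigned short)h; b->color = 0;
		b->x = (short)(xs[i] + w); b->y = (short)(ys[i] + h); b->z = 0;
	}
	return 2 * count;
}
```

On the PSP, `out` would come from `sceGuGetMemory(2 * count * sizeof(struct Vertex))`, and the returned count would replace the `2` in the existing sceGuDrawArray call, so 1000 sprites cost one call instead of 1000.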
http://www.dtek.chalmers.se/~tronic/PSPTexTool.zip Free texture converter for PSP with source. More to come.
memon
Posts: 63
Joined: Mon Oct 03, 2005 10:51 pm

Post by memon »

I think you want to swizzle your textures for better performance. Also, I have heard that drawing sprites with a width of 32 pixels is the optimal way to draw 2D stuff (it could be that this applied only to non-swizzled textures, I can't remember the details). Does anyone have more info on that?
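For readers who have not met swizzling: the texture is rearranged so that each block of 16 bytes by 8 rows is stored contiguously. The sketch below assumes that block layout, which is the one commonly described for PSP textures; it is a plain byte-by-byte version written for clarity, not speed, and `swizzle` is just an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Rearranges a linear image into 16-byte x 8-row blocks (128 bytes each),
   blocks stored left to right, top to bottom.
   width_bytes must be a multiple of 16 and height a multiple of 8. */
void swizzle(unsigned char *out, const unsigned char *in,
             size_t width_bytes, size_t height)
{
	size_t blocks_per_row = width_bytes / 16;
	size_t x, y;
	assert(width_bytes % 16 == 0 && height % 8 == 0);
	for (y = 0; y < height; y++) {
		for (x = 0; x < width_bytes; x++) {
			size_t block = (y / 8) * blocks_per_row + (x / 16);
			size_t inside = (y % 8) * 16 + (x % 16);
			out[block * 128 + inside] = in[y * width_bytes + x];
		}
	}
}
```

A swizzled texture would then be announced through the last argument of sceGuTexMode, e.g. `sceGuTexMode(GU_PSM_4444,0,0,1)` instead of the `0` used in the code above.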
weak
Posts: 114
Joined: Thu Jan 13, 2005 8:31 pm
Location: Vienna, Austria

Post by weak »

64px performs as well as 32, so that seems to be the optimal width for a striped blit.

texture swizzling is a problem with 2d (game) stuff. most likely you'll need the accurate pixel information for collision detection...
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

How wide a sprite you can blit without ruining cache performance depends on the source PSM (pixel format) you use for the texture:

32x32 in 32-bit mode
64x32 in 16-bit mode
128x64 in 8-bit mode
128x128 in 4-bit mode

And why not store collision masks as 1-bit patterns instead of using alpha or similar? Not that much more memory, and checking large areas for early rejection should be rather quick.
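A minimal sketch of the 1-bit mask idea, assuming sprites up to 64 pixels wide so that one 64-bit word holds a row. `Mask` and `masks_collide` are hypothetical names; the early-rejection step for large areas is left out.

```c
#include <assert.h>
#include <stdint.h>

/* 1-bit collision mask: one 64-bit word per row, bit 63 is the leftmost pixel. */
typedef struct {
	int height;
	const uint64_t *rows;
} Mask;

/* dx, dy: position of b relative to a, in pixels. Returns 1 on overlap. */
int masks_collide(const Mask *a, const Mask *b, int dx, int dy)
{
	int ya;
	assert(a != 0 && b != 0);
	if (dx <= -64 || dx >= 64)
		return 0;	/* no horizontal overlap possible */
	for (ya = 0; ya < a->height; ya++) {
		int yb = ya - dy;
		uint64_t rb;
		if (yb < 0 || yb >= b->height)
			continue;
		/* shift b's row into a's coordinate space */
		rb = dx >= 0 ? b->rows[yb] >> dx : b->rows[yb] << -dx;
		if (a->rows[ya] & rb)
			return 1;
	}
	return 0;
}
```

At one bit per pixel a 64x64 sprite needs 512 bytes of mask, and each row comparison is a shift plus an AND, so the texture itself can stay swizzled.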
GE Dominator
weak
Posts: 114
Joined: Thu Jan 13, 2005 8:31 pm
Location: Vienna, Austria

Post by weak »

didn't think about the psm. good point.

and well, 1bit patterns would of course work too, just extra work. but if you need the performance that's actually a good idea
memon
Posts: 63
Joined: Mon Oct 03, 2005 10:51 pm

Post by memon »

Chp, what if I want to do a fullscreen blit, like some multipass blur or similar? In that case the source is a linear texture. Is it still better to draw vertical strips? Say I have a 512x272 offscreen surface and I want to blit that to the frame buffer. What if I have downsampled it to 256x136? (Apparently it is slightly different because it depends on the texture cache.)
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

It's always best to blit in stripes aligned to the cache-boundary. If you have to refill the cache for every line, you're going to have major issues.
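A sketch of the striped blit: the rectangle is cut into vertical strips of a fixed width, two vertices per strip. The strip width is a parameter here (64 would match the 16-bit row of the table earlier in the thread); the strips only line up with the cache boundary if the source starts at a multiple of that width, which this sketch assumes. `fill_strips` is an illustrative name.

```c
#include <assert.h>

/* Same layout as the Vertex struct used in this thread. */
struct Vertex {
	unsigned short u, v;
	unsigned short color;
	short x, y, z;
};

/* Splits a w x h blit to (dx, dy) into vertical strips at most strip_w wide.
   Writes two vertices per strip into out[] and returns the vertex count. */
int fill_strips(struct Vertex *out, int max_vertices,
                int dx, int dy, int w, int h, int strip_w)
{
	int n = 0, start;
	assert(out != 0 && strip_w > 0);
	for (start = 0; start < w; start += strip_w) {
		int width = (start + strip_w > w) ? (w - start) : strip_w;
		assert(n + 2 <= max_vertices);
		out[n].u = (unsigned short)start; out[n].v = 0; out[n].color = 0;
		out[n].x = (short)(dx + start); out[n].y = (short)dy; out[n].z = 0;
		n++;
		out[n].u = (unsigned short)(start + width); out[n].v = (unsigned short)h;
		out[n].color = 0;
		out[n].x = (short)(dx + start + width); out[n].y = (short)(dy + h); out[n].z = 0;
		n++;
	}
	return n;
}
```

For a 480x272 blit with 64-pixel strips this yields 8 strips (the last one 32 pixels wide), i.e. 16 vertices that could go to the GU in a single GU_SPRITES draw call.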
Brunni
Posts: 186
Joined: Sat Oct 08, 2005 10:27 pm

Post by Brunni »

Thanks a lot for your replies.
I don't know what texture swizzling is, but I'll take a look at it.
Now I have another question about performance. It seems to be possible to pass sceGuTexImage and sceGuDrawBuffer a buffer width that is neither a power of two nor block aligned (e.g. sceGuDrawBuffer(GU_PSM_4444, address, 250) and sceGuTexImage(0, 256, 256, 250, address)). It works fine, so why does the documentation say it must be block aligned?
Using non-power-of-2 values saves a lot of memory, but maybe it cripples performance? Has anyone tested this?
Anyway, I'm sorry I have no precise benchmarking tool at the moment, but I couldn't see any performance hit by eye:
- Test: draw a 64x64 image to the buffer (in VRAM), then blit it to the framebuffer. The virtual buffer size is 250x224; the physical size is 256x256 in one run and 250x224 in the other (although only 250x224 is blitted in each case), 16 bits, looped 20 times:
- Run time with 250-pixel alignment: 15.6 sec
- Run time with 256-pixel (block) alignment: 16.5 sec
It even seems to be faster when not aligned (cache...).
Can anyone confirm this?
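To put a number on the memory saving, here is the arithmetic for the two buffers in the test, plus the byte offset of a pixel for a given buffer width. Function names are made up for this sketch; it says nothing about whether the hardware officially supports the unaligned width.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by a buffer with the given row stride (in pixels). */
size_t buffer_bytes(size_t stride_px, size_t height_px, size_t bytes_per_px)
{
	assert(bytes_per_px > 0);
	return stride_px * height_px * bytes_per_px;
}

/* Byte offset of pixel (x, y) in a buffer with the given row stride. */
size_t pixel_offset(size_t x, size_t y, size_t stride_px, size_t bytes_per_px)
{
	assert(x < stride_px);
	return (y * stride_px + x) * bytes_per_px;
}
```

At 16 bits per pixel, the 250x224 buffer takes 112'000 bytes against 131'072 bytes for 256x256, about 15% less VRAM.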
ector
Posts: 195
Joined: Thu May 12, 2005 10:22 pm

Post by ector »

The docs may be wrong :)
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

I did have issues when I first used that parameter, but it might have been something else at the time. I might take a look at that later this week.