sceGuCopyImage and <16-bit images

Post by **Raphael** » Sat Mar 04, 2006 10:42 pm

Hello guys, I have a problem and wanted to know if anyone has a quick solution.

What I want to do is copy glyph sprites from a glyph texture (currently in system RAM) to a glyph cache buffer in VRAM, where I build up a string that I can afterwards draw on screen with only one texture and one sprite primitive.
That's no problem so far, but my glyph texture is a 4-bit paletted texture, and that's where the problem starts:
My first naive attempt, using sceGuCopyImage(GU_PSM_T4,...), only produced a widespread pixel mess in my glyph cache, so I knew something was wrong with the strides or similar. I finally found in the GU documentation in the SDK that sceGuCopyImage seems to handle only 16- and 32-bit image transfers, and that is a real pain now.
I then converted my strides and my source and destination positions so that it would copy the bits correctly, like this:

```c
sceGuCopyImage(GU_PSM_4444,sx/4,sy,sw,sh,256/4,g_font_tex[g_cur_font],dx/4,dy,512/4,(void*)(g_glyph_cache));
```

But now my problem is that dx in my code can also be a value that isn't a multiple of 4, so this approach skips some pixels in the destination and renders incorrectly.
I cannot jump to the correct destination address by adding the dx offset myself, as this would produce a pointer that isn't 16-byte aligned and crash the program.
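The arithmetic behind the GU_PSM_4444 trick can be made explicit. Below is a minimal sketch; the helper names are hypothetical, not PSPSDK calls. One 16-bit unit covers 4 pixels of a 4-bit image (2 pixels of an 8-bit one), so `dx/4` silently truncates when `dx % 4 != 0`. It also shows why offsetting the pointer by hand does not help, assuming (as the post says) that the pointer must be 16-byte aligned and that the cache base itself is 16-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a pixel x coordinate of a 4- or 8-bit
 * paletted image into the x coordinate of a 16-bit (e.g. GU_PSM_4444)
 * view of the same memory, as the call above does with dx/4.
 * Returns 1 only if the conversion is exact. */
static int x_to_u16_units(int x, int bits_per_pixel, int *out)
{
    int px_per_unit;

    assert(x >= 0 && (bits_per_pixel == 4 || bits_per_pixel == 8));
    px_per_unit = 16 / bits_per_pixel;   /* 4 for T4, 2 for T8 */
    *out = x / px_per_unit;              /* truncation is what drops pixels */
    return (x % px_per_unit) == 0;
}

/* Byte offset of pixel (x, y) in a 4bpp image `width` pixels wide.
 * Two pixels share one byte, so an odd x has no byte of its own. */
static uint32_t t4_byte_offset(int x, int y, int width)
{
    assert(x >= 0 && y >= 0 && width > 0 && width % 2 == 0);
    return (uint32_t)(y * width + x) / 2;
}

/* Nonzero if an offset from a 16-byte aligned base is 16-byte aligned too. */
static int is_16_byte_aligned(uint32_t offset)
{
    return (offset & 0xF) == 0;
}
```

With 512-pixel rows (256 bytes, a multiple of 16), the byte holding the first glyph pixel is 16-byte aligned only when dx is a multiple of 32.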
Well, I could create my glyph cache in system RAM as well, use CPU memcpys for the transfer and upload only the final cache image, but that's quite a dirty workaround.
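For reference, the CPU-side workaround boils down to a nibble-by-nibble blit. This is a minimal sketch, not the poster's code; the function names are hypothetical. It assumes pixel 0 of each byte is the low nibble, which is also what the shift-based code later in this thread relies on. It is slow, but it handles any dx and is handy for checking faster word-based copies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read pixel x of a 4bpp row (even pixel = low nibble, odd = high nibble). */
static unsigned t4_get(const uint8_t *row, int x)
{
    uint8_t b = row[x >> 1];
    return (x & 1) ? (unsigned)(b >> 4) : (unsigned)(b & 0x0F);
}

/* Write pixel x of a 4bpp row, leaving the other nibble of the byte intact. */
static void t4_set(uint8_t *row, int x, unsigned v)
{
    uint8_t *b = &row[x >> 1];
    if (x & 1)
        *b = (uint8_t)((*b & 0x0F) | ((v & 0x0F) << 4));
    else
        *b = (uint8_t)((*b & 0xF0) | (v & 0x0F));
}

/* Copy a w x h block of 4bpp pixels from (sx, sy) to (dx, dy).
 * Strides are given in pixels and must be even (whole bytes per row). */
static void t4_blit(const uint8_t *src, int src_stride, int sx, int sy,
                    uint8_t *dst, int dst_stride, int dx, int dy,
                    int w, int h)
{
    assert(src_stride % 2 == 0 && dst_stride % 2 == 0);
    for (int y = 0; y < h; y++) {
        const uint8_t *srow = src + (size_t)(sy + y) * (size_t)(src_stride / 2);
        uint8_t *drow = dst + (size_t)(dy + y) * (size_t)(dst_stride / 2);
        for (int x = 0; x < w; x++)
            t4_set(drow, dx + x, t4_get(srow, sx + x));
    }
}
```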
My next idea would be to render each glyph into the cache with render-to-texture, but is it even possible to render to a GU_PSM_T4 texture, or would I need at least a 16-bit texture for that?

Or better: is there any way to bypass the >=16-bit limitation of sceGuCopyImage and make it handle 4- and 8-bit copies?

Thanks for any help!

PS: Someone should update the comment for sceGuCopyImage in pspgu.h to say that it only works correctly with 16- and 32-bit PSMs.

Post by **Raphael** » Mon Mar 06, 2006 10:53 pm

I have now tried to write my own memcpy function to copy the glyphs from the texture to the cache, but instead of working it crashes. I use the uncached VRAM pointer to access VRAM. I also tried copying only 2 whole ints per line (which make up 16 pixels, the maximum width of my glyphs), and copying only when the x offset in the cache is a multiple of 8 (and 32), so that the pointer is aligned to 32 bits (and 16 bytes), but even that crashes. However, if my custom function only fills the VRAM with zeros, it works. So I don't quite understand the problem: I access 32-bit aligned addresses there too, and the only difference is that I write constant values.

```c
void font_clear_cache( void )
{
	// uncached (| 0x40000000) view of the glyph cache in VRAM
	unsigned int* d = (unsigned int*)((unsigned int)g_glyph_cache | 0x40000000);
	int i, j;
	for (i=0;i<g_cached_lines;i++) {
	  // one unsigned int holds 8 4-bit pixels
	  for (j=0;j<(g_cached_width+7) >> 3;j++)
		*d++ = (unsigned int)0x0;
	  // advance to the start of the next 512-pixel (64-int) row
	  d+=(512 >> 3) - ((g_cached_width+7) >> 3);
	}
}

void font_copy_glyph( int sx, int sy, int sh, char *s, int dx, int dy, char *d )
{
	//int skip_first_halfbyte = (dx%2==1);
	// Crashes: s[...] and d[...] read the byte VALUE at that offset, and the
	// cast turns that value into a pointer (&s[...] would be the address).
	char* src = (char*)(s[((sx+(sy<<8)) >> 1)]);
	char* dst = (char*)(d[((dx+(dy<<9)) >> 1)]);
	int i;
	if (dx%8==0) {
		// can do fast copy
		unsigned int* u32s = (unsigned int*)src;
		unsigned int* u32d = (unsigned int*)dst;
		for (i=0;i<sh;i++) {
		  // 32bit copy part: 2 ints = 16 pixels
		  *u32d++ = *u32s++;
		  *u32d++ = *u32s++;
		  u32s += (128-8)>>2;
		  u32d += (256-8)>>2;
		}
	}
}
```

Note: The texture is 256 pixels wide and the cache 512 pixels wide, both at 4 bits per pixel, so the offsets and strides should be correct.
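The note above can be checked mechanically. Below is a minimal sketch; the constant and helper names are made up for illustration. The strides do work out to the `(128-8)>>2` and `(256-8)>>2` used in the loop. What goes wrong is how the pointer is formed: `(char*)(s[off])` takes the byte *value* stored at that offset (pixel data) and reinterprets that number as an address. `&s[off]`, `s + off`, or the `(unsigned int)s + off` form used in the working version later in the thread all yield the byte's address instead.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry from the note above, in 4-bit pixels. */
enum {
    TEX_WIDTH_PX   = 256,
    CACHE_WIDTH_PX = 512,
    GLYPH_WIDTH_PX = 16
};

/* Row pitches in bytes (two pixels per byte) and per-row skips in words. */
enum {
    TEX_PITCH_BYTES   = TEX_WIDTH_PX / 2,      /* 128 */
    CACHE_PITCH_BYTES = CACHE_WIDTH_PX / 2,    /* 256 */
    GLYPH_ROW_WORDS   = GLYPH_WIDTH_PX / 8,    /* 8 pixels per 32-bit word -> 2 */
    TEX_SKIP_WORDS    = TEX_PITCH_BYTES / 4 - GLYPH_ROW_WORDS,  /* (128-8)>>2 = 30 */
    CACHE_SKIP_WORDS  = CACHE_PITCH_BYTES / 4 - GLYPH_ROW_WORDS /* (256-8)>>2 = 62 */
};

/* Byte offset of pixel (x, y); equals (x + (y<<8)) >> 1 for the texture
 * and (x + (y<<9)) >> 1 for the cache. */
static int t4_offset(int x, int y, int width_px)
{
    return (x + y * width_px) >> 1;
}

/* Correct: the address of the byte at `off` (identical to s + off). */
static const char *byte_addr(const char *s, int off)
{
    return &s[off];
}

/* What (char*)(s[off]) does instead: read the byte's value and use that
 * number as an address. Shown only for comparison; never dereference it. */
static uintptr_t byte_value_as_addr(const char *s, int off)
{
    return (uintptr_t)s[off];
}
```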

Post by **Brunni** » Tue Mar 07, 2006 9:44 pm

Like you said, I think creating a 4-bit texture is a good idea.
I also use a 512-byte buffer (9:7 bit split, y is a multiple of the text height) to look up the position of each glyph in the texture for a variable-width font...
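One plausible reading of that "9:7" buffer, which is an assumption and not spelled out in the post, is 256 entries of 16 bits each (one per 8-bit character code, 512 bytes in total). Each entry would pack the glyph's x position in 9 bits (0..511) and its row in the remaining 7 bits, with y in pixels = row × text height. A minimal sketch with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout: low 9 bits = x in pixels (0..511),
 * high 7 bits = row index (0..127), y = row * text_height. */
static uint16_t glyph_pos_pack(unsigned x, unsigned row)
{
    assert(x < 512 && row < 128);
    return (uint16_t)((row << 9) | x);
}

static unsigned glyph_pos_x(uint16_t e)
{
    return e & 0x1FFu;
}

static unsigned glyph_pos_y(uint16_t e, unsigned text_height)
{
    return (unsigned)(e >> 9) * text_height;
}

/* 256 entries x 2 bytes = 512 bytes, indexed by character code. */
static uint16_t glyph_table[256];
```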

Post by **Raphael** » Wed Mar 08, 2006 1:12 am

Well, I finally found the bug: it seems the array-like offset positioning I was using was wrong (could someone explain why?). This approach now works perfectly:

```c
/*
  sw is always 16 here: 16 pixels at 4 bits = 2 ints per row.
  s: 256-pixel-wide 4bpp texture (128 bytes per row)
  d: 512-pixel-wide 4bpp cache   (256 bytes per row)
  sx is assumed to be a multiple of 8, so the source reads are int aligned.
*/
void font_copy_glyph( int sx, int sy, int sh, char *s, int dx, int dy, char *d )
{
	int i;
	unsigned int* u32s = (unsigned int*)((unsigned int)s+((sx+(sy<<8)) >> 1));

	if ((dx&0x7)==0) {
		unsigned int* u32d = (unsigned int*)((unsigned int)d+((dx+(dy<<9)) >> 1));

		// can do fast copy: destination is int aligned
		for (i=0;i<sh;i++) {
		  *u32d++ = *u32s++;
		  *u32d++ = *u32s++;
		  u32s += (128-8)>>2;   // rest of the 128-byte source row
		  u32d += (256-8)>>2;   // rest of the 256-byte cache row
		}
	} else {
		// int that contains pixel dx
		unsigned int* u32d = (unsigned int*)((unsigned int)d+(((dx>>3)<<2)+(dy<<8)));
		unsigned int shift = (dx&0x7)<<2;       // bit offset of dx in that int (4..28)
		unsigned int mask = (1u << shift) - 1;  // low nibbles left of dx are kept

		for (i=0;i<sh;i++) {
		  unsigned int s1 = *u32s++;
		  unsigned int s2 = *u32s++;

		  // Indexed writes: "*u32d++ = (*u32d & mask)..." reads and increments
		  // u32d in one expression, which is undefined behaviour in C.
		  u32d[0] = (u32d[0] & mask)|(s1 << shift);        // first halfbytes
		  u32d[1] = (s1 >> (32-shift)) | (s2 << shift);
		  u32d[2] = (u32d[2] & ~mask)|(s2 >> (32-shift));  // last halfbytes

		  u32s += (128-8)>>2;
		  u32d += 256>>2;   // next cache row (u32d itself was not advanced)
		}
	}
}
```

This code correctly copies a 16-halfbyte-wide sprite from s to d, even when d is a VRAM pointer. I haven't tried whether it also works for VRAM-to-VRAM copies. When dx isn't a multiple of 8, the overhead is only one extra int write per line (plus the 2 reads of the destination ints), which is quite good for now. So after all, it IS possible to access 32-bit aligned memory locations in VRAM with the CPU, as I thought.
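The unaligned branch is easier to trust once its merge step runs on its own. Below is a minimal sketch that factors that step out so it can be checked on a PC against a nibble-at-a-time reader. The function names are hypothetical, `uint32_t` stands in for `unsigned int`, and pixel 0 of each word is taken to be the low nibble, as in the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Write one 16-pixel glyph row (words s1, s2) into a 4bpp row starting at
 * pixel dx; dst points at the word holding pixel 0 of that row.
 * Same mask/shift scheme as above; requires dx % 8 != 0 so that both
 * shift amounts stay within 4..28. */
static void t4_row16_shifted(uint32_t *dst, int dx, uint32_t s1, uint32_t s2)
{
    unsigned shift = (unsigned)(dx & 7) * 4;
    uint32_t keep = (1u << shift) - 1;   /* destination pixels left of dx */
    uint32_t *d = dst + (dx >> 3);

    assert(shift != 0);  /* the aligned case is a plain two-word copy */
    d[0] = (d[0] & keep) | (s1 << shift);
    d[1] = (s1 >> (32 - shift)) | (s2 << shift);
    d[2] = (d[2] & ~keep) | (s2 >> (32 - shift));
}

/* Nibble-at-a-time reader used to check the result. */
static unsigned t4_word_get(const uint32_t *w, int x)
{
    return (unsigned)((w[x >> 3] >> ((x & 7) * 4)) & 0xFu);
}
```

Filling the destination with 0xF first makes it easy to see that the pixels on either side of the glyph survive the merge.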
However, I'm still unhappy that the GU cannot handle 4- or 8-bit memory copies (at least the latter should be possible, since 8-bit paletted textures are used very frequently).

Hope this keeps someone else from getting stuck on a problem like this for over a week.