Anyone had problems with flickering triangles?

turkeyman
Posts: 75
Joined: Wed Oct 20, 2004 7:38 pm
Location: Brisbane, Australia

Post by turkeyman »

Hey again,

Does anyone know if sceGuDrawArray() copies the array to the display list, or does it just reference the array, requiring you to manage, not overwrite, and double buffer the data in dynamic arrays?

I'm having loads of problems with flickering triangles and loads of other crap when i use dynamic vertex arrays..

I am quad buffering (just to be sure).. but it still behaves differently to when i render from a static buffer...

yes my array is aligned... (anticipated question) :P
ReKleSS
Posts: 73
Joined: Sat Jun 18, 2005 12:57 pm
Location: Melbourne, Australia

Post by ReKleSS »

Firstly, sceGuDrawArray only references the array. Secondly, are you flushing the cache before drawing? sceKernelDcacheWritebackAll(); should do it.
-ReK
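A toy model of why the writeback matters, in plain C so it runs off-device. Everything in it (`ToyCache`, `cpu_write`, `gpu_read`, `writeback_all`) is made up for illustration and only mimics the behaviour: CPU writes land in a cache copy, while the "GPU" reads RAM directly. On the PSP itself the call to use is the `sceKernelDcacheWritebackAll()` mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAM_SIZE 64

/* Hypothetical toy model, not PSP code. */
typedef struct {
    uint8_t ram[RAM_SIZE];    /* what the GPU reads */
    uint8_t cached[RAM_SIZE]; /* what the CPU has written */
    uint8_t dirty[RAM_SIZE];  /* 1 = cached byte has not reached RAM yet */
} ToyCache;

void toy_init(ToyCache *c) { memset(c, 0, sizeof *c); }

/* CPU store: only the cache copy changes. */
void cpu_write(ToyCache *c, size_t addr, uint8_t value) {
    assert(addr < RAM_SIZE);
    c->cached[addr] = value;
    c->dirty[addr] = 1;
}

/* GPU load: bypasses the cache and sees RAM only. */
uint8_t gpu_read(const ToyCache *c, size_t addr) {
    assert(addr < RAM_SIZE);
    return c->ram[addr];
}

/* Stand-in for a full data cache writeback. */
void writeback_all(ToyCache *c) {
    for (size_t i = 0; i < RAM_SIZE; ++i) {
        if (c->dirty[i]) {
            c->ram[i] = c->cached[i];
            c->dirty[i] = 0;
        }
    }
}
```

In this model a vertex written with `cpu_write` stays invisible to `gpu_read` until `writeback_all` runs, which is the stale-data situation that would show up as flickering.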
turkeyman
Posts: 75
Joined: Wed Oct 20, 2004 7:38 pm
Location: Brisbane, Australia

Post by turkeyman »

why should i need to flush the cache?
surely sceGuDrawArray() doesn't render it immediately? ..
it just adds it to the display list and it gets kicked off when i sceGuFinish()??
in which case there should be LOADS of time between the end of the previous frame and the actual rendering for the dcache to be flushed naturally... :/

although you might be on to something... the flickering gets less bad the longer i leave it running..

so what's the best way to manage a display list?
i imagine something like this:

program start:
init...

set display list buffer 1
begin frame
render some things (add to the display list)
end frame (display list begins rendering)
set display list buffer 2
begin frame
render some things (adding them to the second display list buffer)
sceGuSync() (allow gpu to finish rendering the previous frame if it hasn't already)
end frame (kick this frame's display list)
set display list buffer 1
begin frame
....
etc etc etc

that should allow asynchronous behaviour of CPU and GPU..

the odd thing is, in the cube demo for example, sceGuSync is called AFTER sceGuFinish() ..
this would not allow any asynchronous processing by the GPU..
seems strange..

perhaps i've got this all wrong..
someone want to clarify the way to properly manage display list buffers and the rendering pipeline?
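The scheme outlined above can be sketched as a small state machine in plain C. This is only a model of the bookkeeping, with hypothetical names (`FrameState`, `begin_frame`, `end_frame`, `gpu_wait`); it makes no claim about how libGu actually schedules lists. On hardware, `gpu_wait` is roughly where a `sceGuSync()` would sit.

```c
#include <assert.h>

/* Hypothetical model of two display list buffers used in ping-pong. */
typedef struct {
    int recording;     /* list the CPU is filling, -1 if none */
    int in_flight;     /* list the GPU is consuming, -1 if idle */
    int frames_kicked; /* how many frames have been handed to the GPU */
} FrameState;

void frames_init(FrameState *s) {
    s->recording = -1;
    s->in_flight = -1;
    s->frames_kicked = 0;
}

/* Begin frame: record into whichever list the GPU is not reading. */
int begin_frame(FrameState *s) {
    assert(s->recording == -1);
    s->recording = (s->in_flight == 0) ? 1 : 0;
    return s->recording;
}

/* Stand-in for blocking until the GPU has finished the previous frame. */
void gpu_wait(FrameState *s) {
    s->in_flight = -1;
}

/* End frame: wait for the previous list, then kick the one just recorded. */
void end_frame(FrameState *s) {
    assert(s->recording != -1);
    gpu_wait(s);
    s->in_flight = s->recording;
    s->recording = -1;
    s->frames_kicked++;
}
```

The point of the ordering is that the wait happens after the CPU has finished recording, so recording frame N+1 overlaps with the GPU drawing frame N, and the CPU never writes into the list that is in flight.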
turkeyman
Posts: 75
Joined: Wed Oct 20, 2004 7:38 pm
Location: Brisbane, Australia

Post by turkeyman »

interestingly, flushing the cache seems to have solved the problem..

i'm still not exactly sure why that should have affected it, since the GPU shouldn't access that memory for quite a long time which should give the cache plenty of time to flush....

another question for you all..
what is the PSP's clip space? exactly how does the PSP expect data post projection? and what's the story with the divide by w? is it just performed by the hardware after projection as usual?

also, are there any rules regarding filling the zbuffer?

i want to change the projection to be z into the screen (LH) and '0' being the nearest zbuffer value... i have tried it, but it doesn't seem to work.. any other gotchas i should be aware of?
(i have considered cull mode, zcmp function, and camera inversion)

sorry for all the questions :P
there's no real hard docs to refer to :)
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

turkeyman wrote: that should allow asynchronous behaviour of CPU and GPU..
http://svn.ps2dev.org/filedetails.php?r ... rev=0&sc=0
ector
Posts: 195
Joined: Thu May 12, 2005 10:22 pm

Post by ector »

Well the way the GPU interface works is that Flush (IIRC, I call it Flush in my library which is not identical to libGu) sets the GPU "Stall" address, i.e. where it should stop reading data. So to get the GPU and CPU running asynchronously, the CPU writing display lists for the GPU to consume, you should call the flush function from time to time to let the GPU know that it can safely start reading the list. I don't know if libGu does this implicitly anywhere, I keep things like that explicit in my library.
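A rough model of the stall-address idea described above, in plain C. All names (`ToyList`, `list_push`, `list_flush`, `gpu_run`) and the `CMD_END` marker are invented for the sketch; the real GE command encoding and libGu internals are not modelled. The "GPU" may only consume words below the stall index, and the "flush" is what moves that index forward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIST_WORDS 16
#define CMD_END 0xFFFFFFFFu /* hypothetical end-of-list marker */

typedef struct {
    uint32_t words[LIST_WORDS];
    size_t write;  /* CPU side: next free slot */
    size_t stall;  /* GPU may read slots [read, stall) only */
    size_t read;   /* GPU side: next slot to execute */
    int finished;  /* set once the GPU has executed CMD_END */
} ToyList;

void list_init(ToyList *l) {
    l->write = 0;
    l->stall = 0;
    l->read = 0;
    l->finished = 0;
}

/* CPU appends a command; the GPU cannot see it yet. */
void list_push(ToyList *l, uint32_t cmd) {
    assert(l->write < LIST_WORDS);
    l->words[l->write++] = cmd;
}

/* "Flush": publish everything written so far by moving the stall index. */
void list_flush(ToyList *l) {
    l->stall = l->write;
}

/* Let the GPU run until it hits the stall index or CMD_END.
   Returns the number of commands it executed in this call. */
size_t gpu_run(ToyList *l) {
    size_t executed = 0;
    while (!l->finished && l->read < l->stall) {
        if (l->words[l->read++] == CMD_END) {
            l->finished = 1;
        }
        ++executed;
    }
    return executed;
}
```

Calling `list_flush` every so often while building the list is what lets the consumer start early instead of waiting for the whole list to be written.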

It makes perfect sense to call Finish and then Sync because Finish will write an "End of display list" command to the list, and Sync will (presumably) set the stall address and then wait for the GPU to catch up and get to the End of display list command.

I'm not 100% sure if it's the fastest way, but the easiest way I've found to get good data to the Gu without having to writeback the entire cache is to "uncache" my pointers before writing vertex data, that is OR-ing them by 0x40000000. This will disable caching of writes through it and make it write directly to real RAM.
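The pointer trick above boils down to a couple of one-liners. A minimal sketch, assuming nothing beyond the 0x40000000 bit mentioned in this thread; the helper names (`to_uncached`, `to_cached`, `is_uncached`) are made up here. The functions only compute addresses, so they can be checked off-device, but dereferencing the result only makes sense on the PSP.

```c
#include <assert.h>
#include <stdint.h>

#define UNCACHED_BIT 0x40000000u /* value taken from the post above */

/* Return the uncached alias of a pointer. */
void *to_uncached(void *p) {
    return (void *)((uintptr_t)p | UNCACHED_BIT);
}

/* Return the cached alias again. */
void *to_cached(void *p) {
    return (void *)((uintptr_t)p & ~(uintptr_t)UNCACHED_BIT);
}

/* Check whether a pointer already carries the uncached bit. */
int is_uncached(const void *p) {
    return ((uintptr_t)p & UNCACHED_BIT) != 0;
}
```

Typical use would be to convert the vertex buffer pointer once, write vertices through the uncached alias, and hand either alias to the draw call.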

By the way if you want optimal performance with textures, swizzle them (if you are going to draw them at any angle other than 0) and put them in VRAM. The speed difference between linear textures in RAM and swizzled textures in VRAM seems to be orders of magnitude, something like 100x!! Swizzled textures in RAM appear to have acceptable performance, though you really should put your most common textures in VRAM if you have room.

Swizzled textures (all formats) are stored as blocks of 16 bytes X 8 rows. (so 32-bit texture data is swizzled into 4x8 blocks).
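A straightforward (not fast) sketch of that layout in C. It assumes the 16-byte by 8-row blocks are stored one after another, left to right and then top to bottom, and that the texture width in bytes is a multiple of 16 and the height a multiple of 8. The function name `swizzle` and its parameters are just for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_W 16 /* bytes per block row */
#define BLOCK_H 8  /* rows per block */

/* Copy a linear image (width_bytes x height) into block order.
   Assumption: blocks run left to right, then top to bottom. */
void swizzle(uint8_t *dst, const uint8_t *src, size_t width_bytes, size_t height) {
    assert(width_bytes % BLOCK_W == 0 && height % BLOCK_H == 0);
    size_t blocks_per_row = width_bytes / BLOCK_W;
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width_bytes; ++x) {
            /* Which block this byte falls in, and where inside that block. */
            size_t block = (y / BLOCK_H) * blocks_per_row + (x / BLOCK_W);
            size_t inside = (y % BLOCK_H) * BLOCK_W + (x % BLOCK_W);
            dst[block * BLOCK_W * BLOCK_H + inside] = src[y * width_bytes + x];
        }
    }
}
```

For a 32-bit texture, `width_bytes` is the pixel width times 4, which gives the 4x8 pixel blocks mentioned above. A per-byte loop like this is only meant to show the addressing; a real loader would copy 16 bytes at a time.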

By the way, no warranty is offered for the above info, which is the way I understand how things work which I've found out by my own experimentation and may be all wrong :)
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

crazyc wrote:
turkeyman wrote: that should allow asynchronous behaviour of CPU and GPU..
http://svn.ps2dev.org/filedetails.php?r ... rev=0&sc=0
Do note however that callbacks are not currently enabled within pspgu, as I just get crashes if I add them in sceGuInit(). I'll revisit this & a lot of other stuff when I get back to work on this.
GE Dominator
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

ector wrote:
I don't know if libGu does this implicitly anywhere, I keep things like that explicit in my library.

It does. More exactly, sceGuGetMemory(), sceGuCallList(), sceGuDrawArray(), sceGuDrawArrayN(), sceGuFinish() and sceGuSignal() all update the stall-address.

ector wrote:
It makes perfect sense to call Finish and then Sync because Finish will write an "End of display list" command to the list, and Sync will (presumably) set the stall address and then wait for the GPU to catch up and get to the End of display list command.

Actually, sceGuFinish() sets the stall-address, sceGuSync() only waits for the list to finish (default behaviour, haven't explored the rest). One approach I have been pondering would be to have two lists running as some kind of ring-buffer and then listening to interrupts to know when the screen can be flipped (wiring it together with a vbl-interrupt), which could get rid of the sceGuSync() as long as the rendering isn't fast enough. There are more advanced approaches than this, but I think it could be a problem fitting it into the pspgu-approach. :)

ector wrote:
I'm not 100% sure if it's the fastest way, but the easiest way I've found to get good data to the Gu without having to writeback the entire cache is to "uncache" my pointers before writing vertex data, that is OR-ing them by 0x40000000. This will disable caching of writes through it and make it write directly to real RAM.

If there are any "uncached accelerated" modes available, I'd pick those (like there is on the PS2, and it kicks ASS when it comes to filling a lot of data in a sequential array), but I'm not getting my hopes up. :)

ector wrote:
Swizzled textures (all formats) are stored as blocks of 16 bytes X 8 rows. (so 32-bit texture data is swizzled into 4x8 blocks).

Nice, seems Sony has ditched the idea of having one swizzle-mode per pixel-size then... Good move.
GE Dominator
turkeyman
Posts: 75
Joined: Wed Oct 20, 2004 7:38 pm
Location: Brisbane, Australia

Post by turkeyman »

chp wrote:
ector wrote:
I'm not 100% sure if it's the fastest way, but the easiest way I've found to get good data to the Gu without having to writeback the entire cache is to "uncache" my pointers before writing vertex data, that is OR-ing them by 0x40000000. This will disable caching of writes through it and make it write directly to real RAM.
If there are any "uncached accelerated" modes available, I'd pick those (like there is on the PS2, and it kicks ASS when it comes to filling a lot of data in a sequential array), but I'm not getting my hopes up. :)

are you suggesting that 0x40000000 is not a valid cache bypass mode?
i'll use that for my immediate renderer, doing a dcache flush kinda sucks..

is there a way to insert immediate data directly into the main display list rather than having a separate immediate buffer and inserting calls into the main display list to reference it?

what's the story with double buffering display lists?
neither of you mentioned double buffering the display list itself.. are you using some kind of ring buffer technique? how are you stalling the CPU when it catches up to the GPU's display list pointer? or are you just syncing the display list every frame before the CPU starts writing to it?
ector
Posts: 195
Joined: Thu May 12, 2005 10:22 pm

Post by ector »

0x40000000 sure is a valid cache bypass, the question is whether it has write combiners, so that it would be as fast as writing normally and then flushing...
turkeyman
Posts: 75
Joined: Wed Oct 20, 2004 7:38 pm
Location: Brisbane, Australia

Post by turkeyman »

ector wrote:
0x40000000 sure is a valid cache bypass, the question is whether it has write combiners, so that it would be as fast as writing normally and then flushing...

Do you know the cached and uncached bus width to memory? i can write combine manually if i need to, provided the uncached write is the same width as the cache...

What does the PS2 do? I'm pretty sure it has an 'uncached accelerated' mode yeah? Does that have write combining? and does it write more than 16 bytes out at a time? (PPro writes 32 bytes at a time yeah?)

Even if 0x40000000 doesn't have write combining, if i'm writing out 16 bytes at a time, it's gotta be faster than trashing and then flushing the dcache every time i write out some vertex/texture data..
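One way to picture "write combining manually" is a small staging buffer that collects field-sized writes and only touches the destination in 16-byte chunks. The sketch below is plain C with invented names (`Combiner`, `wc_write`, `wc_flush`); the 16-byte chunk size is taken from the post above, and whether a 16-byte `memcpy` really becomes a single bus write on the PSP is an open assumption, not something established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 16 /* bytes per combined write, as discussed above */

/* Hypothetical software write combiner. */
typedef struct {
    uint8_t *dest;        /* destination, e.g. an uncached pointer on hardware */
    uint8_t stage[CHUNK]; /* small writes accumulate here first */
    size_t fill;          /* bytes currently staged */
    size_t bus_writes;    /* how many chunk-sized stores were issued */
} Combiner;

void wc_init(Combiner *w, uint8_t *dest) {
    w->dest = dest;
    w->fill = 0;
    w->bus_writes = 0;
}

/* Push whatever is staged out to the destination in one copy. */
void wc_flush(Combiner *w) {
    if (w->fill == 0) return;
    memcpy(w->dest, w->stage, w->fill);
    w->dest += w->fill;
    w->fill = 0;
    w->bus_writes++;
}

/* Stage len bytes, flushing automatically each time a chunk fills up. */
void wc_write(Combiner *w, const void *data, size_t len) {
    const uint8_t *p = data;
    while (len > 0) {
        size_t room = CHUNK - w->fill;
        size_t n = len < room ? len : room;
        memcpy(w->stage + w->fill, p, n);
        w->fill += n;
        p += n;
        len -= n;
        if (w->fill == CHUNK) wc_flush(w);
    }
}
```

With a 16-byte vertex (say a 32-bit colour plus three floats), each vertex ends up as exactly one chunk-sized store; a final `wc_flush` pushes out any partial chunk.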