Anyone had problems with flickering triangles?
Hey again,
Does anyone know if sceGuDrawArray() copies the array into the display list, or does it just reference the array, so that you have to manage the data in dynamic arrays yourself (not overwrite it, and double buffer it)?
I'm having loads of problems with flickering triangles and loads of other crap when I use dynamic vertex arrays.
I am quad buffering (just to be sure), but it still behaves differently from when I render from a static buffer...
Yes, my array is aligned (anticipated question) :P
Why should I need to flush the cache?
Surely sceGuDrawArray() doesn't render it immediately?
It just adds it to the display list, and it gets kicked off when I call sceGuFinish()?
In which case the dcache should have LOADS of time between the draw call and the end of the frame for the cache to be flushed naturally... :/
Although you might be on to something... the flickering gets less bad the longer I leave it running.
So what's the best way to manage a display list?
I imagine something like this:
program start:
init...
set display list buffer 1
begin frame
render some things (add to the display list)
end frame (display list begins rendering)
set display list buffer 2
begin frame
render some things (adding them to the second display list buffer)
sceGuSync() (allow the GPU to finish rendering the previous frame if it hasn't already)
end frame (kick this frame's display list)
set display list buffer 1
begin frame
....
etc etc etc
That should allow asynchronous behaviour of the CPU and GPU.
The odd thing is that in the cube demo, for example, sceGuSync() is called AFTER sceGuFinish().
That would not allow any asynchronous processing by the GPU.
Seems strange..
Perhaps I've got this all wrong..
Does someone want to clarify the proper way to manage display list buffers and the rendering pipeline?
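A minimal host-side sketch of the alternating-buffer idea outlined above. It does not call libGu at all: `ListPair`, `begin_frame`, `kick_frame` and `gpu_done` are hypothetical names that only model which of two display-list buffers the CPU may write to, and when it would have to wait (for example with sceGuSync()) before reusing one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: two display-list buffers, each either free or owned by the GPU. */
typedef struct {
    bool in_flight[2]; /* true while the modelled GPU is still reading it */
    int  current;      /* buffer the CPU is filling this frame */
} ListPair;

/* Pick the buffer for this frame. Returns true if the CPU would have to
   wait for the GPU before it can safely write into that buffer. */
static bool begin_frame(ListPair *lp, int frame)
{
    lp->current = frame & 1;
    return lp->in_flight[lp->current];
}

/* End of frame: hand the filled buffer over to the modelled GPU. */
static void kick_frame(ListPair *lp)
{
    assert(!lp->in_flight[lp->current]); /* never kick a list that is still busy */
    lp->in_flight[lp->current] = true;
}

/* The modelled GPU has finished reading buffer `index`. */
static void gpu_done(ListPair *lp, int index)
{
    lp->in_flight[index & 1] = false;
}
```

In this model the CPU only stalls when it wraps around to a buffer the GPU has not finished with, which is the asynchronous behaviour the outline is after. Whether libGu can actually be driven this way is exactly the open question in this post.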
Interestingly, flushing the cache seems to have solved the problem.
I'm still not exactly sure why that should have affected it, since the GPU shouldn't access that memory for quite a long time, which should give the cache plenty of time to flush...
Another question for you all:
What is the PSP's clip space? Exactly how does the PSP expect data post projection? And what's the story with the divide by w? Is it just performed by the hardware after projection as usual?
Also, are there any rules regarding filling the zbuffer?
I want to change the projection to be z into the screen (LH), with '0' being the nearest zbuffer value. I have tried it, but it doesn't seem to work. Any other gotchas I should be aware of?
(I have considered cull mode, zcmp function, and camera inversion.)
Sorry for all the questions :P
There are no real hard docs to refer to :)
turkeyman wrote: that should allow asynchronous behaviour of CPU and GPU..
http://svn.ps2dev.org/filedetails.php?r ... rev=0&sc=0
Well the way the GPU interface works is that Flush (IIRC, I call it Flush in my library which is not identical to libGu) sets the GPU "Stall" address, i.e. where it should stop reading data. So to get the GPU and CPU running asynchronously, the CPU writing display lists for the GPU to consume, you should call the flush function from time to time to let the GPU know that it can safely start reading the list. I don't know if libGu does this implicitly anywhere, I keep things like that explicit in my library.
It makes perfect sense to call Finish and then Sync because Finish will write an "End of display list" command to the list, and Sync will (presumably) set the stall address and then wait for the GPU to catch up and get to the End of display list command.
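To make the stall-address idea concrete, here is a toy model in plain C. Everything in it is an assumption for illustration: `ToyList`, `toy_emit`, `toy_flush`, `toy_finish`, `toy_gpu_run` and the `CMD_END` marker are made-up names, not libGu calls or real GE opcodes. It only shows that the consumer never reads past the published stall index. Which libGu call actually moves the stall address is discussed further down the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIST_WORDS 64u
#define CMD_END    0xFFFFFFFFu /* made-up end marker, not a real GE opcode */

typedef struct {
    uint32_t words[LIST_WORDS];
    size_t   write; /* next slot the CPU writes */
    size_t   stall; /* the modelled GPU must stop before this index */
    size_t   read;  /* next slot the modelled GPU reads */
} ToyList;

/* CPU side: append one command word. The GPU cannot see it yet. */
static void toy_emit(ToyList *l, uint32_t word)
{
    assert(l->write < LIST_WORDS);
    l->words[l->write++] = word;
}

/* "Flush": publish everything written so far by moving the stall index. */
static void toy_flush(ToyList *l)
{
    l->stall = l->write;
}

/* "Finish": append the end marker, then publish it. */
static void toy_finish(ToyList *l)
{
    toy_emit(l, CMD_END);
    toy_flush(l);
}

/* GPU side: consume words up to the stall index. Returns 1 if the end
   marker was reached, 0 if the GPU stalled waiting for more data. */
static int toy_gpu_run(ToyList *l)
{
    while (l->read < l->stall) {
        if (l->words[l->read++] == CMD_END)
            return 1;
    }
    return 0;
}
```

Calling `toy_flush` part-way through a frame lets the consumer start on the first commands while the producer is still writing the rest, which is the overlap described above.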
I'm not 100% sure if it's the fastest way, but the easiest way I've found to get good data to the Gu without having to write back the entire cache is to "uncache" my pointers before writing vertex data, that is, OR-ing them with 0x40000000. This disables caching of writes through that pointer and makes them go directly to real RAM.
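As a small sketch of the pointer trick, written on plain 32-bit address values so it can also be compiled off-device. The helper names `uncached_addr` and `cached_addr` are hypothetical; only the 0x40000000 value comes from the post.

```c
#include <assert.h>
#include <stdint.h>

#define UNCACHED_BIT 0x40000000u /* value quoted in the post above */

/* Return the uncached alias of a 32-bit address by setting the bit. */
static uint32_t uncached_addr(uint32_t addr)
{
    uint32_t alias = addr | UNCACHED_BIT;
    /* Invariant: nothing but the one bit differs from the input. */
    assert((alias & ~UNCACHED_BIT) == (addr & ~UNCACHED_BIT));
    return alias;
}

/* Return the cached alias again by clearing the bit. */
static uint32_t cached_addr(uint32_t addr)
{
    return addr & ~UNCACHED_BIT;
}
```

On the device the same OR would be applied to the vertex pointer itself before writing through it. Data already sitting in the cache for that memory would presumably still need a writeback first.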
By the way, if you want optimal performance with textures, swizzle them (if you are going to draw them at any angle other than 0) and put them in VRAM. The speed difference between linear textures in RAM and swizzled textures in VRAM seems to be orders of magnitude, something like 100x!! Swizzled textures in RAM appear to have acceptable performance, though you really should put your most common textures in VRAM if you have room.
Swizzled textures (all formats) are stored as blocks of 16 bytes X 8 rows. (so 32-bit texture data is swizzled into 4x8 blocks).
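A straightforward (not optimised) sketch of that layout in C. The 16-byte by 8-row block size is from the description above. The order in which blocks are stored (left to right, then top to bottom) is my assumption, and `swizzle_blocks` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reorder a linear image into 16-byte x 8-row blocks of 128 bytes each.
   Assumes blocks are stored left to right, then top to bottom.
   width_bytes must be a multiple of 16 and height a multiple of 8. */
static void swizzle_blocks(uint8_t *dst, const uint8_t *src,
                           size_t width_bytes, size_t height)
{
    assert(width_bytes % 16 == 0 && height % 8 == 0);
    size_t blocks_per_row = width_bytes / 16;

    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width_bytes; ++x) {
            size_t block  = (y / 8) * blocks_per_row + (x / 16); /* which block */
            size_t inside = (y % 8) * 16 + (x % 16);             /* offset in block */
            dst[block * 128 + inside] = src[y * width_bytes + x];
        }
    }
}
```

For a 32-bit texture, `width_bytes` is the width in pixels times 4, so each block covers 4x8 pixels as stated above.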
By the way, no warranty is offered for the above info, which is the way I understand how things work which I've found out by my own experimentation and may be all wrong :)
turkeyman wrote: that should allow asynchronous behaviour of CPU and GPU..
crazyc wrote: http://svn.ps2dev.org/filedetails.php?r ... rev=0&sc=0
Do note however that callbacks are not currently enabled within pspgu, as I just get crashes if I add them in sceGuInit(). I'll revisit this & a lot of other stuff when I get back to work on this.
GE Dominator
Quote: I don't know if libGu does this implicitly anywhere, I keep things like that explicit in my library.
It does. More exactly, sceGuGetMemory(), sceGuCallList(), sceGuDrawArray(), sceGuDrawArrayN(), sceGuFinish() and sceGuSignal() all update the stall-address.
Quote: It makes perfect sense to call Finish and then Sync because Finish will write an "End of display list" command to the list, and Sync will (presumably) set the stall address and then wait for the GPU to catch up and get to the End of display list command.
Actually, sceGuFinish() sets the stall-address; sceGuSync() only waits for the list to finish (default behaviour, haven't explored the rest). One approach I have been pondering would be to have two lists running as some kind of ring-buffer and then listening to interrupts to know when the screen can be flipped (wiring it together with a vbl-interrupt), which could get rid of the sceGuSync() as long as the rendering isn't fast enough. There are more advanced approaches than this, but I think it could be a problem fitting it into the pspgu-approach. :)
Quote: I'm not 100% sure if it's the fastest way, but the easiest way I've found to get good data to the Gu without having to writeback the entire cache is to "uncache" my pointers before writing vertex data, that is OR-ing them by 0x40000000. This will disable caching of writes through it and make it write directly to real RAM.
If there are any "uncached accelerated" modes available, I'd pick those (like there is on the PS2, and it kicks ASS when it comes to filling a lot of data in a sequential array), but I'm not getting my hopes up. :)
Quote: Swizzled textures (all formats) are stored as blocks of 16 bytes X 8 rows. (so 32-bit texture data is swizzled into 4x8 blocks).
Nice, seems Sony has ditched the idea of having one swizzle-mode per pixel-size then... Good move.
Quote: I'm not 100% sure if it's the fastest way, but the easiest way I've found to get good data to the Gu without having to writeback the entire cache is to "uncache" my pointers before writing vertex data, that is OR-ing them by 0x40000000. This will disable caching of writes through it and make it write directly to real RAM.
chp wrote: If there are any "uncached accelerated" modes available, I'd pick those (like there is on the PS2, and it kicks ASS when it comes to filling a lot of data in a sequential array), but I'm not getting my hopes up. :)
Are you suggesting that 0x40000000 is not a valid cache bypass mode?
I'll use that for my immediate renderer; doing a dcache flush kinda sucks..
Is there a way to insert immediate data directly into the main display list, rather than having a separate immediate buffer and inserting calls into the main display list to reference it?
What's the story with double buffering display lists?
Neither of you mentioned double buffering the display list itself. Are you using some kind of ring buffer technique? How are you stalling the CPU when it catches up to the GPU's display list pointer? Or are you just syncing the display list every frame before the CPU starts writing to it?
Quote: 0x40000000 is sure a valid cache bypass, the question is if it has write combiners so it would be as fast as writing normally and then flushing...
Do you know the cached and uncached bus width to memory? I can write combine manually if I need to, provided the uncached write is the same width as the cache...
What does the PS2 do? I'm pretty sure it has an 'uncached accelerated' mode, yeah? Does that have write combining? And does it write more than 16 bytes out at a time? (PPro writes 32 bytes at a time, yeah?)
Even if 0x40000000 doesn't have write combining, if I'm writing out 16 bytes at a time, it's gotta be faster than trashing and then flushing the dcache every time I write out some vertex/texture data..