Hardware object culling on the GE

chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

OK, some thinking about the available registers, looking at how the GU library uses them, and talking to people on IRC (thanks Walkman and mICro) has exposed an area the homebrew community has not touched yet: culling objects directly on the GE.

Two functions for this are already available in the GU library today: sceGuBeginObject() and sceGuEndObject(). My educated guess (this is all theory so far, with no real test yet) is that they test all the vertices against the frustum bounds to determine whether the object passed in is actually visible. If it is not visible, the GE jumps past the conditional region and skips everything inside it without even considering it for rendering. The best approach is probably to pass in a bounding box of some kind (with 8/16-bit vertices you could use one box for all models and let the model transform scale it) and then draw the real model inside. Example:

Code:

sceGuBeginObject(GU_VERTEX_32BITF,8,0,model->boundingBox);
sceGuDrawArray(GU_TRIANGLES,GU_TEXTURE_32BITF|GU_VERTEX_32BITF|GU_TRANSFORM_3D,model->vertexCount,0,model->vertices);
sceGuEndObject();
Function-prototypes are as follows (until I get them properly documented in pspgu.h):

Code:

void sceGuBeginObject(int vtype, int count, const void* indices, const void* vertices);
void sceGuEndObject(); 
Look at sceGuDrawArray() for explanations of the parameters.
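
For the bounding box itself, here is a minimal sketch of how the eight corners could be built on the CPU. It assumes positions stored as tightly packed x,y,z floats; `BBoxVertex` and `compute_bbox_corners` are made-up names for illustration and are not part of the GU library.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* One corner of the box: three 32-bit floats, the layout GU_VERTEX_32BITF expects. */
typedef struct { float x, y, z; } BBoxVertex;

/* Hypothetical helper: fill out[8] with the corners of the axis-aligned box
   enclosing `count` positions stored as tightly packed x,y,z floats. */
static void compute_bbox_corners(const float *xyz, size_t count, BBoxVertex out[8])
{
    float min[3] = {  FLT_MAX,  FLT_MAX,  FLT_MAX };
    float max[3] = { -FLT_MAX, -FLT_MAX, -FLT_MAX };

    assert(xyz != NULL && count > 0);

    /* Track the smallest and largest value seen on each axis. */
    for (size_t i = 0; i < count; ++i) {
        for (int a = 0; a < 3; ++a) {
            float v = xyz[i * 3 + a];
            if (v < min[a]) min[a] = v;
            if (v > max[a]) max[a] = v;
        }
    }

    /* Bit 0 of the corner index selects min/max x, bit 1 y, bit 2 z. */
    for (int c = 0; c < 8; ++c) {
        out[c].x = (c & 1) ? max[0] : min[0];
        out[c].y = (c & 2) ? max[1] : min[1];
        out[c].z = (c & 4) ? max[2] : min[2];
    }
}
```

The resulting array would then be passed as the last argument, as in sceGuBeginObject(GU_VERTEX_32BITF,8,0,box). On real hardware you would presumably also have to write the data cache back before the GE reads the box.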
GE Dominator
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am

Post by jsgf »

You mean like GL's occlusion test? That would be very neat.

Post by chp »

jsgf wrote:You mean like GL's occlusion test? That would be very neat.
Not quite, since it seems you cannot read back the result (unless you do the query and then signal to get an interrupt showing that you did in fact execute that branch). It is more like conditional rendering, since you can use the result to skip parts of the rendering. I can see this being useful for precomputing the rendering of, for example, big rooms: you precalculate all the bits of geometry, add their bounding boxes into the mix, and get free frustum culling as you turn around in the room. That is, of course, if this works properly. I need to do some kind of check so I'm not telling all lies now. :)
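
To make the conditional-rendering idea concrete, here is a toy CPU-side model of a pre-recorded list being replayed. It is only a sketch under assumptions: the opcodes, the `Cmd` struct and `replay` are invented for illustration and are not real GE command values, and the `bbox_visible` array stands in for whatever test the hardware actually performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy display list; these are not real GE command values. */
typedef enum { OP_BBOX, OP_BJUMP, OP_DRAW, OP_END } Op;

typedef struct {
    Op  op;
    int arg;   /* OP_BBOX: bounding box index, OP_BJUMP: target command index,
                  OP_DRAW: id of the geometry to draw */
} Cmd;

/* Replays a pre-recorded list. bbox_visible[i] stands in for the hardware
   test of box i. Returns how many OP_DRAW commands were executed. */
static int replay(const Cmd *list, const int *bbox_visible)
{
    int draws = 0;
    int last_test_passed = 1;

    for (size_t pc = 0; list[pc].op != OP_END; ) {
        const Cmd c = list[pc];
        switch (c.op) {
        case OP_BBOX:
            last_test_passed = bbox_visible[c.arg];
            ++pc;
            break;
        case OP_BJUMP:
            /* Skip the conditional region when the last box was rejected. */
            pc = last_test_passed ? pc + 1 : (size_t)c.arg;
            break;
        case OP_DRAW:
            ++draws;
            ++pc;
            break;
        default:
            ++pc;
            break;
        }
    }
    return draws;
}
```

A room would be recorded once as a sequence of box test, conditional jump and draw commands, and the same list replayed every frame; only the outcome of the box tests changes as the camera turns.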

Post by jsgf »

Have you actually tested this yet, or is this still just supposition from looking at sceGuBeginObject?

What isn't clear to me is whether the test passes when the bbox geometry is simply clipped, or whether it actually (simulates) rasterising the bbox geometry and looks to see whether or not any fragments have passed the depth/stencil/scissor/etc tests.

All very interesting. I'm thinking about how to present this in a sane way in PSPGL...

Post by chp »

I have now tested this (look further down in this post), and it works.

The geometry is clipped, since you do not pass in what type of primitive you test against, it could not know how to rasterize it.

One big strength of this that I see is that you could pre-record a scene and then just play it back culled and ready with no cpu-involvement. You could also use it to switch between two different lod-levels using only the GPU.

One thing to think about if you're doing the implementation is that while inside a conditional rendering, you do not know if a matrix has been flushed to the system. I got bitten by this while doing the tests, and debugged for 15 minutes until I realised that the matrix in question had been flushed by Gum inside the conditional region and didn't update again since it wasn't pushed (while the library thought it was), so make your stack culling-aware. :)
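
A rough sketch of what "culling-aware" could mean for a matrix cache, under the assumption that the library tracks a dirty flag per matrix. All names here (`MatrixCache`, `matrix_flush`, `region_begin`, `region_end`) are hypothetical and do not exist in Gum; the point is only that an upload recorded inside a conditional region cannot be trusted once the region ends.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU-side bookkeeping for one cached matrix; all names are made up. */
typedef struct {
    bool dirty;             /* CPU copy differs from what the GE is known to hold */
    bool flushed_in_region; /* an upload was recorded inside a conditional region */
    int  region_depth;      /* nesting level of open conditional regions */
    int  uploads;           /* number of upload commands written to the list */
} MatrixCache;

static void matrix_changed(MatrixCache *m) { m->dirty = true; }

/* Record an upload only when needed, remembering whether it was conditional. */
static void matrix_flush(MatrixCache *m)
{
    if (!m->dirty)
        return;
    ++m->uploads;
    m->dirty = false;
    if (m->region_depth > 0)
        m->flushed_in_region = true;
}

static void region_begin(MatrixCache *m) { ++m->region_depth; }

/* An upload recorded inside the region may have been skipped by the GE,
   so the matrix is treated as dirty again once the region closes. */
static void region_end(MatrixCache *m)
{
    assert(m->region_depth > 0);
    --m->region_depth;
    if (m->flushed_in_region) {
        m->dirty = true;
        m->flushed_in_region = false;
    }
}
```

This is deliberately conservative: it may re-upload a matrix the GE already holds, which costs a few commands but avoids drawing with a stale one.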

Here's a patch for the cube-sample that shows hardware culling in action:

Code:

Index: cube.c
===================================================================
--- cube.c	(revision 1707)
+++ cube.c	(working copy)
@@ -11,6 +11,7 @@
 #include <pspdebug.h>
 #include <stdlib.h>
 #include <stdio.h>
+#include <pspctrl.h>
 #include <math.h>
 #include <string.h>
 
@@ -20,7 +21,7 @@
 #include "../common/callbacks.h"
 #include "../common/vram.h"
 
-PSP_MODULE_INFO("Cube Sample", 0, 1, 1);
+PSP_MODULE_INFO("Hardware Culling Sample", 0, 1, 1);
 PSP_MAIN_THREAD_ATTR(THREAD_ATTR_USER);
 
 static unsigned int __attribute__((aligned(16))) list[262144];
@@ -122,17 +123,35 @@
 	sceDisplayWaitVblankStart();
 	sceGuDisplay(GU_TRUE);
 
+	sceCtrlSetSamplingCycle(0);
+	sceCtrlSetSamplingMode(0);
+
 	// run sample
 
 	int val = 0;
+	float x = 0;
+	float y = 0;
 
 	while(running())
 	{
 		sceGuStart(GU_DIRECT,list);
 
+		SceCtrlData pad;
+		if (sceCtrlPeekBufferPositive(&pad,1))
+		{
+			if (pad.Buttons & PSP_CTRL_UP)
+				y += 0.1f;
+			if (pad.Buttons & PSP_CTRL_DOWN)
+				y -= 0.1f;
+			if (pad.Buttons & PSP_CTRL_LEFT)
+				x -= 0.1f;
+			if (pad.Buttons & PSP_CTRL_RIGHT)
+				x += 0.1f;
+		}
+
 		// clear screen
 
-		sceGuClearColor(0xff554433);
+		sceGuClearColor(0x554400|(128 + (int)(cosf(val * (GU_PI/180.0f)) * 127.0f)));
 		sceGuClearDepth(0);
 		sceGuClear(GU_COLOR_BUFFER_BIT|GU_DEPTH_BUFFER_BIT);
 
@@ -143,17 +162,38 @@
 		sceGumPerspective(75.0f,16.0f/9.0f,0.5f,1000.0f);
 
 		sceGumMatrixMode(GU_VIEW);
-		sceGumLoadIdentity();
-
-		sceGumMatrixMode(GU_MODEL);
-		sceGumLoadIdentity();
 		{
-			ScePspFVector3 pos = { 0, 0, -2.5f };
+			ScePspFVector3 pos = { x, y, -2.5f };
 			ScePspFVector3 rot = { val * 0.79f * (GU_PI/180.0f), val * 0.98f * (GU_PI/180.0f), val * 1.32f * (GU_PI/180.0f) };
+			sceGumLoadIdentity();
 			sceGumTranslate(&pos);
 			sceGumRotateXYZ(&rot);
 		}
 
+		sceGumMatrixMode(GU_MODEL);
+		{
+			ScePspFVector3 scale = { 0.1f, 0.1f, 0.1f };
+			sceGumLoadIdentity();
+			sceGumScale(&scale);
+		}
+
+		{
+			ScePspFVector3 scale = { 0.3f, 0.3f, 0.3f };
+			sceGumMatrixMode(GU_MODEL);
+			sceGumLoadIdentity();
+			sceGumScale(&scale);
+		}
+
+		// begin conditional rendering
+
+		sceGumUpdateMatrix(); // note, not a sceGum*-function, so the matrices has to be flushed manually (got bitten by that one :))
+		sceGuBeginObject(GU_TEXTURE_32BITF|GU_COLOR_8888|GU_VERTEX_32BITF|GU_TRANSFORM_3D,12*3,0,vertices);
+
+		/// ALL BELOW THIS LINE IS ONLY RENDERED WHILE THE SMALLER CUBE IS INSIDE THE FRUSTUM
+
+		sceGumMatrixMode(GU_MODEL);
+		sceGumLoadIdentity();
+
 		// setup texture
 
 		sceGuTexMode(GU_PSM_4444,0,0,0);
@@ -169,6 +209,29 @@
 
 		sceGumDrawArray(GU_TRIANGLES,GU_TEXTURE_32BITF|GU_COLOR_8888|GU_VERTEX_32BITF|GU_TRANSFORM_3D,12*3,0,vertices);
 
+		/// ALL ABOVE THIS LINE IS ONLY RENDERED WHILE THE SMALLER CUBE IS INSIDE THE FRUSTUM
+
+		// end conditional rendering
+
+		sceGuEndObject();
+
+		// draw line-cube to overlay (just for showing bound)
+		{
+			ScePspFVector3 scale = { 0.3f, 0.3f, 0.3f };
+			sceGumMatrixMode(GU_MODEL);
+			sceGumLoadIdentity();
+			sceGumScale(&scale);
+
+			sceGuEnable(GU_LIGHTING);
+			sceGuModelColor(0xffffff,0,0,0);
+			sceGuDisable(GU_TEXTURE_2D);
+			sceGuDepthFunc(GU_ALWAYS);
+			sceGumDrawArray(GU_LINE_STRIP,GU_TEXTURE_32BITF|GU_COLOR_8888|GU_VERTEX_32BITF|GU_TRANSFORM_3D,12*3,0,vertices);
+			sceGuDepthFunc(GU_GEQUAL);
+			sceGuEnable(GU_TEXTURE_2D);
+			sceGuDisable(GU_LIGHTING);
+		}
+
 		sceGuFinish();
 		sceGuSync(0,0);
 
Index: Makefile.sample
===================================================================
--- Makefile.sample	(revision 1707)
+++ Makefile.sample	(working copy)
@@ -8,7 +8,7 @@
 
 LIBDIR =
 LDFLAGS =
-LIBS= -lpspgum -lpspgu -lm
+LIBS= -lpspgum -lpspgu -lm -lpspctrl
 
 EXTRA_TARGETS = EBOOT.PBP
 PSP_EBOOT_TITLE = Cube Sample
I test a smaller version of the cube against the frustum and then only render the big one if it is visible. Move the cube with the dpad.

Post by jsgf »

chp wrote:The geometry is clipped, since you do not pass in what type of primitive you test against, it could not know how to rasterize it.
Ah, good point. So does the test pass if all the vertices are visible, or just any of them?

I'll try it out shortly anyway.
chp wrote:One big strength of this that I see is that you could pre-record a scene and then just play it back culled and ready with no cpu-involvement. You could also use it to switch between two different lod-levels using only the GPU.
Hm. That's an interesting idea, but all your highres models are still using vertex buffer space. It would be useful for things which quickly move between the high and low res regions.
chp wrote:One thing to think about if you're doing the implementation is that while inside a conditional rendering, you do not know if a matrix has been flushed to the system.
Yes. I've been wrestling with this point in trying to work out a sensible way of integrating this into the GL API. If you have a conditional begin/end block like the GU interface, you need to work out some sensible definition of what happens with GL state changes within the block: which ones stick, and which ones are conditional. In a number of ways it's related to the problem of GL display lists. Maybe this PSP feature is useful enough to justify implementing GL display lists...

Post by jsgf »

chp wrote:One big strength of this that I see is that you could pre-record a scene and then just play it back culled and ready with no cpu-involvement. You could also use it to switch between two different lod-levels using only the GPU.
I'm not quite sure what you're thinking here. I can think of how to do it if you use a BBOX proxy for a particular LOD which is arranged such that the proxy is completely off the screen when a LOD gets too close, so you can cull that level and go to a higher one; there's no particular reason you couldn't do this for multiple levels. But getting the proxy to be invisible at the right times might be tricky...

What do you have in mind?
chp wrote:I test a smaller version of the cube against the frustum and then only render the big one if it is visible. Move the cube with the dpad.
To be precise, the test is between the BBOX and the edges of the scissor rectangle. I tried changing the viewport to be half the screen, and while drawn points were being clipped by the clip-planes, the BBOX test didn't fail unless the bbox cube was entirely off the screen. It seems the clipping test is not applied to the bbox vertices.
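
One way to picture that observation on the CPU is sketched below. It is an assumed model, not confirmed hardware behaviour: the box is rejected only when every projected corner lies beyond the same edge of the scissor rectangle. `ScreenPoint`, `Rect` and `bbox_passes` are names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { float x, y; } ScreenPoint;               /* projected corner, in pixels */
typedef struct { float left, top, right, bottom; } Rect;  /* scissor rectangle */

/* Assumed model, not confirmed hardware behaviour: the box is rejected only when
   every projected corner lies beyond the same edge of the scissor rectangle. */
static bool bbox_passes(const ScreenPoint *pts, size_t count, Rect scissor)
{
    bool all_left = true, all_right = true, all_above = true, all_below = true;

    for (size_t i = 0; i < count; ++i) {
        if (pts[i].x >= scissor.left)   all_left  = false;
        if (pts[i].x <= scissor.right)  all_right = false;
        if (pts[i].y >= scissor.top)    all_above = false;
        if (pts[i].y <= scissor.bottom) all_below = false;
    }
    return !(all_left || all_right || all_above || all_below);
}
```

Under this model a box that straddles an edge, or one that is larger than the whole screen, still passes, which would match the cube only being culled once it is entirely off the screen.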

Post by chp »

jsgf wrote:
chp wrote:One big strength of this that I see is that you could pre-record a scene and then just play it back culled and ready with no cpu-involvement. You could also use it to switch between two different lod-levels using only the GPU.
I'm not quite sure what you're thinking here. I can think of how to do it if you use a BBOX proxy for a particular LOD which is arranged such that the proxy is completely off the screen when a LOD gets too close, so you can cull that level and go to a higher one; there's no particular reason you couldn't do this for multiple levels. But getting the proxy to be invisible at the right times might be tricky...

What do you have in mind?
What I was thinking of is not something that GU supports out of the box: switching between LODs (and as you say, you can drop between more than two LODs) by using the conditional to test several camera-aligned bounding boxes placed in front of the camera. When one box is rejected, you go on to the next and test for a higher LOD level. This could also allow you to bypass the issues with clipping, since the more highly tessellated models can support being watched up close (their triangles won't be rejected until they are outside the view). It would however break how GU works today, since it would be laid out as an if (...) {} else if (...) {} ... statement rather than the single if (...) {} that it is designed as now.
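
As a sketch of that if/else-if layout, here is a toy list interpreter with an extra unconditional jump. Everything in it is hypothetical (the `LOD_*` opcodes, `LodCmd`, `run_lod_chain`); it only models the control flow on the CPU and says nothing about real GE command encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy display list; not real GE command values. */
typedef enum { LOD_BBOX, LOD_BJUMP, LOD_JUMP, LOD_DRAW, LOD_END } LodOp;
typedef struct { LodOp op; int arg; } LodCmd;

/* Runs the list and returns the id of the last model drawn, or -1 if none.
   bbox_visible[i] stands in for the hardware test of proxy box i. */
static int run_lod_chain(const LodCmd *list, const int *bbox_visible)
{
    int drawn = -1;
    int passed = 1;

    for (size_t pc = 0; list[pc].op != LOD_END; ) {
        const LodCmd c = list[pc];
        switch (c.op) {
        case LOD_BBOX:  passed = bbox_visible[c.arg]; ++pc; break;
        case LOD_BJUMP: pc = passed ? pc + 1 : (size_t)c.arg; break; /* taken on reject */
        case LOD_JUMP:  pc = (size_t)c.arg; break;                   /* always taken */
        case LOD_DRAW:  drawn = c.arg; ++pc; break;
        default:        ++pc; break;
        }
    }
    return drawn;
}
```

Each branch ends with an unconditional jump to the end of the chain, so exactly one LOD is drawn; the last branch has no test and acts as the final else.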
jsgf wrote:
chp wrote:I test a smaller version of the cube against the frustum and then only render the big one if it is visible. Move the cube with the dpad.
To be precise, the test is between the BBOX and the edges of the scissor rectangle. I tried changing the viewport to be half the screen, and while drawn points were being clipped by the clip-planes, the BBOX test didn't fail unless the bbox cube was entirely off the screen. It seems the clipping test is not applied to the bbox vertices.
Ah, interesting, so they do it in post-projection space then... I wonder how it reacts to the near and far planes: whether those are rejected properly or are as broken as the clipping. I'll have to try it out.