libvfpu: simple VFPU context switching

jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am


Post by jsgf »

I just checked in a first cut of libvfpu (pspsdk/src/vfpu). The idea is that this is a simple library which allows multiple users of the VFPU within one program without having them stomp on each other.

A simple example of the usage:

Code:

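struct pspvfpu_context *vfpucontext;

vfpucontext = pspvfpu_initcontext(VMAT4 | VMAT5); // create a context, and start using matrices 4 & 5
// later on...
call_some_other_VFPU_using_library(...);
// ...
pspvfpu_usecontext(vfpucontext, VMAT4 | VMAT5 | VMAT7); // switch back to our context, and start using matrix 7 too
__asm__("vmul.q c700, c500, c500"); // column 0 of matrix 7 = column 0 of matrix 5, squared element-wise
// ...
pspvfpu_deletecontext(vfpucontext); // free the context when finished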
You would call pspvfpu_usecontext() just before any block of VFPU-using code where someone else's VFPU-using code might have run since you last used the VFPU.

This means that if two libraries both want to use VFPU matrix 4, they each see their own copy of it, rather than stomping on each other's.

If only one library uses, say, matrix 7, then its context remains in the VFPU register set, and libvfpu does nothing.
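To make the lazy behaviour described above concrete, here is a minimal host-side simulation in plain C. It is not the libvfpu source: the register file, the `context` struct, `use_context()` and the `VMAT(n)` macro are all hypothetical stand-ins. It assumes that each of the eight matrices records which context's long-term data it currently holds, and that data is only saved or reloaded when ownership actually changes.

```c
#include <stddef.h>

/* Hypothetical host-side model of lazy VFPU matrix multiplexing.
   Not the libvfpu source: all names here are illustrative. */
#define NMATS   8
#define VMAT(n) (1u << (n))            /* stand-in for the VMATn flags */

typedef struct { float m[16]; } matrix; /* one 4x4 VFPU matrix */

typedef struct context {
    unsigned keep;                     /* matrices this context has claimed */
    matrix   saved[NMATS];             /* its copies while evicted */
} context;

static matrix   regs[NMATS];           /* simulated VFPU register file */
static context *owner[NMATS];          /* whose data each matrix holds (NULL = nobody) */
static unsigned transfers;             /* matrix saves + reloads performed */

/* Make every matrix in 'set' hold ctx's data, moving data only on conflict. */
static void use_context(context *ctx, unsigned set)
{
    for (int i = 0; i < NMATS; i++) {
        if (!(set & VMAT(i)) || owner[i] == ctx)
            continue;                  /* not wanted, or already resident: no work */
        if (owner[i] != NULL) {        /* evict the other context's data */
            owner[i]->saved[i] = regs[i];
            transfers++;
        }
        if (ctx->keep & VMAT(i)) {     /* reload our earlier value */
            regs[i] = ctx->saved[i];
            transfers++;
        }
        owner[i] = ctx;
    }
    ctx->keep |= set;
}
```

In this model, two contexts that both claim matrix 4 each see their own copy, while a matrix claimed by only one context never moves, so `transfers` does not change.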

Also, pspvfpu_initcontext() sets the thread's VFPU attribute, so you don't need to worry about setting it at thread creation time.

This code is only very lightly tested. I'm checking it in mainly as a call for comments.
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

Nice! Shouldn't be too hard to get this into GUM. The more apps that can happily co-exist the better. :)
GE Dominator
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

Ah, one thing I can think of that should be considered is how you can declare a temporary usage of a matrix, instead of persistent usage. For example GUM uses matrix 0, 1 and 2 as temporaries, but using this interface I cannot flag them without having them stored when it switches context.
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am

Post by jsgf »

chp wrote:Ah, one thing I can think of that should be considered is how you can declare a temporary usage of a matrix, instead of persistent usage. For example GUM uses matrix 0, 1 and 2 as temporaries, but using this interface I cannot flag them without having them stored when it switches context.
That should be simple to do. I thought about it, but I didn't want to complicate the API (not that it would, much). Something like the following, to be called once you've finished with a matrix:

Code:

void pspvfpu_discardmats(struct pspvfpu_context *c, unsigned matrixset);

The other thing we could consider is defining an ABI for the VFPU, with conventions something like (roughly):
  • Matrix 0 - scratch, never preserved or restored
  • Matrix 1 - VFPU function first arg(?) and return
  • Matrix 2+3 - VFPU function args 2,3
  • Matrix 4-7 - general use, must be context switched by libvfpu
This would allow general purpose VFPU functions which could be used in a number of situations, so everyone wanting to use the VFPU doesn't need to roll their own inline assembler, and doesn't need to keep transferring stuff into and out of either FP registers or memory.
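One way to pin down the proposed conventions is to encode them as matrix sets, as in this C sketch. The `ABI_*` names, `VMAT(n)` and both helper functions are hypothetical; the assumption that argument and return matrices are not preserved across a call is exactly that, an assumption, since the list above leaves it open.

```c
#include <stdbool.h>

#define VMAT(n) (1u << (n))   /* hypothetical: bit n = matrix n */

/* Hypothetical encoding of the proposed VFPU ABI. */
#define ABI_SCRATCH  (VMAT(0))                               /* never preserved or restored */
#define ABI_ARGS     (VMAT(1) | VMAT(2) | VMAT(3))           /* args 1-3; matrix 1 also returns */
#define ABI_SWITCHED (VMAT(4) | VMAT(5) | VMAT(6) | VMAT(7)) /* context-switched by libvfpu */

/* Long-lived library state should only live in context-switched matrices. */
static bool valid_long_term_set(unsigned set)
{
    return (set & ~ABI_SWITCHED) == 0;
}

/* Which of the caller's live matrices a VFPU function call may overwrite,
   assuming (not stated in the proposal) that argument matrices are not preserved. */
static unsigned clobbered_by_call(unsigned live)
{
    return live & (ABI_SCRATCH | ABI_ARGS);
}
```

Under these rules a library keeping state in matrices 4 and 7 passes the check, while one keeping state in matrix 3 would have to move it.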

There's at least one problem with converting libgum_vfpu to use libvfpu: you need to call pspvfpu_initcontext() in every thread which starts using libgum, as part of the initializer. This requires testing in every gum function whether the context has been created for this thread, and creating it if not.
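Without thread-local storage, that per-thread test ends up as an explicit lookup. The C sketch below is hypothetical throughout: `struct vfpu_ctx` stands in for a real context, the caller is assumed to supply its own thread ID, and `calloc()` stands in for context creation. It also omits the lock that a real multithreaded version would need around the table.

```c
#include <stdlib.h>

#define MAX_THREADS 16

/* Hypothetical stand-in for a per-thread VFPU context. */
struct vfpu_ctx { int thread_id; /* saved matrices would live here */ };

static struct { int thread_id; struct vfpu_ctx *ctx; } ctx_table[MAX_THREADS];
static int ctx_count;

/* Return the calling thread's context, creating it on first use.
   Every gum-style entry point would have to call this before touching the VFPU. */
static struct vfpu_ctx *get_thread_context(int thread_id)
{
    for (int i = 0; i < ctx_count; i++)
        if (ctx_table[i].thread_id == thread_id)
            return ctx_table[i].ctx;            /* already created for this thread */
    if (ctx_count == MAX_THREADS)
        return NULL;                            /* table full */
    struct vfpu_ctx *c = calloc(1, sizeof *c);  /* stand-in for context creation */
    if (c == NULL)
        return NULL;
    c->thread_id = thread_id;
    ctx_table[ctx_count].thread_id = thread_id;
    ctx_table[ctx_count].ctx = c;
    ctx_count++;
    return c;
}
```

The cost is a search on every call, which is exactly the overhead a proper thread-local storage model would avoid.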

The real problem is the lack of any kind of thread-local storage model. I think the ideal API would require the creation of a single VFPU context for a particular user/library, which would internally have thread-local VFPU register state so that you can have multithreaded VFPU users.

But of course, libgum (and PSPGL) are strictly single-threaded anyway, so there isn't much of a practical problem yet. (Though using the VFPU would make them truly single-threaded: they must be used from one particular thread, rather than simply being wrapped in a mutex that only allows one thread at a time into the code.)
TyRaNiD
Posts: 907
Joined: Sun Jan 18, 2004 12:23 am

Post by TyRaNiD »

Perhaps I am missing the point somewhat, but why exactly is this library necessary? I could understand general purpose functions to save matrices before using them in your own functions and restore them afterwards (so a library can operate independently without affecting any other VFPU code in the program), but the thread manager should be handling VFPU context switching when a different thread is running, which I guess is the whole point of the VFPU thread attribute.

edit: Well, I guess it could be beneficial to leave VFPU matrices in registers as long as possible, but still :)
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am

Post by jsgf »

TyRaNiD wrote:edit: Well, I guess it could be beneficial to leave VFPU matrices in registers as long as possible, but still :)
Precisely. If you have a single user of a particular matrix, there's no point in saving and restoring it all the time. It's something you want to do lazily, and as little as possible.

Both libgum and PSPGL want to keep the top of the current matrix stack in VFPU registers all the time, rather than keep reloading it from memory. GUM uses matrix 3 and I'm using matrix 7 in PSPGL, just so that there's no immediate conflict. But ODE and other such libraries are prime candidates for VFPU use, and given how simple and powerful the VFPU is turning out to be, I think there will be (or at least should be) a lot of VFPU users.


They basically have 4 options:
  1. Restore and save their VFPU state unconditionally on every use
  2. Try to choose disjoint sets of VFPU registers and hope there are no conflicts
  3. Use a library to manage the VFPU state and multiplex multiple contexts onto a single set of VFPU registers
  4. Put each VFPU user in its own distinct thread.
Option 3 seems like the best shot to me. Option 1 would only make sense if the VFPU were continuously thrashing between two contexts which actually used the same set of matrices; in that case libvfpu would add a small amount of overhead trying to optimise the unoptimisable. But I think that case would be relatively rare. Option 4 doesn't seem good, since I gather context-switching between VFPU-using threads is pretty expensive, so you'd actually want to minimise the number of VFPU-using threads (for example, constrain it to just one of your application threads).
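The trade-off between options 1 and 3 can be sketched by counting matrix transfers. This C model is an illustration under simplifying assumptions: each use under option 1 reloads and saves every matrix it touches, while under option 3 a matrix costs a save plus a reload only when a different user claims it (an upper bound, since a first use has nothing to reload). Bookkeeping overhead is not counted, and `eager_cost()` and `lazy_cost()` are hypothetical names.

```c
#define NMATS 8

/* Option 1: every use reloads its matrices on entry and saves them on exit. */
static unsigned eager_cost(const unsigned *sets, int nuses)
{
    unsigned cost = 0;
    for (int u = 0; u < nuses; u++)
        for (int i = 0; i < NMATS; i++)
            if (sets[u] & (1u << i))
                cost += 2;                  /* one reload + one save */
    return cost;
}

/* Option 3: a matrix only moves when a different user claims it.
   users[u] identifies who makes use number u (ids must be non-zero). */
static unsigned lazy_cost(const int *users, const unsigned *sets, int nuses)
{
    int owner[NMATS] = {0};                 /* 0 = nobody's data is resident */
    unsigned cost = 0;
    for (int u = 0; u < nuses; u++)
        for (int i = 0; i < NMATS; i++) {
            if (!(sets[u] & (1u << i)) || owner[i] == users[u])
                continue;                   /* not wanted, or already ours */
            if (owner[i] != 0)
                cost += 2;                  /* save previous owner, reload this user */
            owner[i] = users[u];
        }
    return cost;
}
```

Two users alternating on disjoint matrices (4 and 7) cost 8 transfers eagerly and none lazily; if both alternate on matrix 4, the lazy scheme still needs 6, which is the thrashing case where option 1 is competitive.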
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am

Post by jsgf »

chp wrote:Ah, one thing I can think of that should be considered is how you can declare a temporary usage of a matrix, instead of persistent usage. For example GUM uses matrix 0, 1 and 2 as temporaries, but using this interface I cannot flag them without having them stored when it switches context.
OK, I revised the API a bit. When you want to use the VFPU, you now call:

Code:

pspvfpu_use_matrices(context, VMAT0|VMAT1, VMAT2|VMAT3);
This says that you want to use matrices 0-3, but you only care about 0+1 in the long term, and 2+3 are only temporaries. This means that it will save anyone else's long-term use of 0-3, but it won't bother saving or restoring 2+3 to/from your context (but they're available for your use now).
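The keep/temp distinction can be modelled the same way. This C sketch is hypothetical (it is not the pspsdk implementation, and `use_matrices()` here only mirrors the shape of `pspvfpu_use_matrices()`): it assumes that any other context's long-term data in a requested matrix is saved first, that a keeper is reloaded only if it was evicted, and that moving a matrix from the keep set to the temp set discards it. Matrices not mentioned in a call are assumed to stay in the context's long-term set.

```c
#include <stddef.h>

#define NMATS   8
#define VMAT(n) (1u << (n))                 /* hypothetical: bit n = matrix n */

typedef struct { float m[16]; } matrix;
typedef struct context { unsigned keep; matrix saved[NMATS]; } context;

static matrix   regs[NMATS];                /* simulated VFPU register file */
static context *owner[NMATS];               /* NULL: no long-term data resident */

/* Sketch of keep/temp semantics (host simulation only). */
static void use_matrices(context *ctx, unsigned keepset, unsigned tempset)
{
    for (int i = 0; i < NMATS; i++) {
        unsigned bit = VMAT(i);
        if (!((keepset | tempset) & bit))
            continue;                       /* not requested */
        if (owner[i] != NULL && owner[i] != ctx)
            owner[i]->saved[i] = regs[i];   /* preserve another context's keeper */
        if (keepset & bit) {
            if (owner[i] != ctx && (ctx->keep & bit))
                regs[i] = ctx->saved[i];    /* reload our evicted keeper */
            owner[i] = ctx;
        } else {
            owner[i] = NULL;                /* temporary: nobody needs it preserved */
        }
    }
    /* Matrices moved into the temp set are dropped from our long-term set. */
    ctx->keep = (ctx->keep & ~tempset) | keepset;
}
```

In this model, calling `use_matrices(&a, VMAT(0), VMAT(1))` after keeping matrices 0 and 1 leaves only matrix 0 in `a.keep`, which corresponds to the discard example below.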

If you were using a matrix as a long-term value, but now you want to discard it, you'd do something like:

Code:

pspvfpu_use_matrices(context, VMAT0|VMAT1, 0);
// use VMAT0 and 1
pspvfpu_use_matrices(context, VMAT0, VMAT1); // discard VMAT1
The next time you use VMAT1 (either as a temp or a keeper), it will have an undefined value, depending on who used it in the meantime.
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am

Post by jsgf »

One more small change to the API: I've made it valid to use pspvfpu_use_matrices() with a NULL context. This means you just want to use a matrix as a temporary, and you have no use for long-term matrices.

This makes it easy for the gum* functions to grab some temp matrices, while the sceGum* functions still use matrix 3 as the persistent top of the current matrix stack.

Also, since the content of a sceGum* matrix stack is undefined until you use either sceGumLoadMatrix or sceGumLoadIdentity, those are the only two functions which will bother allocating a context. The other functions will still work with a NULL context, but are basically meaningless until you properly initialize the matrix stack.
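The lazy allocation described above might look roughly like the following C sketch. Everything in it is hypothetical: `struct vfpu_ctx` stands in for a libvfpu context, and `gum_load_identity()`, `gum_load_matrix()` and `gum_translate_x()` are simplified stand-ins, not the real sceGum* functions. For brevity, the non-load operation here simply returns when there is no context, rather than running on temporaries. Matrices are stored column-major.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a context holding the matrix-stack top. */
struct vfpu_ctx { float top[16]; };          /* column-major 4x4 */

static struct vfpu_ctx *gum_ctx;            /* stays NULL until a load defines the stack */

/* Only the two "load" operations allocate a context. */
static struct vfpu_ctx *ensure_ctx(void)
{
    if (gum_ctx == NULL)
        gum_ctx = calloc(1, sizeof *gum_ctx);
    return gum_ctx;
}

static void gum_load_identity(void)
{
    if (ensure_ctx() == NULL)
        return;                              /* allocation failed */
    memset(gum_ctx->top, 0, sizeof gum_ctx->top);
    for (int i = 0; i < 4; i++)
        gum_ctx->top[i * 4 + i] = 1.0f;
}

static void gum_load_matrix(const float m[16])
{
    if (ensure_ctx() != NULL)
        memcpy(gum_ctx->top, m, sizeof gum_ctx->top);
}

/* Other operations tolerate a missing context; with an undefined
   matrix stack their result would be meaningless anyway. */
static void gum_translate_x(float dx)
{
    if (gum_ctx == NULL)
        return;
    for (int r = 0; r < 4; r++)              /* top = top * T(dx, 0, 0) */
        gum_ctx->top[12 + r] += dx * gum_ctx->top[r];
}
```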