VFPU Load

su-27k
Posts: 12
Joined: Thu Nov 16, 2006 8:31 pm

Post by su-27k »

Hi everyone, and sorry for my English.
The VFPU supports the following data formats: 32-bit integer, 32-bit floating point, packed 8-bit integer, and packed 16-bit integer/floating point. So I'm trying to load an s8[4] into a register, but I don't know what the constraint for this type is, and once it is loaded into a VFPU register, how can I read the elements if they all sit in a single register?
I think that when you load a register, the VFPU binds the type through the constraint, e.g.:

s8 a[4];
__asm__(
"lv.s s000, %0"
:
:"r"(a)
);
In other words, where can I find the constraints for Allegrex? I don't know them for this particular machine: does r mean register? m memory? o object? f floating point? i integer? And which one is for s8[4]? Etc., etc.

Please reply soon :)
su-27k

Post by su-27k »

At this link you can find all the machine-specific constraints: http://gcc.gnu.org/onlinedocs/gcc-4.1.1 ... onstraints
But for the PSP, where is the documentation? -_-'
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Re: VFPU Load

Post by hlide »

su-27k wrote: In other words, where can I find the constraints for Allegrex? I don't know them for this particular machine: does r mean register? m memory? o object? f floating point? i integer? And which one is for s8[4]? Etc., etc.
AFAIK there are no real constraints for VFPU registers. You are forced to name them explicitly in the asm string. It seems GCC doesn't know they exist, so it cannot allocate them for us or let us declare them as clobbers, as you can with MMX/SSE registers. The fact that there are no builtin intrinsics for VFPU instructions may be another indication that GCC doesn't know about them. I guess the VFPU registers are very tricky to use because of their multiple views (matrix/vector/scalar registers), and that doesn't help GCC handle them nicely.

The answer should be: ..."lv.s S000, 0 + %0" : : "m"(a)...

EDIT:
There is no real point in using the VFPU for anything other than floats.
su-27k

Re: VFPU Load

Post by su-27k »

So you think the VFPU doesn't bind the type through the constraint, but maybe it checks the bit arrangement of the register and works out what type was loaded? That would be very strange, because in that case, how can it tell the difference between an integer and an s8[4], for example? I think there is an answer and I want to know it!!!!
For now I use the Allegrex "ext" instruction to extract bit fields, but I want to do everything with the VFPU coprocessor :=P
Anyway, with gcc-psp I can use m, o and r successfully; with f it compiles but doesn't link.
I tried to load a float with r, and at run time I got an address-not-aligned exception, bhaaaa

Documentation, please.....................
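On the question of how the VFPU "knows" the type: a 32-bit register slot only holds raw bits, and it is the instruction reading them that decides whether they are an int, a float or four packed s8 values; the asm constraint plays no part in that. The host-side C sketch below illustrates this without any VFPU code. It packs an s8[4] into one 32-bit word (element 0 in the low byte, assuming little-endian order as on the PSP) and reads the elements back with `ext_bits()`, a hypothetical helper that mimics what the Allegrex `ext` instruction does (extract `size` bits starting at bit `pos`, zero-extended), followed by a sign extension. `pack_s8x4`, `s8_lane` and `bits_as_float` are likewise names made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mimicking "ext rt, rs, pos, size":
   take `size` bits of `rs` starting at bit `pos`, zero-extended. */
static uint32_t ext_bits(uint32_t rs, unsigned pos, unsigned size)
{
    uint32_t mask = (size >= 32u) ? 0xFFFFFFFFu : ((1u << size) - 1u);
    return (rs >> pos) & mask;
}

/* Pack an s8[4] into one 32-bit word, element 0 in the low byte
   (assumed little-endian layout, as on the PSP). */
static uint32_t pack_s8x4(const int8_t v[4])
{
    return (uint32_t)(uint8_t)v[0]
         | (uint32_t)(uint8_t)v[1] << 8
         | (uint32_t)(uint8_t)v[2] << 16
         | (uint32_t)(uint8_t)v[3] << 24;
}

/* Read element i (0..3) back: extract the byte, then sign-extend it
   (what a sign-extend-byte instruction such as MIPS "seb" does). */
static int8_t s8_lane(uint32_t word, unsigned i)
{
    int32_t byte = (int32_t)ext_bits(word, 8u * i, 8u);
    return (int8_t)(byte >= 0x80 ? byte - 256 : byte);
}

/* The very same 32 bits viewed as a float: nothing in the bits
   says which interpretation is the "right" one. */
static float bits_as_float(uint32_t word)
{
    float f;
    memcpy(&f, &word, sizeof f);
    return f;
}
```

For example, with `int8_t a[4] = { 1, -2, 3, -128 }`, `pack_s8x4(a)` gives 0x8003FE01 and `s8_lane(pack_s8x4(a), 1)` gives -2.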
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

VFPU load/store doesn't work with the "r" constraint, because the VFPU cannot access CPU registers directly. You need to use the "m" constraint, as hlide mentioned. If you want to copy the contents of a CPU register to a vector register, use the mtv/mfv ops.
<Don't push the river, it flows.>
http://wordpress.fx-world.org - my devblog
http://wiki.fx-world.org - VFPU documentation wiki

Alexander Berl
su-27k

Post by su-27k »

If you have a variable in a CPU register, like a pointer to a vector, you can use it like this:

void asd(float *a)
{
	__asm__(
	     "lv.q R000, 0(%0)\n"   /* a holds an address, so its GPR can serve as the base register */
	     :
	     :"r"(a)
	);
}

because "a" is on the stack, and when you call asd, the contents of the stack are loaded into a CPU register ;)
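When the "r" operand really is a pointer, as in this snippet, the address still has to be suitably aligned for the VFPU load. As far as I know, lv.q needs a 16-byte-aligned address (lv.s a 4-byte-aligned one), and a misaligned address raises an address error at run time. A portable C sketch, assuming that requirement: `vec4_aligned` and `is_aligned()` are hypothetical names, and C11 `_Alignas` is used to put the four floats on a 16-byte boundary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector type: _Alignas(16) places the four floats on a
   16-byte boundary, the alignment assumed here for lv.q/sv.q. */
typedef struct {
    _Alignas(16) float v[4];
} vec4_aligned;

/* True if p is a multiple of `align` (which must be a power of two). */
static bool is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (uintptr_t)(align - 1u)) == 0;
}
```

A caller could then check `is_aligned(p, 16)` before handing `p` to the lv.q snippet; note that `&x.v[1]` is only 4-byte aligned, so it would suit lv.s but not lv.q.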
Raphael

Post by Raphael »

That's different from your first code snippet, though.
Also, regarding your question about where the documentation is: there is nothing PSP-specific. From the link you gave, you can use the MIPS constraints as documentation, though, since the PSP uses a MIPS CPU. I doubt there are any special constraints for the PSP.
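So the letters that apply are the generic ones plus the MIPS ones: "r" (a general-purpose register), "m" (a memory operand) and, on MIPS, "f" (an FPU register); as hlide says, there is no letter for VFPU registers. A minimal sketch of the difference between "r" and "m", using empty asm templates so that it compiles with GCC or Clang on any host (it is not PSP code, and the function names are made up): only the way GCC binds the operand changes, which is why lv.s/lv.q want "m" while mtv wants "r".

```c
/* "r": GCC puts the operand in a general-purpose register and
   substitutes that register's name (on MIPS e.g. $4) for %0. */
static int via_register(int x)
{
    __asm__ volatile ("" : "+r"(x)); /* empty template: only the binding matters */
    return x;
}

/* "m": GCC substitutes a memory reference to the object itself
   (on MIPS something like 16($sp)), the form lv.s/lv.q can use. */
static int via_memory(int x)
{
    __asm__ volatile ("" : "+m"(x)); /* forces x to live in memory here */
    return x;
}
```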
hlide

Post by hlide »

You know what? Just have a look at the psp-gcc source code and you'll be pretty convinced there are no VFPU-specific constraints.

The constraint you are using is a general register constraint on the OPERAND (NOT ON THE OPCODE!). It happens that you are in fact using a disguised "lwc2" instruction, which is also a MIPS coprocessor 2 instruction. So gcc can handle this operand because it recognizes it somewhat as an "lwc2" operand and not as an "lv.s/.q" operand.

EDIT:
Oh well, I must be tired. Sorry, su-27k, my tone may have been too harsh :(((
su-27k

Post by su-27k »

Hi :)
My English is very bad and maybe you don't understand me. I'm saying that the following asm code loads in the same way:

void asd(float value){
__asm__(
"mtv %0,S000 \n"

:
:"r"(value)
);
}

void asd(float value){
__asm__(
"lv.s S000, %0 \n"

:
:"r"(value)
);
}

Both load from the CPU's a0 register, because on a RISC CPU the parameters are loaded into CPU registers when you call a function; on CISC this is not true, because they use the stack for this.
The VFPU can read from memory only through the CPU, over a 128-bit bus, and can write directly to memory through a 128x8 buffer.
Right?
Raphael

Post by Raphael »

No, the second snippet won't even compile! (expression too complex)
You cannot load from registers with lv* ops.
You could, however, do this:

Code:

void asd(float value){
	__asm__(
	     "lv.s		S000,	(%0)		\n"   /* %0 = address of value, so GCC must keep value in memory */
	     :
	     :"r"(&value)
	);
}
But that's stupid. Use the "m" constraint for lv accesses.
hlide

Post by hlide »

Raphael wrote: But that's stupid. Use the "m" constraint for lv accesses.

Code:

void asd(float value){
        register int x;
	__asm__ volatile(
	     "mfc1 %0, %1\n"   /* FPU register -> GPR (raw bits) */
	     "mtv %0, S000\n"  /* GPR -> VFPU scalar S000 */
	     : "=r"(x)         /* x is written, so it must be an output */
	     : "f"(value)
	);
}
or

Code:

void asd(float value){
        float x;
	__asm__ volatile(
	     "swc1 %1, 0+%0\n"   /* store the FPU register to the stack slot x */
	     "lv.s S000, 0+%0\n" /* reload it into VFPU scalar S000 */
	     : "=m"(x)           /* x is written, so it must be an output */
	     : "f"(value)
	);
}
should be okay too... but I dunno which of the three is faster.
Raphael

Post by Raphael »

hlide wrote: but I dunno which of the three is faster.
I'd guess his first solution:

Code:

void asd(float value){
__asm__(
"mtv %0,S000 \n"
:
:"r"(value)
);
}
which is the straightforward way to load a register into a vector register. I'd guess that would only take a few cycles (1?), since it's basically a register-to-register move. The others all suffer from memory access.
su-27k

Post by su-27k »

Raphael wrote: You could however do this: [...] But that's stupid. Use the "m" constraint for lv accesses.
Yes, it's OK; sorry, but I'm at work and going fast :9
However, this code crashes at run time because "value" doesn't have a pointer; remember that it's in a CPU register, not in memory.
After compiling, your code looks like this:

Code:

	    lv.s		S000,0x0(a0)	
where a0 holds value, not a pointer to "value" in memory.
This is the best way to load a parameter, I agree with Raphael:

Code:

void asd(float value){
__asm__(
"mtv %0,S000 \n"
:
:"r"(value)
);
} 
%0 is the CPU's a0 register.

Byeeee
hlide

Post by hlide »

Raphael wrote: [...] I'd guess that would only take a few cycles (1?), since it's basically register->register move. The others all suffer from memory access.
I'm not sure I understand here. Take this example:

Code:

float f(float a, float b)
{
  return a + b;
}
When compiled, we get:

Code:

f:
  jr $ra
  add.s $f0, $f12, $f13   # executes in the jr delay slot
So there is no way you can have a 1-cycle register-to-register move here.

You need to go through either memory or a general-purpose register to move an FPU register to a VFPU scalar register, and vice versa.

So if you want to get a float back after a VFPU scalar computation, you need at least:

Code:

mfv $v0, S000
jr $ra
mtc1 $v0, $f0   # executes in the jr delay slot
or go through memory.
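The moves discussed here (mfc1/mtc1 between FPU and GPR, mtv/mfv between GPR and VFPU, or swc1 + lv.s through memory) all copy 32 raw bits; none of them converts the value. A host-side C model of the two FPU-to-VFPU routes, with memcpy standing in for each move and a hypothetical global `vfpu_s000` standing in for the VFPU scalar register, shows that both deliver the same bits. It models data flow only, not cycle counts.

```c
#include <stdint.h>
#include <string.h>

static uint32_t vfpu_s000; /* hypothetical stand-in for VFPU scalar S000 */

/* Route 1: "mfc1 gpr, $f12" then "mtv gpr, S000" */
static void load_via_gpr(float value)
{
    uint32_t gpr;
    memcpy(&gpr, &value, sizeof gpr); /* mfc1: raw bit copy, no conversion */
    vfpu_s000 = gpr;                  /* mtv: raw bit copy */
}

/* Route 2: "swc1 $f12, 0(sp)" then "lv.s S000, 0(sp)" */
static void load_via_memory(float value)
{
    unsigned char slot[sizeof(float)];          /* models the stack slot */
    memcpy(slot, &value, sizeof slot);          /* swc1 */
    memcpy(&vfpu_s000, slot, sizeof vfpu_s000); /* lv.s */
}
```

With 1.0f, both routes leave 0x3F800000 in `vfpu_s000`: the IEEE-754 bit pattern, not the integer 1.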
Raphael

Post by Raphael »

hlide wrote: I'm not sure I understand here.
I was just referring to the single instruction that copies the register contents, i.e. only the "mtv %0, S000\n", versus your versions, with two ops each that also use memory access (slow unless cached) to load the float into the S000 register. So the single mtv should always be faster.
But you're right, it probably still won't be one cycle.
hlide

Post by hlide »

Raphael wrote: [...] So the single mtv should always be faster.
But you're right, it probably still won't be one cycle.
Still, when I see this, I'm bothered:

Code:

void asd(float value){
__asm__(
"mtv %0,S000 \n"
:
:"r"(value)
);
}
By agreeing with that, you seemed to imply that "float value" is in fact in a GPR ($a0), as su-27k stated after this code ("%0 is a0 register of cpu"), and not in an FPU register ($f12). This is what bothered me.

If I compile that, I get this:

Code:

asd:
  mfc1	$v0,$f12
  mtv	$v0, S000.s
  jr	ra
  nop
So that couldn't be a 1-cycle move between an FPU register and a VFPU register, even with that version.

So what I proposed gives the same code, except that I don't HIDE the fact that you still need at least two instructions to move an FPU register to a VFPU register:

Code:

void asd(float value)
{
   register int x;
   __asm__ volatile(
      "mfc1 %0, %1\n"   /* FPU register -> GPR */
      "mtv %0, S000\n"  /* GPR -> VFPU scalar S000 */
      : "=r"(x)         /* x is written, so it is an output */
      : "f"(value)
   );
} 
But yeah, I know this is a very minor detail, but it can help you understand why there are things you can do and other things you can't, because gcc sometimes inserts hidden instructions.
hlide

Post by hlide »

Basically, what I'm saying is that FPU<->VFPU "conversions" are no cheaper than integer conversions, so think twice before using them: can I avoid mixing them at all until I need to store them in memory? That is my conclusion.
Raphael

Post by Raphael »

Ah, OK, now I get your point :) I didn't even think about the fact that 'value' is in an FPU register. So you're right. Still, memory reads would be slower, I suppose.