VFPU Load

su-27k · Post by **su-27k** » Fri Nov 17, 2006 12:17 am

Hi everyone, and sorry for my English.
The VFPU supports the following data formats: 32-bit integer, 32-bit floating point, packed 8-bit integer, packed 16-bit integer/floating point, and so on. So I am trying to load an s8[4] into a register, but I don't know which constraint applies to this type. And after I load it into a VFPU register, how can I read the individual values if they all sit in a single register?
I think that when you load a register, the VFPU binds the type through the constraint, e.g.:

s8[4] a;
__asm__(
"lv.s s000, %0"
:
:"r"(a)
);
In other words, where can I find the constraints for Allegrex? I don't know them for this particular machine: does r mean register? m memory? o object? f floating point? i integer? And which one is for s8[4]? And so on.

Please write back soon :)

su-27k · Post by **su-27k** » Fri Nov 17, 2006 12:21 am

At this link you can find all the machine-specific constraints: http://gcc.gnu.org/onlinedocs/gcc-4.1.1 ... onstraints
But where is the documentation for the PSP? -_-'

hlide · Post by **hlide** » Fri Nov 17, 2006 1:05 am

su-27k wrote: In other words, where can I find the constraints for Allegrex? I don't know them for this particular machine: does r mean register? m memory? o object? f floating point? i integer? And which one is for s8[4]? And so on.

AFAIK there are no real constraints for VFPU registers. You are forced to name them explicitly in the asm strings. It seems that GCC doesn't know they exist, so it cannot allocate them for us or list them as clobbers, as you can with MMX/SSE registers. The fact that there are no built-in intrinsics for VFPU instructions is another indication that GCC doesn't know about them. I guess the fact that VFPU registers are tricky to use, with their multiple views (matrix/vector/scalar), doesn't help GCC handle them nicely.

The answer should be: ..."lv.s S000, 0 + %0" : : "m"(a)...

EDIT:
There is no real point in using the VFPU for anything other than floats.

su-27k · Post by **su-27k** » Fri Nov 17, 2006 2:49 am

So you think the VFPU doesn't bind the type through constraints, but maybe checks the bit arrangement of the register to work out what type it has loaded? That would be very strange, because in that case how could it tell the difference between an integer and an s8[4], for example? I think there is an answer and I want to know it!
For now I use the Allegrex "ext" instruction to extract bit fields, but I want to do it all with the VFPU coprocessor :=P
However, with psp-gcc I can use m, o and r successfully; with f it compiles but doesn't link.
I tried to load a float with r, and at run time I got an "address not aligned" exception. Bah.

Documentation, please...
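
For reference, the bit-field extraction mentioned above can be sketched in portable C. This is only a sketch under assumptions: the four s8 values are assumed to be packed with lane 0 in the lowest byte, and `pack_s8x4` / `unpack_s8x4` are hypothetical helper names, not part of any PSP SDK. On Allegrex, each lane extraction would roughly correspond to an `ext` followed by a sign extension.

```c
#include <stdint.h>

/* Hypothetical helper: pack four signed bytes into one 32-bit word,
   lane 0 in the lowest byte (assumed lane order). */
static uint32_t pack_s8x4(const int8_t in[4])
{
    uint32_t word = 0;
    for (int i = 0; i < 4; i++) {
        word |= (uint32_t)(uint8_t)in[i] << (8 * i);
    }
    return word;
}

/* Hypothetical helper: extract the four lanes again.
   Each lane is an 8-bit field that must be sign-extended. */
static void unpack_s8x4(uint32_t word, int8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        int raw = (int)((word >> (8 * i)) & 0xFFu);  /* 0..255 */
        out[i] = (int8_t)((raw ^ 0x80) - 0x80);      /* -128..127 */
    }
}
```

The register itself carries no type information: the same 32 bits are an integer, a float or four packed bytes depending only on which instruction reads them.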

Raphael · Post by **Raphael** » Fri Nov 17, 2006 3:02 am

VFPU load/store doesn't work with the "r" constraint, because the VFPU cannot access CPU registers directly. You need to use the "m" constraint, as hlide mentioned. If you want to copy the contents of a CPU register to a vector register, use the mtv/mfv ops.
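
As a rough sketch of what the "m" constraint looks like in practice, the fragment below loads a float into S000 from memory and stores it back. Assumptions: `USE_VFPU_ASM` is a hypothetical flag you would define yourself when building with psp-gcc, `vfpu_roundtrip` is a made-up name, and the plain C fallback exists only so the file also builds with a host compiler. Because GCC does not track VFPU registers, S000 is simply named in the string and cannot be listed as a clobber.

```c
/* Hypothetical build flag: define USE_VFPU_ASM only when compiling
   with psp-gcc for the PSP. Without it, a plain C fallback is used. */
static float vfpu_roundtrip(float in)
{
    float out = 0.0f;
#ifdef USE_VFPU_ASM
    __asm__ volatile (
        "lv.s S000, %1\n"   /* load from the memory operand "in"  */
        "sv.s S000, %0\n"   /* store back to the memory operand   */
        : "=m"(out)         /* "m": GCC supplies an address form  */
        : "m"(in)
    );
#else
    out = in;               /* host fallback, no VFPU involved */
#endif
    return out;
}
```

With "m", GCC makes sure the operand lives in memory and substitutes an addressing expression, which is what lv.s and sv.s expect.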

su-27k · Post by **su-27k** » Fri Nov 17, 2006 3:27 am

If you have a variable in a CPU register, such as a pointer to a vector, you can use it like this:

void asd(float *a)
{
	__asm__(
		"lv.q r000,0+%0\n"
		:
		:"r"(a)
	);
}

because "a" is on the stack, and when you call "asd" the contents of the stack are loaded into a CPU register ;)

Raphael · Post by **Raphael** » Fri Nov 17, 2006 3:51 am

But that's different from your first code snippet.
Also, regarding your question about where the documentation is: there is none that is PSP-specific. On the link you gave, you can take the MIPS constraints as documentation, though, since the PSP uses a MIPS CPU. I doubt there are any special constraints for the PSP.

hlide · Post by **hlide** » Fri Nov 17, 2006 4:50 am

You know what? Just have a look at the psp-gcc source code and you'll be pretty convinced there are no VFPU-specific constraints.

The constraint you are using is a general register constraint on the OPERAND (NOT ON THE OPCODE!). It happens that you are in fact using a disguised "lwc2" instruction, which is a MIPS coprocessor 2 instruction. So gcc can handle this operand because it more or less treats it as an "lwc2" operand and not as an "lv.s/.q" operand.

EDIT:
Oh well, I must be tired. Sorry, su-27k, my tone must have been too harsh :(((

su-27k · Post by **su-27k** » Fri Nov 17, 2006 8:48 pm

Hi :)
My English is very bad and maybe you don't understand me. I am saying that the following two asm snippets load the value in the same way:

void asd(float value){
__asm__(
"mtv %0,S000 \n"

:
:"r"(value)
);
}

void asd(float value){
__asm__(
"lv.s S000, %0 \n"

:
:"r"(value)
);
}

Both load from the a0 register of the CPU, because on a RISC CPU, when you call a function, the parameters are loaded into CPU registers; on CISC this is not the case, because the stack is used for this.
The VFPU can read from memory only through the CPU, over a 128-bit bus, and can write directly to memory through a 128x8 buffer.
Right?

Raphael · Post by **Raphael** » Fri Nov 17, 2006 9:20 pm

No, the second code won't even compile! (expression too complex)
You cannot load from registers with lv* ops.
You could however do this:

Code:

void asd(float value){
	__asm__(
	     "lv.s S000, (%0)\n"
	     :
	     :"r"(&value)
	);
}

But that's stupid. Use the "m" constraint for lv accesses.

hlide · Post by **hlide** » Fri Nov 17, 2006 10:10 pm

Code:

void asd(float value){
	register int x;	/* scratch GPR, written by mfc1 */
	__asm__ volatile(
	     "mfc1 %0, %1\n"
	     "mtv %0, S000\n"
	     : "=r"(x)
	     : "f"(value)
	);
}

or

Code:

void asd(float value){
	float x;	/* scratch slot in memory, written by swc1 */
	__asm__ volatile(
	     "swc1 %1, 0+%0\n"
	     "lv.s S000, 0+%0\n"
	     : "=m"(x)
	     : "f"(value)
	);
}

should be okay too... but I don't know which of the three is faster.

Raphael · Post by **Raphael** » Fri Nov 17, 2006 11:10 pm

hlide wrote:but I dunno which of three is faster.

I'd guess his first solution:

Code:

void asd(float value){
__asm__(
"mtv %0,S000 \n"
:
:"r"(value)
);
}

which is the straightforward way to load a register into a vector register. I'd guess that would only take a few cycles (1?), since it's basically register->register move. The others all suffer from memory access.

su-27k · Post by **su-27k** » Fri Nov 17, 2006 11:31 pm

Raphael wrote: No, the second code won't even compile! (expression too complex)
You cannot load from registers with lv* ops.
You could however do this:

void asd(float value){
	__asm__(
	     "lv.s S000, (%0)\n"
	     :
	     :"r"(&value)
	);
}

But that's stupid. Use the "m" constraint for lv accesses.

Yes, that's right. Sorry, I'm at work and going fast :9
However, this code crashes at run time, because "value" has no address; remember that it is in a CPU register, not in memory.
After compiling, your code looks like this:

Code:

	    lv.s		S000,0x0(a0)

where a0 holds the value itself, not a pointer to "value" in memory.
This is the best way to load a parameter; I agree with Raphael:

Code:

void asd(float value){
__asm__(
"mtv %0,S000 \n"
:
:"r"(value)
);
}

%0 is the a0 register of the CPU.

Bye!

hlide · Post by **hlide** » Sat Nov 18, 2006 12:40 am

I'm not sure I understand here. Take this example:

Code:

float f(float a, float b)
{
  return a + b;
}

When compiled, we get:

Code:

f:
  jr $ra
  add.s $f0, $f12, $f13

So there is no way you can have a 1-cycle register-to-register move here.

You need to use either memory or a general-purpose register to move an FPU register to a VFPU scalar register, and vice versa.

So if you want to get a float back after a VFPU scalar computation, you need at least:

Code:

mfv $v0, S000
jr $ra
mtc1 $v0, $f0

or to go through memory.

Raphael · Post by **Raphael** » Sat Nov 18, 2006 1:14 am

hlide wrote: I'm not sure I understand here.

I was just referring to the single instruction that copies the register content, i.e. only the "mtv %0, s000\n", vs. your versions with two ops each, which also use memory access (slow unless in cache) to load the float into the s000 register. So the single mtv should always be faster.
But you're right, it probably still won't be one cycle.

hlide · Post by **hlide** » Sat Nov 18, 2006 2:36 am

Still, when I see this, I'm bothered:

Code:

void asd(float value){
__asm__(
"mtv %0,S000 \n"
:
:"r"(value)
);
}

By agreeing with that, you seemed to imply that "float value" is in fact in a GPR ($a0), as stated by su-27k after this code ("%0 is a0 register of cpu"), and not in an FPU register ($f12). This is what bothered me.

If I compile that, I get this:

Code:

asd:
  mfc1	$v0,$f12
  mtv	$v0, S000.s
  jr	ra
  nop

So that couldn't be a 1-cycle move between an FPU register and a VFPU register, even with that version.

So what I proposed gives the same code, except that I don't HIDE the fact that you still need at least two instructions to move an FPU register to a VFPU register:

Code:

void asd(float value)
{
   register int x;   /* scratch GPR, written by mfc1 */
   __asm__ volatile(
      "mfc1 %0, %1\n"
      "mtv %0, S000\n"
      : "=r"(x)
      : "f"(value)
   );
}

But yeah, I know this is a very minor detail, but it can help to understand why there are things you can do and other things you can't, because gcc sometimes inserts hidden instructions.

hlide · Post by **hlide** » Sat Nov 18, 2006 2:42 am

Basically, what I'm saying is that FPU<->VFPU "conversions" are no cheaper than integer conversions, so you must think twice before using them: can I avoid mixing them at all until I need to store the results in memory? This is my conclusion.
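
To make the "conversion" in quotes concrete: mfc1 and mtc1 move raw bits between an FPU register and a GPR; they do not change the value's encoding. A portable C sketch of that bit-level move, assuming IEEE 754 single precision and using the hypothetical helper names `float_to_bits` and `bits_to_float`:

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: what ends up in the GPR after an mfc1-style move. */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* copy the bit pattern, no conversion */
    return u;
}

/* Hypothetical helper: the reverse direction, as with mtc1. */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

For example, `float_to_bits(1.0f)` yields `0x3F800000`, the same pattern that mtv would then place in S000. The extra cost hlide describes comes from needing the intermediate GPR (or a memory slot), not from any arithmetic.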

Raphael · Post by **Raphael** » Sat Nov 18, 2006 3:20 am

Ah, OK, now I get your point :) I didn't even think about the fact that 'value' is in an FPU register. So you're right. Still, memory reads would be slower, I suppose.

forums.ps2dev.org
