inline assembler bug (vcallms)?

Post by **Saotome** » Wed May 04, 2005 2:53 am

there seems to be a bug in gcc inline assembler (i'm using the "New PS2Dev Installer for Win32").

when using vcallms, the 15-bit immediate value is not shifted correctly and the opcode gets changed, so it's not vcallms anymore if the address isn't 0.
i was using an older win32 toolchain before (IIRC it was the one from lkz's homepage), which assembled vcallms correctly, so i don't understand why the new one, which is "including the latest patches available at time of release", has this bug.

can anyone confirm this for the current toolchain and fix it?

Post by **pixel** » Wed May 04, 2005 4:35 am

Well, that's exactly why people should test packages when we are asking for tests :P If you've only just discovered that bug, then it's been there for, like, a year now.

Anyway: I don't have the old toolchain anymore, and I am not really used to VU programming. Can you show me a code snippet, how it should be assembled, and possibly how it is actually assembled, even though I can do that last part myself with the current toolchain?

Post by **Saotome** » Wed May 04, 2005 5:24 am

ok, here's some code.
the second vcallms is not assembled correctly: the lower 6 bits of the instruction word should always be 111000b for vcallms, and the 15-bit immediate value (address) should be shifted 6 bits left before OR'ing it in. in the old version it was also divided by 8 (so the 96 (byte offset) in the example was assembled to 0x0c).
now the 96 is not shifted left and is OR'ed straight into the opcode:
0111000 |
1100000
-> 1111000
in this example the opcode is not changed, but in some cases it can become a completely different instruction. after assembling and disassembling it, the address value is 0x01 instead of 0x0c.
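To make the arithmetic concrete, here is a minimal C sketch of the two encodings described above. It only models the low bits of the instruction word (the 6-bit 111000b pattern and the 15-bit address field); the upper opcode bits are left out, and the function names are made up for illustration, not taken from the toolchain.

```c
#include <stdint.h>

/* Values taken from the description above; upper opcode bits are omitted. */
#define VCALLMS_LOW6   0x38u    /* 111000b, the fixed low 6 bits      */
#define VCALLMS_SHIFT  6        /* address field starts at bit 6      */
#define VCALLMS_MASK   0x7fffu  /* 15-bit address field               */

/* Expected behaviour (as in the old toolchain): byte offset / 8,
   masked to 15 bits, shifted left by 6, then OR'ed with the low bits. */
static uint32_t vcallms_expected(uint32_t byte_offset)
{
    uint32_t addr = (byte_offset / 8u) & VCALLMS_MASK;
    return (addr << VCALLMS_SHIFT) | VCALLMS_LOW6;
}

/* Observed behaviour: the raw offset is OR'ed in, no divide, no shift. */
static uint32_t vcallms_buggy(uint32_t byte_offset)
{
    return byte_offset | VCALLMS_LOW6;
}

/* What a disassembler would read back as the address field. */
static uint32_t vcallms_address(uint32_t word)
{
    return (word >> VCALLMS_SHIFT) & VCALLMS_MASK;
}

/* What a disassembler would read back as the low 6 bits. */
static uint32_t vcallms_low6(uint32_t word)
{
    return word & 0x3fu;
}
```

For the offset 96 from the example, `vcallms_expected(96)` gives 0x338 (address field 0x0c), while `vcallms_buggy(96)` gives 0x78 (address field 0x01, low bits still 111000b). With an offset like 8 the low bits themselves would no longer match vcallms.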

Code:

	asm __volatile__("
		...
		.align 3
_StartVerletLoop0:
		lqc2		vf01,0x0(%0)
		lqc2		vf02,0x10(%0)
		lqc2		vf03,0x20(%0)
		...

		vcallms		0

		lqc2		vf11,0x80(%0)
		lqc2		vf12,0x90(%0)
		lqc2		vf13,0xa0(%0)
		...

		vcallms		96		#// vcallms interlocks with first mpg call?
							#// 96 (byte address) = line 12 (64-bit address)
		...

		sqc2		vf01,0x0(%0) #// store results from 1st mpg
		sqc2		vf02,0x10(%0)
		sqc2		vf03,0x20(%0)
		...

		qmfc2.i		$8,vf11 #// wait until 2nd mpg finished
		sq			$8,0x80(%0)
		sqc2		vf12,0x90(%0)
		sqc2		vf13,0xa0(%0)
		...

		addi		%3,%3,-1
		addiu		%0,%0,0x100
		bne			%3,$0,_StartVerletLoop0
_EndVerl0:

the VU0 MPG:

Code:

.global testVu0Code
.global testVu0CodeEnd
.global testVu0Data
.global testVu0DataEnd

.vu
testVu0Code:

	add.xyz		vf02,vf02,vf21		nop
	add.xyz		vf04,vf04,vf21		nop
	add.xyz		vf06,vf06,vf21		nop
	add.xyz		vf08,vf08,vf21		nop

	mul.xyz		vf02,vf02,vf22		nop
	mul.xyz		vf04,vf04,vf22		nop
	mul.xyz		vf06,vf06,vf22		nop
	mul.xyz		vf08,vf08,vf22		nop

	add.xyz		vf01,vf01,vf02		nop
	add.xyz		vf03,vf03,vf04		nop
	add[E].xyz	vf05,vf05,vf06		nop
	add.xyz		vf07,vf07,vf08		nop

	add.xyz		vf12,vf12,vf21		nop
	add.xyz		vf14,vf14,vf21		nop
	add.xyz		vf16,vf16,vf21		nop
	add.xyz		vf18,vf18,vf21		nop

	mul.xyz		vf12,vf12,vf22		nop
	mul.xyz		vf14,vf14,vf22		nop
	mul.xyz		vf16,vf16,vf22		nop
	mul.xyz		vf18,vf18,vf22		nop

	add.xyz		vf11,vf11,vf12		nop
	add.xyz		vf13,vf13,vf14		nop
	add[E].xyz	vf15,vf15,vf16		nop
	add.xyz		vf17,vf17,vf18		nop

testVu0CodeEnd:
...

Post by **pixel** » Wed May 04, 2005 5:31 am

"whoops"

I think I see what's wrong. Allow me some time to dig into it, though. Since this is quite old stuff, I can't remember everything exactly.

--- edit ---

Mhhh, actually, I can't see what's wrong right away. The code that actually handles that is the following:

Code:

#define OP_SH_VUDEST            21
#define OP_MASK_VUDEST          0xf
#define OP_SH_VUCALLMS          6
#define OP_MASK_VUCALLMS        0x7fff
[...]
      case 'g': USE_BITS (OP_MASK_VUCALLMS,     OP_SH_VUCALLMS);break;
      case '&': USE_BITS (OP_MASK_VUDEST,       OP_SH_VUDEST);  break;

The first line is for the destination of the vcallms instruction (and specifically for it). Since you don't seem to use relocation, the relocation code I was thinking of isn't the cause here (or shouldn't be...).

(fyi, VUDEST and '&' correspond to stuff like vmadd)
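For readers not used to this style of opcode table: a mask/shift pair like the ones quoted above describes where an operand lives inside the 32-bit instruction word. The sketch below shows the usual way such a pair is applied. The `#define`s are copied from the snippet, but `insert_field` and `extract_field` are hypothetical helpers written for illustration, not functions from the assembler sources.

```c
#include <stdint.h>

/* Copied from the snippet above. */
#define OP_SH_VUDEST            21
#define OP_MASK_VUDEST          0xf
#define OP_SH_VUCALLMS          6
#define OP_MASK_VUCALLMS        0x7fff

/* Hypothetical helper: clear the field described by (mask, shift),
   then store the value there, truncated to the width of the mask. */
static uint32_t insert_field(uint32_t insn, uint32_t value,
                             uint32_t mask, unsigned shift)
{
    insn &= ~(mask << shift);
    return insn | ((value & mask) << shift);
}

/* Hypothetical helper: read the field back out of the word. */
static uint32_t extract_field(uint32_t insn, uint32_t mask, unsigned shift)
{
    return (insn >> shift) & mask;
}
```

With these, `insert_field(0x38, 12, OP_MASK_VUCALLMS, OP_SH_VUCALLMS)` yields 0x338, which is the layout Saotome expects for `vcallms 96` once the byte offset has been divided by 8. Skipping the shift is what lets the operand land on top of the low opcode bits.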

Post by **pixel** » Wed May 04, 2005 9:44 am

Okay, so. Seems to be fixed. You can try the latest toolchain from cvs, or fetch my daily compiled binaries "as usual" here: http://nnoble.nerim.net/ps2dev (once it's uploaded, that is, in an hour or so at the time of writing)

Thanks for reporting that bug ;)