there seems to be a bug in the gcc inline assembler (i'm using the "New PS2Dev Installer for Win32").
when using vcallms, the 15-bit immediate value is not shifted correctly and the opcode can get changed, so it's not vcallms anymore if the address isn't 0.
i was using an older win32 toolchain before (IIRC it was the one from lkz's homepage), which compiled vcallms correctly, so i don't understand why the new one, which is "including the latest patches available at time of release", has this bug.
can anyone confirm this for the current toolchain and fix it?
inline assembler bug (vcallms)?
Well, that's exactly why people should test packages when we are asking for tests :P If you've only just discovered that bug, then it has been there for about a year now.
Anyway: I don't have the old toolchain anymore, and I am not really used to VU programming. Can you show me a code snippet, how it should be assembled, and possibly how it is actually assembled (though I can do that last part myself with the current toolchain)?
pixel: A mischievous magical spirit associated with screen displays. The computer industry has frequently borrowed from mythology. Witness the sprites in computer graphics, the demons in artificial intelligence and the trolls in the marketing department.
ok, here's some code.
the second vcallms is not compiled correctly. the lower 6 bits of the instruction word should always be 111000b for vcallms, and the 15-bit immediate value (address) should be shifted 6 bits left before OR'ing it in. in the old version it was also divided by 8 first (so the 96 (byte offset) in the example was compiled to 0x0c).
now the 96 is not shifted left and is OR'ed to the opcode:
0111000 |
1100000
-> 1111000
in this example the opcode is not changed, but in some cases it can become a completely different instruction. after compiling and disassembling it, the address value is 0x01 instead of 0x0c.
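The arithmetic above can be checked with a few lines of C. This is only a sketch of the two behaviours as described in this post, not code from the toolchain: the function and macro names are made up for illustration, and only the low bits of the instruction word (the 6-bit opcode field and the 15-bit address field) are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the description above; the names are hypothetical. */
#define VCALLMS_LOW_BITS   0x38u    /* 111000b, the low 6 bits of vcallms */
#define VCALLMS_ADDR_SHIFT 6        /* the address field starts at bit 6 */
#define VCALLMS_ADDR_MASK  0x7fffu  /* 15-bit address field */

/* Low bits as the old toolchain reportedly produced them:
   byte offset divided by 8, then shifted into the address field. */
uint32_t vcallms_expected(uint32_t byte_offset)
{
    assert(byte_offset % 8 == 0);   /* VU instructions are 64 bits wide */
    uint32_t addr = (byte_offset / 8) & VCALLMS_ADDR_MASK;
    return VCALLMS_LOW_BITS | (addr << VCALLMS_ADDR_SHIFT);
}

/* Behaviour reported for the new toolchain:
   the raw value is OR'ed in without any shift. */
uint32_t vcallms_reported(uint32_t byte_offset)
{
    return VCALLMS_LOW_BITS | byte_offset;
}

/* What a disassembler would read back from the address field. */
uint32_t vcallms_address(uint32_t word)
{
    return (word >> VCALLMS_ADDR_SHIFT) & VCALLMS_ADDR_MASK;
}
```

For the 96 from the example, vcallms_expected(96) gives 0x338 (address field 0x0c), while vcallms_reported(96) gives 0x78 (address field 0x01), which matches the disassembly described above.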
the inline assembler code, followed by the VU0 MPG:
Code:
asm __volatile__("
...
.align 3
_StartVerletLoop0:
lqc2 vf01,0x0(%0)
lqc2 vf02,0x10(%0)
lqc2 vf03,0x20(%0)
...
vcallms 0
lqc2 vf11,0x80(%0)
lqc2 vf12,0x90(%0)
lqc2 vf13,0xa0(%0)
...
vcallms 96 #// does vcallms interlock with the first MPG call?
#// 96 (byte address) = line 12 (64-bit address)
...
sqc2 vf01,0x0(%0) #//store results from 1st mpg
sqc2 vf02,0x10(%0)
sqc2 vf03,0x20(%0)
...
qmfc2.i $8,vf11 #// wait until 2nd MPG finished
sq $8,0x80(%0)
sqc2 vf12,0x90(%0)
sqc2 vf13,0xa0(%0)
...
addi %3,%3,-1
addiu %0,%0,0x100
bne %3,$0,_StartVerletLoop0
_EndVerl0:
Code:
.global testVu0Code
.global testVu0CodeEnd
.global testVu0Data
.global testVu0DataEnd
.vu
testVu0Code:
add.xyz vf02,vf02,vf21 nop
add.xyz vf04,vf04,vf21 nop
add.xyz vf06,vf06,vf21 nop
add.xyz vf08,vf08,vf21 nop
mul.xyz vf02,vf02,vf22 nop
mul.xyz vf04,vf04,vf22 nop
mul.xyz vf06,vf06,vf22 nop
mul.xyz vf08,vf08,vf22 nop
add.xyz vf01,vf01,vf02 nop
add.xyz vf03,vf03,vf04 nop
add[E].xyz vf05,vf05,vf06 nop
add.xyz vf07,vf07,vf08 nop
add.xyz vf12,vf12,vf21 nop
add.xyz vf14,vf14,vf21 nop
add.xyz vf16,vf16,vf21 nop
add.xyz vf18,vf18,vf21 nop
mul.xyz vf12,vf12,vf22 nop
mul.xyz vf14,vf14,vf22 nop
mul.xyz vf16,vf16,vf22 nop
mul.xyz vf18,vf18,vf22 nop
add.xyz vf11,vf11,vf12 nop
add.xyz vf13,vf13,vf14 nop
add[E].xyz vf15,vf15,vf16 nop
add.xyz vf17,vf17,vf18 nop
testVu0CodeEnd:
...
infj
"whoops"
I think I see what's wrong. Allow me some time to dig into it though. Since this is quite old stuff, I can't remember everything exactly.
--- edit ---
Mhhh, actually, I can't see what's wrong straight away. The code that actually handles that is the following:
The first case line ('g') is for the destination of the vcallms instruction (and specifically for it). Since you don't seem to use relocation, the relocation code I was thinking of isn't the cause here (or shouldn't be...)
Code:
#define OP_SH_VUDEST 21
#define OP_MASK_VUDEST 0xf
#define OP_SH_VUCALLMS 6
#define OP_MASK_VUCALLMS 0x7fff
[...]
case 'g': USE_BITS (OP_MASK_VUCALLMS, OP_SH_VUCALLMS); break;
case '&': USE_BITS (OP_MASK_VUDEST, OP_SH_VUDEST); break;
(fyi, the VUDEST and '&' corresponds to stuff like vmadd)
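To make the mask/shift pairs above a bit more concrete, here is a minimal sketch of how such a pair is usually applied to an instruction word. The four defines are copied from the snippet above; field_bits, insert_field and extract_field are hypothetical helpers written for this illustration and are not the actual assembler code.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the snippet above. */
#define OP_SH_VUDEST     21
#define OP_MASK_VUDEST   0xf
#define OP_SH_VUCALLMS   6
#define OP_MASK_VUCALLMS 0x7fff

/* Bits of the instruction word covered by a field (hypothetical helper). */
uint32_t field_bits(uint32_t mask, unsigned shift)
{
    assert(shift < 32);
    return mask << shift;
}

/* Store value in a field and leave all other bits untouched
   (hypothetical helper). */
uint32_t insert_field(uint32_t insn, uint32_t value,
                      uint32_t mask, unsigned shift)
{
    assert(shift < 32);
    return (insn & ~(mask << shift)) | ((value & mask) << shift);
}

/* Read a field back out of an instruction word (hypothetical helper). */
uint32_t extract_field(uint32_t insn, uint32_t mask, unsigned shift)
{
    assert(shift < 32);
    return (insn >> shift) & mask;
}
```

With these values the vcallms address field covers bits 6 to 20 (0x1fffc0) and the VUDEST field covers bits 21 to 24 (0x1e00000), so the two do not overlap. Inserting 12 into the vcallms field of a word whose low bits are 0x38 gives 0x338.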
Okay, so. Seems to be fixed. You can try the latest toolchain from cvs, or fetch my daily compiled binaries "as usual" here: http://nnoble.nerim.net/ps2dev (once it's uploaded, that is, in an hour or so at the time of writing)
Thanks for reporting that bug ;)