When I try to compile this code:
#define GEN___Store16IntsToMatrixDestr(_mpTypeFunc_,_mpTag_,_mpNrMatrix_) \
\
\
_mpTypeFunc_ void _mpTag_##Store16IntsToMatrixDestr_m##_mpNrMatrix_ (float *Data) \
{ \
__asm__ volatile ( \
"vf2in.q C"#_mpNrMatrix_"00, C"#_mpNrMatrix_"00, 0\n" /* Convert the register contents to */ \
"vf2in.q C"#_mpNrMatrix_"10, C"#_mpNrMatrix_"10, 0\n" /* integer: this destroys the contents of the VFPU */ \
"vf2in.q C"#_mpNrMatrix_"20, C"#_mpNrMatrix_"20, 0\n" /* matrix. */ \
"vf2in.q C"#_mpNrMatrix_"30, C"#_mpNrMatrix_"30, 0\n" \
\
"sv.s S"#_mpNrMatrix_"00, 0 + %0\n" /* Store the data */ \
"sv.s S"#_mpNrMatrix_"01, 4 + %0\n" \
"sv.s S"#_mpNrMatrix_"02, 8 + %0\n" \
"sv.s S"#_mpNrMatrix_"03, 12 + %0\n" \
\
"sv.s S"#_mpNrMatrix_"10, 16 + %0\n" \
"sv.s S"#_mpNrMatrix_"11, 20 + %0\n" \
"sv.s S"#_mpNrMatrix_"12, 24 + %0\n" \
"sv.s S"#_mpNrMatrix_"13, 28 + %0\n" \
\
"sv.s S"#_mpNrMatrix_"20, 32 + %0\n" \
"sv.s S"#_mpNrMatrix_"21, 36 + %0\n" \
"sv.s S"#_mpNrMatrix_"22, 40 + %0\n" \
"sv.s S"#_mpNrMatrix_"23, 44 + %0\n" \
\
"sv.s S"#_mpNrMatrix_"30, 48 + %0\n" \
"sv.s S"#_mpNrMatrix_"31, 52 + %0\n" \
"sv.s S"#_mpNrMatrix_"32, 56 + %0\n" \
"sv.s S"#_mpNrMatrix_"33, 60 + %0\n" \
\
: : "m"(*Data)); \
\
return; \
}
// End macro
MACROGEN1d(, Store16IntsToMatrixDestr, ndHEL_XFPU_)
The code compiles correctly, but vf2in causes a CPU error on the PSP.
What is wrong?
VFPU Hang
Huh... I no longer have internet access at home, so it would be difficult for me to test it :///
1) Make sure your thread has the VFPU enabled and that the code is not called from a callback, which may run in its own thread. It looks like you get an exception when running a VFPU instruction. Just add a simple VADD.S S000, S000, S000 at the beginning of your code to check whether the exception also occurs at the ADD.
2) Why do you need 4x4 "sv.s" stores instead of 4x2 "svl.q"/"svr.q" stores? You would end up with only 8 stores instead of 16, or even only 4 stores ("sv.q") if your data is 16-byte aligned.
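To make the store counts in point 2 concrete, here is a small host-side sketch in plain C. It does not issue any VFPU instruction; it only checks the 16-byte alignment of the destination and reports how many quad stores a 4x4 matrix would need. The helper names (is_aligned16, quad_store_count) are made up for illustration and are not part of the original macro or of any SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store counts for one 4x4 matrix (16 words, 64 bytes), as given in the reply:
   16 single-word stores (sv.s), 8 stores with svl.q/svr.q pairs,
   or 4 stores with sv.q when the destination is 16-byte aligned. */
enum {
    STORES_SCALAR         = 16,
    STORES_UNALIGNED_QUAD = 8,
    STORES_ALIGNED_QUAD   = 4
};

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: how many quad stores the matrix would take,
   depending only on the alignment of the destination buffer. */
static int quad_store_count(const void *dst)
{
    assert(dst != NULL);
    return is_aligned16(dst) ? STORES_ALIGNED_QUAD : STORES_UNALIGNED_QUAD;
}
```

With a buffer declared as `_Alignas(16) float buf[20];`, passing `buf` selects the 4-store case, while `buf + 1` (4 bytes further) falls back to the 8-store case. Either way it is fewer than the 16 sv.s stores in the macro.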