This could be for many reasons. The best approach would probably be to implement something similar in MIPS assembly, using something like x86’s MOVSD. The problem is that I know x86 and Z80 assembler, but nothing about MIPS, nor how to write it inline in gcc with those assembler templates.
I checked the RIM source code and found that the author uses a custom memcpy function written in inline MIPS assembler. This function seems to copy WORD by WORD, so it doubles the speed.
I tried to use it, but it doesn't work the way I would like.
The source code comments are in Japanese, so I couldn't read them either.
I know how to write inline x86 assembler in Microsoft Visual C++, and I wrote a function that does what I want in inline x86 assembler, using MOVSD to copy the DWORD blocks and reusing some registers to copy the rest of the data with MOVSB.
Could anyone help me?
Maybe it’s time to learn some MIPS assembly and inline assembler in gcc.
Here is my x86 code, which I would like to replicate on the PSP:
////////////////////////////////////////////////////////////
//ASM IMPLEMENTATION
//x86 asm implementation using DWORD copies (MSVC inline assembler)
//copies len bytes from src to dst (same argument order as memcpy)
inline void memcpya(void* dst, void* src, unsigned long len)
{
    __asm
    {
        PUSH eax            //save registers
        PUSH ecx
        PUSH esi
        PUSH edi
        MOV eax, len        // EAX = len
        MOV ecx, eax
        SHR ecx, 2          // ECX = (EAX>>2) = number of DWORDs
        PUSH ecx            // save the DWORD count
        MOV esi, src        // ESI = source, EDI = destination
        MOV edi, dst
        REP MOVSD           // copy ECX DWORDs from [ESI] to [EDI]
        POP ecx             // restore the DWORD count (ECX is 0 after REP MOVSD)
        SHL ecx, 2
        SUB eax, ecx        // EAX -= (ECX << 2), leftover bytes (0..3)
        JZ end              // no bytes left?
        MOV ecx, eax
        REP MOVSB           // copy the ECX leftover bytes
    end:
        POP edi             // restore registers
        POP esi
        POP ecx
        POP eax
    };
}
////////////////////////////////////////////////////////////
//C IMPLEMENTATION (only for TEST)
//dirty C implementation: copies len DWORDs from src to dst
inline void memcpy4(DWORD* dst, DWORD* src, unsigned long len)
{
    for(;len>0;len--) *dst++ = *src++;
}
//dirty C implementation: copies len bytes from src to dst
inline void memcpy1(BYTE* dst, BYTE* src, unsigned long len)
{
    for(;len>0;len--) *dst++ = *src++;
}
//dirty C implementation: copies DWORDs first, then the leftover bytes
//copies len bytes from src to dst (same argument order as memcpy)
inline void memcpyc(void* dst, void* src, unsigned long len)
{
    unsigned long totalbytes = len;
    unsigned long copydwords = totalbytes>>2;    // number of whole DWORDs
    unsigned long copybytes = (copydwords<<2);   // bytes covered by those DWORDs
    //copy DWORD blocks
    if(copydwords)
    {
        memcpy4((DWORD*)dst,(DWORD*)src,copydwords);
        totalbytes-=copybytes;
    }
    //copy the remaining BYTES (0..3)
    if(totalbytes)
    {
        memcpy1((BYTE*)dst+copybytes,(BYTE*)src+copybytes,totalbytes);
    }
}
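
Before going to MIPS assembly, the same idea (words first, then the leftover bytes) can probably be tried in portable C for gcc. The following is only a sketch under my own assumptions, not the routine from the RIM source: the names memcpy_words, word_t and memcpy_words_selftest are made up for illustration, and the may_alias attribute is gcc-specific. As far as I know, the MIPS lw/sw instructions need 4-byte aligned addresses, so the sketch only uses word copies when both pointers can be aligned together, and falls back to plain byte copies otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: gcc-style compiler. may_alias lets us read/write arbitrary
   buffers through a 32-bit type without breaking strict aliasing rules. */
typedef uint32_t __attribute__((__may_alias__)) word_t;

/* Hypothetical helper: copies len bytes from src to dst (memcpy order).
   Buffers must not overlap. */
static void *memcpy_words(void *dst, const void *src, size_t len)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    /* Word copies are only possible if dst and src are misaligned
       by the same amount (low 2 address bits are equal). */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 3u) == 0) {
        word_t *dw;
        const word_t *sw;
        size_t words;

        /* Leading bytes until both pointers are 4-byte aligned */
        while (len > 0 && ((uintptr_t)d & 3u) != 0) {
            *d++ = *s++;
            len--;
        }

        /* Whole 32-bit words */
        dw = (word_t *)d;
        sw = (const word_t *)s;
        for (words = len >> 2; words > 0; words--)
            *dw++ = *sw++;

        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
        len &= 3u;              /* 0..3 trailing bytes remain */
    }

    /* Trailing bytes, or the whole buffer if alignments differ */
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
    return dst;
}

/* Checks several source/destination offsets and lengths; returns 1 on success. */
static int memcpy_words_selftest(void)
{
    unsigned char src[64], dst[64];
    size_t off_s, off_d, len, i;

    for (i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);

    for (off_s = 0; off_s < 4; off_s++)
        for (off_d = 0; off_d < 4; off_d++)
            for (len = 0; len <= 40; len++) {
                memset(dst, 0xAA, sizeof dst);
                memcpy_words(dst + off_d, src + off_s, len);
                /* copied range must match the source */
                assert(memcmp(dst + off_d, src + off_s, len) == 0);
                /* the byte just past the copy must be untouched */
                assert(dst[off_d + len] == 0xAA);
            }
    return 1;
}
```

Calling memcpy_words(dst, src, len) works like memcpy for non-overlapping buffers. Whether this is actually faster than the PSP's own memcpy is something I have not measured, so it would need timing on the real hardware.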