Unoptimized memcpy

mICrO
Posts: 25
Joined: Mon Oct 17, 2005 2:48 am
Location: Madrid (Spain)

Unoptimized memcpy

Post by mICrO »

I found that the memcpy implementation in the SDK isn't very fast. It seems to copy BYTE by BYTE, so I get a low frame rate when I copy a large amount of data.

There could be many reasons for this. I think the best approach would be to implement something similar in MIPS assembly, using something like x86's MOVSD. The problem is that I know x86 and Z80 assembler but no MIPS, nor how to write inline assembly in gcc with its assembler templates.

I checked the RIM source code and found that the author used a custom memcpy function written in inline MIPS assembler. This function seems to copy WORD by WORD, so it doubles the speed.

I tried to use it, but it doesn't work the way I would like.

The source code comments are in Japanese, so I couldn't read them either.

I know how to write inline x86 assembler in Microsoft Visual C++, and I wrote a function that does what I want in inline x86 assembler, using MOVSD to copy DWORD blocks and reusing a register to copy the rest of the data with MOVSB.

Could anyone help me?

Maybe it’s time to learn some MIPS assembly and inline assembler in gcc.

Here is my x86 code, which I would like to replicate on the PSP:

Code:

////////////////////////////////////////////////////////////
//ASM IMPLEMENTATION

//asm x86 implementation using DWORD copies
inline void memcpya(void* ori, void* dest, unsigned long len)
{
    __asm
    {
        PUSH    eax             //save registers
        PUSH    ecx
        PUSH    esi
        PUSH    edi

        MOV     eax, len        // EAX = len

        MOV     ecx, eax
        SHR     ecx, 2          // ECX = (EAX>>2)

        PUSH    ecx             // save ECX

        MOV     esi, ori        // MOVSD/MOVSB read from ESI and write to EDI
        MOV     edi, dest
        REP     MOVSD           // copy ECX dwords from origin to dest

        POP     ecx             // restore ECX (it is 0 after REP MOVSD)

        SHL     ecx, 2
        SUB     eax, ecx        // EAX -= (ECX << 2)

        JZ      end             // any bytes left?

        MOV     ecx, eax
        REP     MOVSB           // copy the remaining ECX bytes from origin to dest

    end:
        POP     edi             // restore registers
        POP     esi
        POP     ecx
        POP     eax
    }
}

////////////////////////////////////////////////////////////
//C IMPLEMENTATION (only for TEST)

#include <windows.h>    //DWORD, BYTE

//dirty c implementation for copy DWORDS (len = number of DWORDs)
inline void memcpy4(DWORD* ori, DWORD* dest, unsigned long len)
{
    DWORD *pori=ori, *pdest=dest;
    for(;len>0;len--) *pdest++ = *pori++;   //read from origin, write to dest
}

//dirty c implementation for copy bytes (len = number of bytes)
inline void memcpy1(BYTE* ori, BYTE* dest, unsigned long len)
{
    BYTE *pori=ori, *pdest=dest;
    for(;len>0;len--) *pdest++ = *pori++;   //read from origin, write to dest
}

//dirty c implementation for copy, first DWORD, rest BYTES
inline void memcpyc(void* ori, void* dest, unsigned long len)
{
    unsigned long totalbytes = len;
    unsigned long copydwords = totalbytes>>2;       //whole DWORDs to copy
    unsigned long copybytes = (copydwords<<2);      //bytes covered by those DWORDs

    //copy DWORD blocks
    if(copydwords)
    {
        memcpy4((DWORD*)ori,(DWORD*)dest,copydwords);
        totalbytes-=copybytes;
    }

    //copy BYTES blocks
    if(totalbytes)
    {
        memcpy1((BYTE*)ori+copybytes,(BYTE*)dest+copybytes,totalbytes);
    }
}


mICrO^NewOlds
ja_medina at hotmail dot com

There is no such thing as a moral or an immoral book.

Books are well written or badly written.
(Oscar Wilde)
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

I've checked in a change (1513) to Newlib to remove the PREFER_SIZE_OVER_SPEED define that prevented the optimized versions of memcpy() and friends from being used. Note that you have to link with newlib (-lc) after rebuilding it with `toolchain.sh -n'. The optimized memcpy() will transfer data in 16-byte blocks if it can, then fall back on copying data 4 bytes at a time, and failing that fall back on MIPS unaligned word stores.

You can find the source to this memcpy() in newlib/libc/machine/mips/memcpy.c in your unpacked Newlib 1.13.0 source.
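
For readers who want to see the tiering described above in plain C, here is a rough sketch. It is not the Newlib source: the name tiered_copy is made up for illustration, and where the real routine is said to use MIPS unaligned word stores, this version simply drops to a byte loop. It also assumes the buffers may legally be accessed as 32-bit words when they are 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT Newlib's code: copies n bytes from src to dst.
   The two regions must not overlap. */
void *tiered_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Word copies are only attempted when both pointers are 4-byte aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        uint32_t *dw = (uint32_t *)d;
        const uint32_t *sw = (const uint32_t *)s;

        /* Tier 1: 16 bytes (four words) per iteration. */
        while (n >= 16) {
            dw[0] = sw[0];
            dw[1] = sw[1];
            dw[2] = sw[2];
            dw[3] = sw[3];
            dw += 4;
            sw += 4;
            n -= 16;
        }
        /* Tier 2: one 4-byte word at a time. */
        while (n >= 4) {
            *dw++ = *sw++;
            n -= 4;
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* Tier 3: leftover bytes, or the whole buffer if it was unaligned.
       (Assumption: a byte loop stands in for the unaligned word stores.) */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

A call such as tiered_copy(dst, src, 39) on aligned buffers would move 32 bytes in the first loop, 4 in the second and 3 in the last.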
ector
Posts: 195
Joined: Thu May 12, 2005 10:22 pm

Post by ector »

Nice one mrbrown.

Oh and mICrO, your x86 code may have been optimal on the 486, but since the Pentium other methods are much faster than rep movsd, especially for block sizes over a few hundred bytes.
http://www.dtek.chalmers.se/~tronic/PSPTexTool.zip Free texture converter for PSP with source. More to come.
mICrO
Posts: 25
Joined: Mon Oct 17, 2005 2:48 am
Location: Madrid (Spain)

Now it's optimized

Post by mICrO »

First of all, thanks MrBrown, great news. I'm rebuilding Newlib, and later I'll check the performance. I looked at the optimized version of memcpy and it looks great.

ector: The x86 implementation was only meant to show how the copy could be done faster on the PSP by moving 64-bit blocks. At that point I didn't know any MIPS assembly, and I needed a way to explain what I thought would be faster.

As you know, x86 CPUs nowadays have faster data transfer opcodes, since MMX or SSE. In fact, I don't use any replacement for memcpy on the x86 platform, because it's well optimized ;-)

I was working on a workaround: I learned how to write MIPS inline assembly in gcc, and some MIPS assembly as well. I tried something using simple ld / sd, then addiu and bnez to loop, to perform a 64-bit copy, plus an 8-bit copy for the residual part, and it seemed that it would work well. But when I checked the optimized version of memcpy, it looked like it does almost the same thing, so I stopped my asm optimization :P
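
The loop described above (a 64-bit load and store, pointer increments, a branch while blocks remain, then an 8-bit loop for the residue) might look roughly like this in plain C. This is only a sketch of the idea, not the abandoned assembly: the name copy64 is hypothetical, and it assumes both pointers are 8-byte aligned before it uses 64-bit accesses, falling back to bytes otherwise.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies n bytes from src to dst (no overlap). */
void copy64(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* 64-bit accesses are only used when both pointers are 8-byte aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & 7u) == 0) {
        uint64_t *d64 = (uint64_t *)d;
        const uint64_t *s64 = (const uint64_t *)s;
        size_t blocks = n >> 3;     /* number of whole 64-bit blocks */

        n &= 7u;                    /* residual bytes for the 8-bit loop */
        while (blocks != 0) {
            *d64++ = *s64++;        /* one 64-bit load and one 64-bit store */
            blocks--;               /* count down and loop while non-zero */
        }
        d = (unsigned char *)d64;
        s = (const unsigned char *)s64;
    }

    /* 8-bit copy of the residual part. */
    while (n != 0) {
        *d++ = *s++;
        n--;
    }
}
```

An optimizing compiler is free to unroll or otherwise rewrite these loops, so the generated code will not necessarily match the hand-written instruction sequence.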

So, I will perform some tests, but it seems that I don't need any MIPS assembly this time. Thumbs up :)

P.S.: I tested it and it works much faster than before. I expect many people will see a speed increase in their applications, since memcpy is used everywhere. So, update Newlib and the SDK (don't forget to get the latest toolchain from svn, which includes the Newlib patch changes) and rebuild your sources.
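
For anyone who wants to measure the difference on their own build, a small timing harness along these lines could help. It is a sketch only: byte_copy and time_copy are hypothetical names, clock() is assumed to have usable resolution on the target, and no figures are claimed here.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by memcpy and the baseline below. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Deliberately naive byte-per-byte baseline (hypothetical helper). */
void *byte_copy(void *dst, const void *src, size_t n)
{
    /* volatile discourages the compiler from replacing the loop
       with a call to memcpy, which would defeat the comparison. */
    volatile unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Returns the CPU seconds spent calling fn `reps` times on n bytes. */
double time_copy(copy_fn fn, void *dst, const void *src, size_t n, int reps)
{
    clock_t start = clock();

    for (int i = 0; i < reps; i++)
        fn(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing time_copy(byte_copy, dst, src, size, reps) with time_copy(memcpy, dst, src, size, reps) before and after rebuilding Newlib should show whether the optimized routine is actually being linked in.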