Unicode conversion

kirrukirru
Posts: 2
Joined: Mon Oct 06, 2008 2:23 am

Unicode conversion

Post by kirrukirru »

Hi,

I read the game title from PARAM.SFO, which is stored in UTF-8, and I need to convert it to Unicode. Note that the titles can contain multi-byte characters (Chinese, Japanese, etc.). Which library functions are available for this conversion?
I tried mbstowcs with setlocale(LC_ALL, "UTF-8"), but it doesn't work. I also tried iconv, but I can't find the required library file in my build environment (PSPToolchain).

Thanks,
Kiran
BenHur
Posts: 28
Joined: Sat Oct 20, 2007 5:26 pm

Re: Unicode conversion

Post by BenHur »

kirrukirru wrote: I read the game title from PARAM.SFO, which is stored in UTF-8, and I need to convert it to Unicode. Note that the titles can contain multi-byte characters (Chinese, Japanese, etc.). Which library functions are available for this conversion?
UTF-8 to Unicode (UCS-2) conversion is simple. This is from libccc, which is used by intraFont:

Code:

int cccUTF8toUCS2(cccUCS2 * dst, size_t count, cccCode const * str) {
	if (!str || *str == '\0' || !dst) return 0;

	int i = 0, length = 0;
	while (str[i] && length < count) {
		if (str[i] <= 0x7FU) {          //ASCII
			dst[length] = (cccUCS2)str[i];
			i++;    length++;
		} else if (str[i] <= 0xC1U) {   //part of multi-byte or overlong encoding ->ignore
			i++;
		} else if (str[i] <= 0xDFU) {   //2-byte
			dst[length] = ((str[i]&0x001fu)<<6) | (str[i+1]&0x003fu);
			i += 2; length++;
		} else if (str[i] <= 0xEFU) {   //3-byte
			dst[length] = ((str[i]&0x001fu)<<12) | ((str[i+1]&0x003fu)<<6) | (str[i+2]&0x003fu);
			i += 3; length++;
		} else i++;                     //4-byte, restricted or invalid range ->ignore
	}
	return length;
}
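
For anyone who wants to try the same idea outside of libccc, here is a minimal standalone sketch of that decoding loop. It is not the library's code: the names utf8_unit, ucs2_unit, is_cont and utf8_to_ucs2_checked are made up for this example, and the typedefs are only assumed stand-ins for cccCode and cccUCS2. The one behavioural difference is that it checks the continuation bytes before using them, so a title that is cut off in the middle of a multi-byte sequence is skipped instead of being read past the terminator. It still does not reject overlong three-byte forms or encoded surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the libccc types: 8-bit input, 16-bit output units. */
typedef uint8_t  utf8_unit;
typedef uint16_t ucs2_unit;

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_cont(utf8_unit b)
{
    return (b & 0xC0u) == 0x80u;
}

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into at most
 * `count` UCS-2 code units. Malformed, truncated and 4-byte sequences are
 * skipped one byte at a time. Returns the number of units written; the
 * output is NOT NUL-terminated. */
size_t utf8_to_ucs2_checked(ucs2_unit *dst, size_t count, const utf8_unit *str)
{
    size_t i = 0, length = 0;

    if (!dst || !str) return 0;

    while (str[i] && length < count) {
        utf8_unit b0 = str[i];

        if (b0 <= 0x7Fu) {
            /* 1 byte: plain ASCII */
            dst[length++] = b0;
            i += 1;
        } else if (b0 >= 0xC2u && b0 <= 0xDFu && is_cont(str[i + 1])) {
            /* 2 bytes: 110xxxxx 10xxxxxx */
            dst[length++] = (ucs2_unit)(((b0 & 0x1Fu) << 6) |
                                        (str[i + 1] & 0x3Fu));
            i += 2;
        } else if (b0 >= 0xE0u && b0 <= 0xEFu &&
                   is_cont(str[i + 1]) && is_cont(str[i + 2])) {
            /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
            dst[length++] = (ucs2_unit)(((b0 & 0x0Fu) << 12) |
                                        ((str[i + 1] & 0x3Fu) << 6) |
                                        (str[i + 2] & 0x3Fu));
            i += 3;
        } else {
            /* stray continuation byte, overlong lead (C0/C1), 4-byte lead,
             * or a sequence cut short by the terminator: skip one byte */
            i += 1;
        }
    }
    return length;
}
```

To use it, pass a buffer of N units together with N - 1 as count, then store a 0 at the returned index if the result needs to be terminated. Characters outside the Basic Multilingual Plane are dropped, since they do not fit in one 16-bit unit.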
The library also supports common codepages (Big5,...)
Cheers, BenHur
kirrukirru
Posts: 2
Joined: Mon Oct 06, 2008 2:23 am

Post by kirrukirru »

Thanks a lot, BenHur!!

The code you posted works fine!!