
Problem compiling ps2ftpd with latest IOP compiler

Posted: Sun Jun 19, 2005 9:09 am
by dlanor
As part of the LaunchELF project we have experimented with adding a built-in ps2ftpd.irx. It can be activated by programming a launch key to use the pseudo file MISC/PS2NET and then using that key as a command. This launches the various networking modules (those not already loaded), including ps2ftpd. This all works fine with some builds of ps2ftpd.irx, but there is a problem compiling that module properly.

If we compile it using v2.8.1 of the IOP compiler, then it works fine.
If we compile it using v3.2.2 of the IOP compiler, then transfers to memory card malfunction, crashing with an exception somewhere in MCMAN (debugging precisely where/how it happens is hard).

The file Rules.make of the ps2ftpd project indicates that an attempt has been made to adapt it to the new compiler, but this has not been fully successful (as evident from the above).

However, if I edit Rules.make so as to remove the option "-O2" from IOP_CFLAGS, then the irx compiled with v3.2.2 does work correctly. This indicates that the cause of the problem is some of the new optimization defaults in the new version of the IOP compiler.

But turning off all optimization to get around this is too high a price to pay, when it causes (in this case) a binary to grow from 25KB to 41KB. That's an increase of 64% which I find unacceptable.

So here's my real question:
Is there any way to find out exactly which optimization causes this problem?
And if so, is there some way to disable that optimization only, while still keeping the "-O2" option?

Best regards: dlanor

Posted: Sun Jun 19, 2005 9:46 am
by pixel
Optimisation differences between 2.8.1 and 3.2.2 are quite heavy, and they tend to expose certain categories of bugs, in particular ones involving the volatile keyword. If you have several threads (and I guess that's the case) accessing the same variable, that variable should be declared volatile. That's about the only big problem I can see with code-working-with-2.8.1-and-not-with-3.2.2-anymore.
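A minimal, portable sketch of the kind of bug described above. It uses a standard C signal handler as a stand-in for a second IOP thread or interrupt handler (an assumption: the IOP has its own thread API, and which variables ps2ftpd actually shares is not known here). The names `transfer_done` and `wait_for_transfer` are hypothetical. The point is only that a flag written asynchronously and polled in a loop without function calls must be `volatile`, or an optimizer may load it once and spin on a stale copy.

```c
#include <signal.h>

/* Hypothetical shared flag. It is written asynchronously (here by a signal
   handler, standing in for another IOP thread or an interrupt handler), so it
   must be volatile: the polling loop below contains no function calls, and
   without volatile an optimizer may load the flag once and spin forever. */
static volatile sig_atomic_t transfer_done = 0;

static void on_transfer_done(int sig)
{
    (void)sig;
    transfer_done = 1;
}

/* Hypothetical helper: installs the handler, simulates the asynchronous
   "transfer finished" event, then polls the flag. Returns -1 if the handler
   could not be installed, otherwise the final value of the flag. */
int wait_for_transfer(void)
{
    if (signal(SIGINT, on_transfer_done) == SIG_ERR)
        return -1;

    raise(SIGINT); /* the handler runs before raise() returns */

    while (!transfer_done) {
        /* busy-wait: volatile forces a fresh load on every iteration */
    }
    return (int)transfer_done;
}
```

Here `wait_for_transfer()` returns 1, since the C standard guarantees that `raise` does not return until the handler has run. Note that `volatile` only stops the compiler from caching the value; it is not a general-purpose lock.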

Posted: Sun Jun 19, 2005 10:17 am
by dlanor
pixel wrote:Optimisation differences between 2.8.1 and 3.2.2 are quite heavy, and they tend to expose certain categories of bugs, in particular ones involving the volatile keyword. If you have several threads (and I guess that's the case) accessing the same variable, that variable should be declared volatile. That's about the only big problem I can see with code-working-with-2.8.1-and-not-with-3.2.2-anymore.
With a single client accessing ps2ftpd there should be only one active thread for the server (maybe two if it spawns one when invoked, but the second one won't be active, just listening), and I fail to see why this should cause a crash when transferring stuff to the memory card, when it causes no problem when transferring stuff to HDD partitions...

So no, I don't think it's so simple as a misdeclared variable... :(

But if that should be the problem, is there no way to get back the 'old' default behaviour by specifying some compilation flag, like we can do to avoid some other optimizations (e.g. "-fno-builtin")?

Also, remember that if I remove "-O2" then it does work fine, though at the cost of a 64% larger binary. Surely there must be some way to block only the relevant optimization. Is there any list of individually blockable optimizations, so I can try them one by one to find the culprit?

Best regards: dlanor

Posted: Sun Jun 19, 2005 9:19 pm
by pixel
First, try -O1. Then read the ee-gcc manual a bit, and look for the list of individual optimisation flags, which is a very large bunch of -f flags. I'd still recommend trying to find the error, though.

Posted: Mon Jun 20, 2005 3:45 am
by dlanor
pixel wrote:First, try -O1.
Ok, I will, but even if it works that is no solution, as it doesn't reveal the cause of the problem. I will do it only in the hope of later finding a list of which optimizations differ between -O1 and -O2, as that will help me eliminate some possible causes.
Then read the ee-gcc manual a bit, and look for the list of individual optimisation flags, which is a very large bunch of -f flags.
I realize it will be a huge job, but this sort of thing is worth the effort of testing them one by one. After all, it is not just this one project that is affected. People all over are complaining about odd behaviour of various PS2SDK modules when compiled with v3.2.2.

I'm sure most (all?) of those problems can be fixed by properly adapting the CFLAGS of those project files, so as to force the new compiler to behave more like the old one in some crucial aspects. But first we need to know exactly what flags are needed.
I'd still recommend trying to find the error, though.
I'll try that too, but I strongly doubt that there is any distinct error. At least not one I can identify without having a full list of the optimization differences between v3.2.2 and v2.8.1, since one or more of those differences is what triggers the bug.

I'll try to dig into the GCC documents and sources to see what I can find in the way of flag lists, and then start testing my way through that multitude of flags...
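Testing the flags strictly one by one is not the only option. If we assume there is exactly one culprit, and that each relevant optimization can be switched off by adding a `-fno-...` flag after `-O2` (neither is guaranteed), the candidate list can be bisected, halving it with each rebuild. A minimal sketch of that search in C; all names are hypothetical, and the `bug_present` callback stands for the manual step of rebuilding ps2ftpd.irx with the given flags disabled and retrying a memory card transfer.

```c
#include <assert.h>
#include <stddef.h>

/* Callback (hypothetical): build with -O2 plus "-fno-<flag>" for every
   flag in flags[lo..hi), rerun the failing test, return nonzero if the
   bug is still present. */
typedef int (*bug_present_fn)(const char *const *flags,
                              size_t lo, size_t hi, void *ctx);

/* Bisect for the single flag whose disabling makes the bug vanish.
   Returns its index, or -1 if disabling all n flags does not help
   (the culprit is then not among these flags). */
long find_culprit_flag(const char *const *flags, size_t n,
                       bug_present_fn bug_present, void *ctx)
{
    size_t lo = 0, hi = n;

    assert(bug_present != NULL);
    if (n == 0 || bug_present(flags, 0, n, ctx))
        return -1;

    /* Invariant: disabling flags[lo..hi) makes the bug go away. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (!bug_present(flags, lo, mid, ctx))
            hi = mid;   /* disabling the lower half alone fixes it */
        else
            lo = mid;   /* so the culprit must be in the upper half */
    }
    return (long)lo;
}

/* Simulated callback for trying the search without a PS2: *ctx holds the
   index of the "real" culprit, and the bug persists unless it is disabled. */
int simulated_bug_present(const char *const *flags,
                          size_t lo, size_t hi, void *ctx)
{
    size_t culprit = *(const size_t *)ctx;
    (void)flags;
    return !(culprit >= lo && culprit < hi);
}
```

For example, with five candidate flags and `culprit` set to 3, `find_culprit_flag(flags, 5, simulated_bug_present, &culprit)` returns 3. With 16 candidates the search needs at most five builds (one to confirm that disabling them all helps, then four halvings) instead of up to sixteen. If two or more optimizations interact to cause the bug, the result is unreliable and a one-by-one pass is still needed.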

Best regards: dlanor

Posted: Mon Jun 20, 2005 4:34 am
by pixel
dlanor wrote:I'm sure most (all?) of those problems can be fixed by properly adapting the CFLAGS of those project files, so as to force the new compiler to behave more like the old one in some crucial aspects. But first we need to know exactly what flags are needed.
I'd still recommend trying to find the error, though.
I'll try that too, but I strongly doubt that there is any distinct error. At least not one I can identify without having a full list of the optimization differences between v3.2.2 and v2.8.1, since one or more of those differences is what triggers the bug.
Having a CFLAGS workaround for a bug in the software isn't such a good idea. Finding the exact bug shouldn't be that difficult, though it might come from very obscure problems in or around the ps2ftpd code. Again, finding the exact bug in the code is way better than a workaround for that problem.

Posted: Mon Jun 20, 2005 2:07 pm
by Drakonite
Everything is being compiled with -g 0, right? I seem to recall this problem showing up at the time of the switch to gcc 3.x, but maybe I'm wrong.

Posted: Tue Jun 21, 2005 2:36 am
by dlanor
Drakonite wrote:Everything is being compiled with -g 0 right?
If you mean "-G0", then yes. The following line is straight from Rules.make of ps2ftpd in CVS:

IOP_CFLAGS  := $(CFLAGS_TARGET) -O2 -G0 -c $(IOP_INCS) $(IOP_CFLAGS)
Removing "-O2" from that line eliminates the bug, but at the unacceptable cost of bloating the code by an extra 64%. That's why I insist on searching for a way to block just the optimization(s) responsible for the problem.

Edit: the line quoted above is also found in Makefile.iopglobal of the ps2sdk release package, so it's bound to be used by lots of projects.
I seem to recall this problem showing up at the time of the switch to gcc3.x but maybe I'm wrong
You're probably right, though it may now be impossible to pinpoint exactly which compiler version first changed the critical optimization. (Especially before we've identified which optimization it is...)

I still haven't had time to dig into the GCC docs and make a thorough step-by-step test of the possible variations, but that still remains my plan.

Best regards: dlanor

Posted: Tue Jun 21, 2005 4:39 am
by BraveDog
You can browse through the release notes of newer gcc versions and look for optimization bugs that were fixed. Example:
http://gcc.gnu.org/gcc-3.3/changes.html

Here is a bug described as 'incorrect code for inlining of memcpy under -O2':
http://gcc.gnu.org/PR8634

EDIT
Also found one that is MIPS-specific:
http://gcc.gnu.org/PR9496

I'm not saying that is the problem, just things to look into.

Posted: Tue Jun 21, 2005 10:13 am
by dlanor
BraveDog wrote:You can browse through the release notes of newer gcc versions and look for optimization bugs that were fixed. Example:
http://gcc.gnu.org/gcc-3.3/changes.html

Here is a bug described as 'incorrect code for inlining of memcpy under -O2':
http://gcc.gnu.org/PR8634

EDIT
Also found one that is MIPS-specific:
http://gcc.gnu.org/PR9496

I'm not saying that is the problem, just things to look into.
Thanks a lot BraveDog, this is just the sort of help I needed.

That memcpy stuff in particular looks promising, as the text clearly states that this bug is present in all versions from 3.2 through 3.3, and it is something we must block in any case, even if it is not the cause of the particular bug I'm investigating.

Edit:
I have confirmed that the bug occurs identically if "-O2" is replaced by "-O1" (I should have mentioned this earlier, but forgot).
I have also tested that -fno-inline has no effect whatsoever on the bug.

I'm now again at a loss for how to proceed, as the official GCC manual does not contain any list of which optimizations are turned on by "-O1" or "-O2". It only contains some very general statements about the reasoning behind including various types of optimization, without identifying any actual cases (like: "without performing any optimizations that take a great deal of compilation time" and similar nonsense...)

I suppose I'll have to go to the source code of the compiler itself...
Those lists have to exist somewhere, and I WILL find them, no matter where I have to search.

Best regards: dlanor

Posted: Wed Jun 22, 2005 2:05 am
by MrHTFord
Hi Dlanor,

Welcome to the toolchain hacking club.

http://gcc.gnu.org/onlinedocs/gcc-3.3.1 ... %20Options

Should be very applicable to 3.2.2.

Enjoy. If you isolate the function that gets miscompiled, you can get GCC to output RTL (register transfer language) at each stage of the optimization process and then find out from that what exactly goes wrong. Then the hardcore fun of tracking down why it goes wrong begins!

Enjoy your stay.

Edit: Look for "-dletters" on this page to see how to get RTL outputs from GCC:

http://gcc.gnu.org/onlinedocs/gcc-3.3.1 ... %20Options

Posted: Wed Jun 22, 2005 5:00 am
by dlanor
MrHTFord wrote:Hi Dlanor,

Welcome to the toolchain hacking club.
Thanks. I haven't actually hacked much of it yet, of course, but I hope to contribute something on these IOP issues.
http://gcc.gnu.org/onlinedocs/gcc-3.3.1 ... %20Options

Should be very applicable to 3.2.2.
Indeed. That is exactly the kind of list needed. But silly me made the mistake of downloading the same docs for v3.2.3 instead, thinking they were closer to what we use, and in that version of the docs there are no such lists. (The flags are only named there, but not grouped by how they are affected by -O1 or -O2.) So thanks a lot for pointing me to this version instead.
Enjoy. If you isolate the function that gets miscompiled, you can get GCC to output RTL (register transfer language) at each stage of the optimization process and then find out from that what exactly goes wrong. Then the hardcore fun of tracking down why it goes wrong begins!
Well, the absolute 'why' may be too elusive. I'll be happy if I can just find a reliable way to eliminate the bug. Anything more is a bonus.
Enjoy your stay.

Edit: Look for "-dletters" on this page to see how to get RTL outputs from GCC:

http://gcc.gnu.org/onlinedocs/gcc-3.3.1 ... %20Options
Thanks, I'll try that.

Best regards: dlanor