Alignment fix/workaround for gcc 3.2.2

Discuss the development of software, tools, libraries and anything else that helps make ps2dev happen.

Moderators: cheriff, Herben

Post Reply
pixel
Posts: 791
Joined: Fri Jan 30, 2004 11:43 pm

Alignment fix/workaround for gcc 3.2.2

Post by pixel »

Okay, well. I've finally come up with several patches for gcc. One is a bug I spotted: the BIGGEST_ALIGNMENT constant was wrong. But it doesn't seem to be fixing everything. Well, I've more or less been able to track down that nasty alignment bug, and I think I am closer of it every day.

For the time beeing though, I won't be able to work a lot on that, since I have some other duties, known as Real Life stuff. So, well, for now, I've built up a quick and dirty work around for it. You now have a -malign-all=X option on the gcc's command line. Which mean that, when used, every variable will be at least aligned to 2^X bytes. So, if you put -malign-all=4, every variable will be forcabily aligned to 16 bytes (that is, 128 bits), which is a workaround for that built-in "memcpy" bug, and maybe others. This is defaulted to 1 by default, so it is harmless to have it on a "stable" toolchain, since it won't interfere with the normal code generation.

Here is the whole patch I've provided to ooPo, which should be included to the stable gcc, I think:

Code: Select all

diff -ur gcc-3.2.2/gcc/config/mips/mips.c gcc-3.2.2-patched/gcc/config/mips/mips.c
--- gcc-3.2.2/gcc/config/mips/mips.c    Thu Nov  4 00:50:53 2004
+++ gcc-3.2.2-patched/gcc/config/mips/mips.c    Thu Nov  4 00:16:50 2004
@@ -281,6 +281,8 @@
 #endif
 const char * mips_no_align128_string;
 const char * mips_align128_string;
+const char * mips_align_all_string;
+int mips_align_all = 1;

 /* Whether we are generating mips16 hard float code.  In mips16 mode
    we always set TARGET_SOFT_FLOAT; this variable is nonzero if
@@ -4968,6 +4970,14 @@
   if (TARGET_SINGLE_FLOAT && TARGET_SOFT_FLOAT)
     target_flags &= ~((TARGET_DEFAULT) & (MASK_SOFT_FLOAT | MASK_SINGLE_FLOAT));
 #endif
+
+  if (mips_align_all_string == 0)
+    mips_align_all = 1;
+
+  else if (ISDIGIT (*mips_align_all_string))
+    {
+      mips_align_all = atoi (mips_align_all_string);
+    }

   /* Get the architectural level.  */
   if (mips_isa_string == 0)
diff -ur gcc-3.2.2/gcc/config/mips/mips.h gcc-3.2.2-patched/gcc/config/mips/mips.h
--- gcc-3.2.2/gcc/config/mips/mips.h    Thu Nov  4 00:50:53 2004
+++ gcc-3.2.2-patched/gcc/config/mips/mips.h    Thu Nov  4 00:09:45 2004
@@ -158,6 +158,7 @@
 extern const char *mips_no_mips16_string;/* for -mno-mips16 */
 extern const char *mips_explicit_type_size_string;/* for -mexplicit-type-size */
 extern const char *mips_cache_flush_func;/* for -mflush-func= and -mno-flush-func */
+extern const char *mips_align_all_string;/* for -malign-all= */
 extern const char * mips_align128_string;/* for -malign128 */
 extern const char * mips_no_align128_string;/* for -mno-align128 */
 extern int mips_split_addresses;       /* perform high/lo_sum support */
@@ -180,6 +181,8 @@
 extern void            sdata_section PARAMS ((void));
 extern void            sbss_section PARAMS ((void));

+extern int mips_align_all;
+
 /* Stubs for half-pic support if not OSF/1 reference platform.  */

 #ifndef HALF_PIC_P
@@ -637,6 +640,8 @@
       N_("Don't call any cache flush functions")},                     \
   { "flush-func=", &mips_cache_flush_func,                             \
       N_("Specify cache flush function")},                             \
+  { "align-all=",  &mips_align_all_string,                              \
+      N_("Force lower alignment")},                                     \
 }

 /* This is meant to be redefined in the host dependent files.  */
@@ -1693,7 +1698,7 @@
 #define STRUCTURE_SIZE_BOUNDARY 8

 /* There is no point aligning anything to a rounder boundary than this.  */
-#define BIGGEST_ALIGNMENT 64
+#define BIGGEST_ALIGNMENT 128

 /* Set this nonzero if move instructions will actually fail to work
    when given unaligned data.  */
@@ -4657,7 +4662,7 @@
    to a multiple of 2**LOG bytes.  */

 #define ASM_OUTPUT_ALIGN(STREAM,LOG)                                   \
-  fprintf (STREAM, "\t.align\t%d\n", (LOG))
+  fprintf (STREAM, "\t.align\t%d\n", (LOG) > mips_align_all ? (LOG) : mips_align_all)

 /* This is how to output an assembler line to advance the location
    counter by SIZE bytes.  */
pixel: A mischievous magical spirit associated with screen displays. The computer industry has frequently borrowed from mythology. Witness the sprites in computer graphics, the demons in artificial intelligence and the trolls in the marketing department.
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

BIGGEST_ALIGNMENT was already set to 128 in r5900.h. You should localize all of your changes to r5900.h instead of mips.h, to keep the patch smaller and more concise.
"He was warned..."
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

And to ensure that IOP code doesn't get broken by EE-specific changes.
"He was warned..."
ooPo
Site Admin
Posts: 2023
Joined: Sat Jan 17, 2004 9:56 am
Location: Canada
Contact:

Post by ooPo »

I've merged this and created a new patch:

( http://www.oopo.net/consoledev/files/gc ... 1103.patch )

Enjoy!
pixel
Posts: 791
Joined: Fri Jan 30, 2004 11:43 pm

Post by pixel »

mrbrown wrote:BIGGEST_ALIGNMENT was already set to 128 in r5900.h. You should localize all of your changes to r5900.h instead of mips.h, to keep the patch smaller and more concise.
Hum, strange: changing this value really did change the output. I mean, the .s files generated by gcc -S were not the sames; some alignments were really changed. Nasty compilation side effects ? I'll try to work that out... nasty, nasty bug :)

Ho, and, there are the old -malign128 option in both mips.h and r5900.h...
pixel: A mischievous magical spirit associated with screen displays. The computer industry has frequently borrowed from mythology. Witness the sprites in computer graphics, the demons in artificial intelligence and the trolls in the marketing department.
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

ooPo wrote:I've merged this and created a new patch:

( http://www.oopo.net/consoledev/files/gc ... 1103.patch )

Enjoy!
Could you please unmerge the patch, or add the -malign-all flag into r5900.h instead?
pixel wrote:Hum, strange: changing this value really did change the output. I mean, the .s files generated by gcc -S were not the sames; some alignments were really changed. Nasty compilation side effects ? I'll try to work that out... nasty, nasty bug :)

Ho, and, there are the old -malign128 option in both mips.h and r5900.h..
r5900.h includes mips.h so it overrides BIGGEST_ALIGNMENT. Are you using --target=ee? Also, what does -malign-all add over -malign128?
"He was warned..."
ooPo
Site Admin
Posts: 2023
Joined: Sat Jan 17, 2004 9:56 am
Location: Canada
Contact:

Post by ooPo »

The old patch is still there:

( http://www.oopo.net/consoledev/files/gc ... 0526.patch )

Enjoy!
pixel
Posts: 791
Joined: Fri Jan 30, 2004 11:43 pm

Post by pixel »

Okay. I've read a bit more deeply gcc's source code. Here are some facts:

-) It really seems there's some kind of conflict or at least, overrides between mips.h and r5900.h, and the BIGGEST_ALIGNMENT #define. At least, I'll do some other tests later on, but, as much as I can tell right now:
-) there is not enough #defines done here (seems other #defines are involved into the mess)
-) The "BIGGEST_ALIGNMENT" is only a hint, saying:

Code: Select all

/* There is no point aligning anything to a rounder boundary than this.  */
. So, if this variable is to 128, this will mean that gcc will sometime do some bloaty 128-bits alignments, without breaking anything (that is, won't interfere with structures or anything). Only separated variables.
-) malign128 isn't useful at all: it influences on a variable called mips_alignment which was introduced by the gcc patches, which is used only inside the r5900.h's "stack alignment" code. So, it doesn't work on the global variables generated by gcc when doing for example a strcpy(foo, "bar");

My "malign-all" hack is only a quick and dirty fixup which is NOT interfering with anything when not used: it only interferes with the ".align" code generation. I were NOT able to spot where it was generating ".align 3" instead of ".align 4", so, this option is here to fix a "minima" value for all the ".align" generated by gcc.

For a reminder, if you do a strcpy(foo, "somestring"); gcc will emit the following code:

Code: Select all

        .rdata
        .align  3
$LC0:
        .ascii  "something\000"
        .text
        .align  2
[...]
        lq  ...
        sq  ...
The ".align 3" here is plain wrong; that is, will align to 64 bits instead of the needed 128 bits for these "lq/sq" there. The fix up -malign-all=4 will forcabily produce the following code:

Code: Select all

        .rdata
        .align  4
$LC0:
        .ascii  "something\000"
        .text
        .align  4
[...]
        lq  ...
        sq  ...
which will most probably produce some extra zeros there, but, who cares ? Some little bloat, for a temporary (and optionnal, remember: -malign-all=1 is the default, and won't do any harm then) fix up.
pixel: A mischievous magical spirit associated with screen displays. The computer industry has frequently borrowed from mythology. Witness the sprites in computer graphics, the demons in artificial intelligence and the trolls in the marketing department.
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

Thanks, pixel.
"He was warned..."
pixel
Posts: 791
Joined: Fri Jan 30, 2004 11:43 pm

Post by pixel »

You welcome.


Well, I've did further investigations around that. So, well, as I said, there is no real code change as it. I've recompiled the whole ps2sdk, and a few things around: everything's fine. Now, putting -malign-all=4 sometime helps, but it seems that using it on a large scale isn't a good idea. I've recompiled the whole ps2sdk and ps2link using -O3 and -malign-all=4, and it completely failed running, with a very strange flickering exception screen. So, well, my advice is really: use -malign-all=4 ONLY if you can't find any other workaround, and use it only on the .c file which is causing problem. I am still hoping to find some better fix in the future, but as I said, I will be quite busy right now with other stuff.
pixel: A mischievous magical spirit associated with screen displays. The computer industry has frequently borrowed from mythology. Witness the sprites in computer graphics, the demons in artificial intelligence and the trolls in the marketing department.
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Post by J.F. »

You should try it again with -O2 instead. -O3 is known to be "problematic" for most large projects. -O3 is recommended for small snippets of code that needs to be highly optimized. Even when it does work, it's often slower than -O2. On the AMD64, compiling with -O3 results in slower code than -O2 in all known cases. There are groups looking into why this is the case, but in the meantime, they recommend not using -O3 for anything at all. I would almost guess it's the same for 64-bit MIPS architecture.
MrHTFord
Posts: 35
Joined: Tue Feb 10, 2004 2:04 am
Location: England

Post by MrHTFord »

I don't think that information relating to an x86 processor gcc port should be assumed true for a MIPS port. It's like comparing apples and oranges.

I know that the tree stuff and RTL stuff is the same for all processors, but the rest is completely different.
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

J.F. wrote:You should try it again with -O2 instead. -O3 is known to be "problematic" for most large projects. -O3 is recommended for small snippets of code that needs to be highly optimized. Even when it does work, it's often slower than -O2. On the AMD64, compiling with -O3 results in slower code than -O2 in all known cases. There are groups looking into why this is the case, but in the meantime, they recommend not using -O3 for anything at all. I would almost guess it's the same for 64-bit MIPS architecture.
In GCC 3.2.x -O3 includes all of -O2 plus -finline-functions and -freg-moves (the latter of which does nothing for the PS2). There is no great mystery, if you turn on -finline-functions, the compiler will agressively inline any function it deems worthy.

All that extra code bloat is why it's slower.

I think I've said this in some other forum here, but here it is again: Don't use optimization flags if you don't know what they do.
"He was warned..."
pixel
Posts: 791
Joined: Fri Jan 30, 2004 11:43 pm

Post by pixel »

My gooooood...........

I wonder how I couldn't see that one........




Okay, bug found and fixed. Once again, this is due to LavosSpawn/Cody56 (the same who found an irx bug before).
LavosSpawn wrote:23:41 <LavosSpawn> now.... why didn't anyone notice and fix it before? ;)
23:42 <|Pixel|> that is indeed a very interesting question
23:42 <LavosSpawn> the string bug was known before and the comment in mips.h basically said:
23:42 <LavosSpawn> "MEEEP MEEEP! IMPORTANT! THIS IS THE ALIGNMENT FOR INLINE STRING COPY! MEEEP!"
23:42 <|Pixel|> blah
23:42 <LavosSpawn> honestly ;)
23:42 <|Pixel|> blaaaah
23:43 <LavosSpawn> ^_^
The code in question is, inside mips.h:

Code: Select all

#define CONSTANT_ALIGNMENT&#40;EXP, ALIGN&#41;                                  \
  &#40;&#40;TREE_CODE &#40;EXP&#41; == STRING_CST  || TREE_CODE &#40;EXP&#41; == CONSTRUCTOR&#41;   \
   && &#40;ALIGN&#41; < BITS_PER_WORD ? BITS_PER_WORD &#58; &#40;ALIGN&#41;&#41;

Code: Select all

#define DATA_ALIGNMENT&#40;TYPE, ALIGN&#41;                                     \
  &#40;&#40;&#40;&#40;ALIGN&#41; < BITS_PER_WORD&#41;                                           \
    && &#40;TREE_CODE &#40;TYPE&#41; == ARRAY_TYPE                                  \
        || TREE_CODE &#40;TYPE&#41; == UNION_TYPE                               \
        || TREE_CODE &#40;TYPE&#41; == RECORD_TYPE&#41;&#41; ? BITS_PER_WORD &#58; &#40;ALIGN&#41;&#41;

So, the fix is to add, at the end of r5900.h:

Code: Select all

#undef CONSTANT_ALIGNMENT
#define CONSTANT_ALIGNMENT&#40;EXP, ALIGN&#41;                                  \
  &#40;&#40;TREE_CODE &#40;EXP&#41; == STRING_CST  || TREE_CODE &#40;EXP&#41; == CONSTRUCTOR&#41;   \
   && &#40;ALIGN&#41; < 128 ? 128 &#58; &#40;ALIGN&#41;&#41;

Code: Select all

#undef DATA_ALIGNMENT
#define DATA_ALIGNMENT&#40;TYPE, ALIGN&#41;                                     \
  &#40;&#40;&#40;&#40;ALIGN&#41; < 128&#41;                                                     \
    && &#40;TREE_CODE &#40;TYPE&#41; == ARRAY_TYPE                                  \
        || TREE_CODE &#40;TYPE&#41; == UNION_TYPE                               \
        || TREE_CODE &#40;TYPE&#41; == RECORD_TYPE&#41;&#41; ? 128 &#58; &#40;ALIGN&#41;&#41;
Now it generates proper .align 4 code everywhere we have a move_by_pieces and other stuff.... so easy huh ?


Well, I am updating cvs in a minute, merging also mharris's dvp's patch, plus the -D_EE builtin I did some time ago. ooPo's ps2toolchain's cvs is updated too, and I am sending all that to ooPo right now so his site will be updated.


Again, big thanks to LavosSpawn, one scummvm's developper.
pixel: A mischievous magical spirit associated with screen displays. The computer industry has frequently borrowed from mythology. Witness the sprites in computer graphics, the demons in artificial intelligence and the trolls in the marketing department.
Post Reply