Romcc Ramblings...

Eric W. Biederman ebiederman at lnxi.com
Fri Dec 5 01:17:00 CET 2003


Stefan Reinauer <stepan at suse.de> writes:

> * ron minnich <rminnich at lanl.gov> [031202 17:13]:
> > On 1 Dec 2003, Eric W. Biederman wrote:
> > 
> > > Ron one thing you did note was the changing of word accesses to byte
> > > accesses.  With romcc that does not help in the case of register pressure.
> > 
> > I would think it would hurt since x86 lets you use those little 
> > sub-registers (puddle arithmetic), so using bigger registers reduces the 
> > number of registers available.
> 
> Yes, being able to use this from romcc would severely lower register
> pressure I assume. Neither romcc nor the code compiled with it takes
> care of this at the moment though.

I tried this at one point.  The problem is that there
is no instruction sequence to move to/from the byte registers
from a normal 32bit register, which negates most of the benefit of
the extra registers.  64bit mode on the Opteron gets byte registers correct,
but it no longer has more than one byte register per general purpose register.

Getting in support for mmx and sse registers was much more beneficial.
16 more instead of just 4.

A more general purpose technique is to use bit-fields.  I am close to having
bit-fields implemented in my backburner version of romcc.  I have some
really oddball ideas about bit-fields in 128-bit sse registers :)  But
who knows when I will get that done.

Bit-fields still share with the x86 byte registers the property of
increasing the register pressure when you modify their values or
read/write them (because the field needs a register of its own to be
modified).  But when they are just passed around they can nicely
reduce the register pressure.  And in addition they are under
programmer control so you know it is a trade off between register
pressure when using the value and register pressure when passing the
values. 

You can roll bit-fields by hand at the moment if you want though.
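
For illustration, rolling bit-fields by hand usually means shifts and
masks on one word.  This is only a sketch in plain C: the field names
and widths (ROW, COL, BANK) and the helpers field_get/field_set are
made up for the example, and it has not been tried against romcc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: three small values share one 32-bit word,
 * so passing them around costs one register instead of three. */
#define ROW_SHIFT   0u
#define ROW_BITS    4u
#define COL_SHIFT   4u
#define COL_BITS    4u
#define BANK_SHIFT  8u
#define BANK_BITS   2u

/* Mask covering the low `bits` bits (valid for bits < 32). */
#define FIELD_MASK(bits) ((1u << (bits)) - 1u)

/* Extract a field: shift it down and mask off its neighbours. */
static inline uint32_t field_get(uint32_t word, unsigned shift, unsigned bits)
{
	return (word >> shift) & FIELD_MASK(bits);
}

/* Replace a field: clear its old bits, then OR in the new value. */
static inline uint32_t field_set(uint32_t word, unsigned shift,
				 unsigned bits, uint32_t value)
{
	uint32_t mask = FIELD_MASK(bits);

	assert(value <= mask);	/* value must fit in the field */
	return (word & ~(mask << shift)) | (value << shift);
}
```

Each field_get or field_set needs a scratch register for the shifted
value, which is the extra pressure described above; the packed word
itself travels in a single register.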

What I find most disturbing is that, last I looked,
size crt0.o lists it at about 33K (after lowering spurious debugging
messages from debug to spew), and linuxbios_payload.nrvb at about
24K.  crt0.o from the p4dpr is at about 10K.  So romcc is giving
me a 3X code bloat...  I am pretty certain it is code bloat caused
by inlining everything.

Ron, you complained earlier about compile speed, and I think romcc
is the big culprit there.  Its register allocator is currently using
an O(N^2) data structure, so the more code it compiles the slower it
gets...  I think I saw another version of basically the same
algorithm that uses a different data structure, which would make it
much faster.
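
To give a feel for what an O(N^2) structure costs, here is a sketch of
one common example: an interference graph kept as a lower-triangular
bit matrix, one bit per pair of values.  This is an assumption made
for illustration only; nothing above says this is the structure romcc
uses, and every name here (igraph, pair_index, ...) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical interference graph: one bit per unordered pair of nodes. */
struct igraph {
	size_t nodes;
	unsigned char *bits;
};

/* Bits needed for n nodes: n*(n-1)/2, i.e. quadratic in n. */
static size_t igraph_bits(size_t n)
{
	return n * (n - 1) / 2;
}

/* Bit index for the pair (a, b); the two nodes must differ. */
static size_t pair_index(size_t a, size_t b)
{
	size_t hi = a > b ? a : b;
	size_t lo = a > b ? b : a;

	assert(hi != lo);
	return hi * (hi - 1) / 2 + lo;
}

static struct igraph *igraph_new(size_t n)
{
	struct igraph *g = malloc(sizeof(*g));

	if (!g)
		return NULL;
	g->nodes = n;
	/* The extra byte keeps the allocation non-empty for tiny graphs. */
	g->bits = calloc((igraph_bits(n) + 7) / 8 + 1, 1);
	if (!g->bits) {
		free(g);
		return NULL;
	}
	return g;
}

/* Record that a and b are live at the same time. */
static void igraph_add_edge(struct igraph *g, size_t a, size_t b)
{
	size_t i = pair_index(a, b);

	g->bits[i / 8] |= (unsigned char)(1u << (i % 8));
}

/* Constant-time query: do a and b interfere? */
static int igraph_interferes(const struct igraph *g, size_t a, size_t b)
{
	size_t i = pair_index(a, b);

	return (g->bits[i / 8] >> (i % 8)) & 1;
}

static void igraph_free(struct igraph *g)
{
	free(g->bits);
	free(g);
}
```

With this layout 1000 values need 499500 bits and 2000 values need
1999000, so doubling the amount of code roughly quadruples the
storage to allocate and clear.  Inlining everything into one function
makes N large quickly.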

Right now the speed is tolerable when I remember to set
#define DEBUG_CONSISTENCY 1
instead of 2, which I committed accidentally the other day.
DEBUG_CONSISTENCY 2 is only really useful when debugging
the register allocator.  With a perfect compiler DEBUG_CONSISTENCY
is not needed at all, but romcc is still teething, so as long as there
is no performance hit it is useful.
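
As a rough sketch of how a leveled switch like this is typically
wired up (not romcc's actual code; struct node, check_node and
check_list are invented for the example): level 1 keeps only cheap
local checks, level 2 adds a full walk of the data structure.

```c
#include <assert.h>
#include <stddef.h>

#ifndef DEBUG_CONSISTENCY
#define DEBUG_CONSISTENCY 1
#endif

/* Hypothetical stand-in for a compiler data structure. */
struct node {
	struct node *prev, *next;
};

/* Cheap O(1) check: the immediate neighbours point back at n. */
static void check_node(const struct node *n)
{
#if DEBUG_CONSISTENCY > 0
	assert(!n->next || n->next->prev == n);
	assert(!n->prev || n->prev->next == n);
#else
	(void)n;
#endif
}

/* Expensive O(N) check: walk the whole list.  Compiled in only at
 * level 2; returns how many nodes were verified. */
static int check_list(const struct node *head)
{
	int checked = 0;
#if DEBUG_CONSISTENCY > 1
	for (const struct node *n = head; n; n = n->next) {
		check_node(n);
		checked++;
	}
#else
	(void)head;
#endif
	return checked;
}
```

If something like check_list runs after every change to the
structure, level 2 turns each update into a full walk, which is the
kind of slowdown that is only worth paying while chasing an allocator
bug.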

Eric


