Using Cache As Ram for K8
Eric W. Biederman
ebiederman at lnxi.com
Thu Jun 24 16:59:00 CEST 2004
Li-Ta Lo <ollie at lanl.gov> writes:
> I have successfully used the cache in the K8 processor as RAM on
> the AMD Serenade mainboard. The cache as ram is used as a tiny
> stack space for the code generated by GCC, which replaces the need
> for a register-only C compiler like ROMCC. Now the whole LinuxBIOS
> C code can be compiled by GCC.
Note that this certainly will not work on older cpus. But there is less
complexity there, so hopefully romcc is sufficient.
> There are a few problems remaining. The first thing is I can only
> use 7 cache lines of cache (448 bytes) reliably in the K8. The
> access to the 8th cache line is unstable and the access to the
> 9th cache line hangs the processor. The other problem is the
> optimize_connection() function for multi-processor configuration
> runs unstably under CAR. It does not overflow the stack, it's just
> plain unstable for some reason. So I can only configure the mainboard
> as Uniprocessor.
Most likely it is the cross cpu probes causing cache invalidates.
You may be able to ``improperly'' set up caching of memory (no cross
cpu probes) while you are initializing the memory controllers.
I wonder if some part of those cache line access problems is
the swapping between L1 and L2, although that sounds unlikely.
> Does anyone have any idea about these problems? If we can solve
> these two problems, Cache As Ram can be used routinely for K8 and
> probably we can try to extend it to some other processors.
Ollie, in theory the cache as RAM idea works. But when I have implemented
it, it has been a case of fixing it with every cpu rev, whereas romcc, while
it is harder, only needs to be stabilized once. And you don't need to load
a microcode update just so your code can run.
Before we do this routinely I would really like some buy off from AMD
that they would support this. But anyway...
On the fun side, it would be extremely interesting if you could get
enough memory working to start paging and we could go into 64bit mode :)
That is likely tempting fate too much.....
> What is the "effective" or "equivalent" stack size of ROMCC?
> Is 448 bytes of stack adequate for ROMCC "linted" code in general?
8 (gpr) + 8 (mmx) + 8 (sse) registers each 4 bytes long = 96 bytes.
Looking at the hdama configuration, my max inline depth is 14
procedures, so that likely totals another 14 * 4 = 56 bytes in
return addresses. So 448 bytes would be a small improvement.
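The arithmetic above can be written out as a small C sketch. It is only an illustration of the figures quoted in this thread, not code from LinuxBIOS or romcc: every identifier below is made up, the 64-byte line size is inferred from 7 lines = 448 bytes, and each register is counted as 4 bytes exactly as in the estimate above.

```c
#include <assert.h> /* static_assert (C11) */

/* Figures quoted in the thread; all names here are hypothetical. */
enum {
    CACHE_LINE_BYTES   = 64, /* inferred: 448 bytes / 7 lines */
    USABLE_CACHE_LINES = 7,  /* lines reported as reliable under CAR */
    REGISTER_FILES     = 3,  /* gpr, mmx, sse */
    REGS_PER_FILE      = 8,
    BYTES_PER_REG      = 4,  /* counted as 32-bit values, as in the post */
    MAX_INLINE_DEPTH   = 14, /* hdama configuration */
    RETURN_ADDR_BYTES  = 4
};

/* Stack space available when using the cache as RAM. */
int car_stack_bytes(void)
{
    return USABLE_CACHE_LINES * CACHE_LINE_BYTES;
}

/* Storage romcc effectively gets from the register files. */
int romcc_register_bytes(void)
{
    return REGISTER_FILES * REGS_PER_FILE * BYTES_PER_REG;
}

/* Return addresses a real stack would need for the same call depth. */
int romcc_return_bytes(void)
{
    return MAX_INLINE_DEPTH * RETURN_ADDR_BYTES;
}

/* Rough "equivalent stack size" of romcc under these assumptions. */
int romcc_equivalent_bytes(void)
{
    return romcc_register_bytes() + romcc_return_bytes();
}

/* Compile-time check that the constants match the 448 bytes quoted above. */
static_assert(USABLE_CACHE_LINES * CACHE_LINE_BYTES == 448,
              "7 cache lines of 64 bytes should give 448 bytes");
```

Under these assumptions the register estimate plus the return addresses comes to 96 + 56 = 152 bytes, against 448 bytes of cache as RAM. Calling the four functions from a small main() and printing the results is enough to reproduce the numbers.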
Note generally I have noticed romcc compiled does not even use all of