[coreboot] [RFC] Re-thinking the stages

Fri Apr 6 20:12:44 CEST 2012

Hi Kyösti,

I understand the problems you are trying to solve, but I hesitate to
add stages as it makes it that much more confusing and harder to
maintain. I have some specific comments inline.

On Wed, Apr 4, 2012 at 2:54 AM, Kyösti Mälkki <kyosti.malkki at gmail.com> wrote:
> Hi!
>
> Looking at some of the changes proposed with the new support of Intel
> Sandybridge and Ivybridge, combined with my previous design choices made
> with support of Intel Hyper-Threading for NetBurst architectures and SMP
> generally, prompted me to share my thoughts on the coreboot stage layout.
>
> So there is currently bootblock, romstage, ramstage and payload, in that
> specific order. I have identified a few issues that would need to be
> worked on.
>
> 1. Built-in-self test failures
>
> On (Intel) SMP system only BSP CPU failure is detected and possibly
> reported. I think architecture allows that BSP CPU is not the same
> physical core across power-cycles. One should consistently either be
> redundant or die on single CPU failure.
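To make point 1 concrete, here is a minimal C sketch of a consistent BIST policy. It assumes each CPU's post-reset EAX value has already been collected into an array, where zero means the self test passed. The policy enum and check_bist() are hypothetical names for illustration and are not existing coreboot code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical policy switch: halt on any failure, or keep booting with
 * only the CPUs that passed (redundant operation). */
enum bist_policy { BIST_DIE_ON_ANY_FAILURE, BIST_DISABLE_FAILED_CPUS };

/* bist[i] is the EAX value CPU i reported after reset (0 = passed).
 * usable[i] is set to 1 for CPUs allowed to continue.
 * Returns the number of usable CPUs, or -1 if the policy says to halt. */
static int check_bist(const uint32_t *bist, int ncpus,
                      enum bist_policy policy, int *usable)
{
    assert(ncpus > 0);
    int ok = 0;
    for (int i = 0; i < ncpus; i++) {
        usable[i] = (bist[i] == 0);
        if (usable[i])
            ok++;
        else if (policy == BIST_DIE_ON_ANY_FAILURE)
            return -1;
    }
    return ok;
}
```

The point is only that the same rule is applied to every CPU, not just the BSP. How failed APs would actually be kept out of the boot is left open here.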
>
> 2. Serial console
>
> This is initialized in romstage and requires a working cache. If the
> cache fails to work, due to a BIST failure or bad cache-as-RAM init
> code, there is no console.
>
> 3. Microcode updates
>
> The "tiny" bootblock doesn't seem like the correct place for microcode
> updates.
>

Does microcode have to be this early? Before CAR?

> 4. Cache coherency
>
> MTRR setup should be consistent across all CPUs. If all CPUs are started
> for microcode updates before ramstage, they should fix their MTRRs too.
> Even then, pre-ram spinlocks may be impossible to implement, so pre-ram
> SMP operation is very, very restricted.
>

Yes, this has been a problem in the past, but seems unrelated to the
stages. It is just an implementation issue.
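Below is a sketch of the consistency check that point 4 asks for. It assumes every CPU has already dumped its MTRRs into a struct. The MSR numbers in the comments are the architectural IA32_MTRR_* registers. The struct, NUM_VAR_MTRRS and first_inconsistent_cpu() are hypothetical. The function compares each AP against the BSP (CPU 0) and reports the first mismatch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: real code would read the count from IA32_MTRRCAP[7:0]. */
#define NUM_VAR_MTRRS 8

struct mtrr_state {
    uint64_t def_type;              /* IA32_MTRR_DEF_TYPE (MSR 0x2FF) */
    uint64_t base[NUM_VAR_MTRRS];   /* IA32_MTRR_PHYSBASEn (0x200 + 2n) */
    uint64_t mask[NUM_VAR_MTRRS];   /* IA32_MTRR_PHYSMASKn (0x201 + 2n) */
};

/* Returns the index of the first CPU whose MTRRs differ from the BSP's
 * (cpus[0]), or -1 if every CPU matches. */
static int first_inconsistent_cpu(const struct mtrr_state *cpus, int ncpus)
{
    assert(ncpus > 0);
    const struct mtrr_state *bsp = &cpus[0];
    for (int i = 1; i < ncpus; i++) {
        const struct mtrr_state *ap = &cpus[i];
        if (bsp->def_type != ap->def_type ||
            memcmp(bsp->base, ap->base, sizeof(bsp->base)) != 0 ||
            memcmp(bsp->mask, ap->mask, sizeof(bsp->mask)) != 0)
            return i;
    }
    return -1;
}
```

Fixed-range MTRRs are left out to keep the sketch short. A complete check would compare them too.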

> 5. XIP alignment
>
> If 4 variable MTRRs were used in pre-ram execution environment for XIP,
> there would be no alignment requirement on the placement of XIP romstage
> in Flash ROM. Such runtime MTRR setup code is around 512 bytes, and the
> cache footprint would exceed the actual romstage size by at most 30%.
> A single MTRR setup may reserve almost twice the actual size of a
> romstage in both flash and cache memory.
>

Is this really a problem? Also, the XIP area is limited to avoid using
too much cache on some processors. Some code may be executed out of
flash, but not be in the XIP area to avoid CAR corruption.
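The arithmetic behind point 5 can be checked with a short C sketch. It compares two things:

- the footprint of one MTRR, which is the size rounded up to a power of two;
- a greedy split of the exact range into naturally aligned power-of-two blocks, using one variable MTRR per block.

The function names are hypothetical. The sketch assumes 4 KiB granularity and a range that ends at or below 4 GiB.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= size: what a single MTRR has to cover. */
static uint32_t single_mtrr_size(uint32_t size)
{
    assert(size > 0 && size <= 0x80000000u);
    uint32_t p = 1;
    while (p < size)
        p <<= 1;
    return p;
}

/* Number of variable MTRRs needed to cover [base, base + size) exactly,
 * splitting it greedily into naturally aligned power-of-two blocks.
 * base and size must be multiples of 4 KiB (the MTRR granularity). */
static int mtrrs_for_range(uint32_t base, uint32_t size)
{
    assert((base & 0xfff) == 0 && (size & 0xfff) == 0);
    uint64_t cur = base, end = (uint64_t)base + size;
    int count = 0;
    while (cur < end) {
        /* Largest block the alignment of cur allows... */
        uint64_t blk = cur ? (cur & -cur) : (1ULL << 32);
        /* ...shrunk until it fits in the remaining range. */
        while (cur + blk > end)
            blk >>= 1;
        cur += blk;
        count++;
    }
    return count;
}
```

For example, a 96 KiB romstage ending at 4 GiB (base 0xFFFE8000) needs two MTRRs of 32 KiB and 64 KiB. A single MTRR would need 128 KiB. A 60 KiB range starting at 0xFFFF1000 needs all four.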

> 6. Bypassing raminit
>
> One may want to start a coreboot port with something less complex
> than raminit, like setting up the PCI device tree. With the amount of
> cache on modern CPUs, one could probably run libpayload apps from
> cache. One such nice app would be a zmodem download of raminit.
>

This is an interesting thought, but really a debug/development
feature. It seems this could be done by calling a zmodem function
instead of raminit in romstage, or by having the bootblock load a
different romstage. I don't see how this changes the stages.
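Marc's alternative is to have the bootblock load a different romstage instead of adding a stage. A minimal sketch of that might look like the code below. The boot modes and romstage_name() are hypothetical, and a real board might pick the mode from a jumper, a CMOS option or a GPIO. "fallback/romstage" follows coreboot's fallback/normal CBFS naming, while "debug/romstage" is invented for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical boot modes selected by the board. */
enum boot_mode { BOOT_NORMAL, BOOT_DEBUG_ZMODEM };

/* Returns the CBFS name of the romstage the bootblock should load. */
static const char *romstage_name(enum boot_mode mode)
{
    if (mode == BOOT_DEBUG_ZMODEM)
        return "debug/romstage";    /* fetches raminit over zmodem */
    return "fallback/romstage";     /* regular raminit path */
}
```

The stage sequence stays bootblock, romstage, ramstage, payload. Only which romstage gets loaded changes, e.g. `assert(strcmp(romstage_name(BOOT_NORMAL), "fallback/romstage") == 0);`.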

> 7. CPU max physical address
>
> The MTRR physical mask should be set correctly during romstage too,
> in case memory above 4GB is tested. It should first be auto-detected,
> with work-arounds for CPU errata applied afterwards.

There are CPU-specific AMD MTRR setup functions to support hoisted
memory and the cache properties of that range. It would be good if
this could be combined in a nice way, but I would NACK a file full of
workarounds for different CPU types.
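For point 7, here is a sketch of deriving a variable MTRR mask from the CPU's physical address width rather than from a hardcoded value. It assumes the width comes from CPUID leaf 0x80000008 (EAX bits 7:0). It falls back to 36 bits when that leaf is absent. The fallback value and the function names are assumptions, and CPU-specific errata handling is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MTRR_PHYS_MASK_VALID (1ULL << 11)   /* V bit in IA32_MTRR_PHYSMASKn */

/* Physical address width; eax is CPUID(0x80000008).EAX when available.
 * The 36-bit fallback is an assumption for CPUs without that leaf. */
static unsigned phys_addr_bits(int have_leaf_80000008, uint32_t eax)
{
    return have_leaf_80000008 ? (eax & 0xff) : 36;
}

/* IA32_MTRR_PHYSMASKn value for a naturally aligned power-of-two range. */
static uint64_t mtrr_phys_mask(uint64_t size, unsigned phys_bits)
{
    assert(size >= 4096 && (size & (size - 1)) == 0);
    assert(phys_bits > 12 && phys_bits < 64);
    uint64_t addr_mask = (1ULL << phys_bits) - 1;
    return (~(size - 1) & addr_mask) | MTRR_PHYS_MASK_VALID;
}
```

With a 128 KiB range, a 36-bit CPU gets 0xFFFFE0800 and a 40-bit CPU gets 0xFFFFFE0800. Using the 36-bit mask on the 40-bit CPU would leave address bits 36-39 out of the match.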

>
>
>
> Counting all the above together, I would like to start some discussion
> whether the current 4 stage model is the best design choice. I am
> thinking about some changes in the layout as a fix:
>
> 1. Bootblock
>
> No real change. Must guarantee access to all of Flash ROM and
> operational PCI configuration cycles for following stages.
> Contains boot vector for any AP CPUs.
> Exits in protected mode to Stageloader.
>
> 2. Stageloader
>
> A new stage. This has a pre-CAR environment (ROMCC-build) to enable
> early serial console and control MTRR setup to enable cache-as-ram.
> Pre-CAR environment can execute stages from Flash ROM with XIP.
>
I am not convinced of the value of earlier serial console. It adds
complexity where things should be simple. I hesitate to continue
moving more code before CAR. We have had a long standing goal to
reduce code before CAR.

> This also has a CAR and RAM environment (GCC-build) that can execute XIP
> stages from Flash ROM or decompress stages to CAR/RAM from Flash ROM.
>

At least with AMD processors, you can't execute code out of the data
cache. The only way to get instructions cached is with a code fetch,
which doesn't hit the data cache.

I see the reason for executing something other than the normal boot
process, but should it be a completely new stage? Why not have the
bootblock load a different romstage if you want different romstage
behavior?

> 3. CPU init
>
> A new stage built with ROMCC. Checks BIST of AP CPUs, executes microcode
> updates and handles the issue of shared Cache-Disable bit on
> hyper-threading Intel CPUs.

Why add a new stage built with ROMCC after CAR? Again, we want to
reduce the code built with ROMCC. It is very fragile, doesn't optimize
or link the way gcc does, and we don't get symbols for symbolic debug.
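The "shared Cache-Disable bit" issue in the proposed CPU init stage means the code must know which logical CPUs are hyper-threading siblings of the same core. Below is a minimal sketch. It assumes threads_per_core is already known (e.g. from CPUID) and that the SMT level occupies the low bits of the APIC ID. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* APIC ID bits used for the thread level: ceil(log2(threads_per_core)). */
static unsigned smt_id_bits(unsigned threads_per_core)
{
    assert(threads_per_core > 0);
    unsigned bits = 0;
    while ((1u << bits) < threads_per_core)
        bits++;
    return bits;
}

/* Two logical CPUs are siblings on one core when their APIC IDs
 * agree above the thread bits. */
static int are_ht_siblings(uint32_t apic_a, uint32_t apic_b,
                           unsigned threads_per_core)
{
    unsigned shift = smt_id_bits(threads_per_core);
    return (apic_a >> shift) == (apic_b >> shift);
}
```

With two threads per core, APIC IDs 6 and 7 are siblings, while 1 and 2 are on different cores.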

>
> 4. RAM init
>
> Old romstage built with GCC. Returns to Stageloader after DRAM is
> functional, but before any DRAM is written.

Why make a new stage? Why return to the previous stage? This also
becomes a XIP CAR nightmare.

>
> 5. DEV init
>
> Old ramstage built with GCC. The only change is that microcode updates
> and SMP setup are already taken care of.

For some CPUs it makes sense to do microcode and SMP setup here
(although much less so lately). This is still the place for SMP setup, as
MTRRs and other settings need to be configured for normal operation.

>
> 6. Payload
>
> No changes required.
>
>
> I would be interested in working on some of these topics and I think I
> can also test most of the suggested changes on older SMP hardware.
>

IMO, more stages are more confusing and I don't see the benefit. I
think that reworking what is going on in some CPUs' and mainboards'
bootblock/romstage/ramstage is worth doing, but adding more stages,
CBFS searching and loading, etc., is the wrong direction for coreboot.
We should be focused on simple, lean, and concise. Something else to
consider is how vendor code is starting to change coreboot. I would
like to see how the new Intel code is added. There is a lot of assumed
configuration ownership in the vendor code, and coreboot may need to
adjust to that. Keep that in mind as you design these stages.

Marc

> Thanks,
>
> Kyösti Mälkki
> <kyosti.malkki at gmail.com>
>
>
>
> --
> coreboot mailing list: coreboot at coreboot.org
> http://www.coreboot.org/mailman/listinfo/coreboot

-- 
http://se-eng.com