[coreboot] MP table multicore patch

Myles Watson mylesgw at gmail.com
Wed Feb 17 00:58:59 CET 2010


On Tue, Feb 16, 2010 at 4:40 PM, Stefan Reinauer <stepan at coresystems.de>wrote:

>  On 2/16/10 8:42 PM, Myles Watson wrote:
>
>
>
> On Tue, Feb 16, 2010 at 12:02 PM, Stefan Reinauer <stepan at coresystems.de>wrote:
>
>>  On 2/16/10 5:11 AM, Timothy Pearson wrote:
>> > Here is a cleaned up and tested version of the SMP APIC autodetect
>> patch.
>> >
>> > Signed-off-by: Timothy Pearson <tpearson at raptorengineeringinc.com>
>>
>
>
>> And, can you describe high level, what the patch changes? It looks to me
>> as if you are recursing through the tree instead of just walking the
>> "all_devices" list.
>> So this implies that you don't catch all devices when running through
>> all_devices. This sounds like a problem in the resource allocator and
>> maybe it should be fixed there instead?
>>
> I don't understand why this would be a resource allocator problem.  Aren't
> we talking about the device tree?
>
>
> Oh, sorry I didn't say more...  What the old code did was the following:
>
>     device_t cpu;
>     for(cpu = all_devices; cpu; cpu = cpu->next) {
>      ....
>      <- check if the device is an APIC and if it is, dump an APIC entry in
> the mptable ->
>      ....
>     }
>
> But it seems that the code does not see all devices -- anymore -- not sure
> if it ever did, but I think to remember that's what happened at some point,
> or was at least intended.
>
I'm really surprised that it didn't see all devices.


> So now the code looks l
>
> scan_for_apic_devices(parent)
> {
>     for(c_it=0; c_it < parent->links; c_it++) {
>         for (child = parent->link[c_it].children; child; child =
> child->sibling) {
> ....
>
>             if (child->path.type == DEVICE_PATH_APIC_CLUSTER) {
>                 // Found an APIC cluster, scan it for APICs
>                 smp_scan_for_apic_devices(child);
>             }
>         }
>     }
> }
>
> So two more steps are necessary:
> - check all the downwards links of a device instead of just walking devices
> and checking their type.
> - run recursively in a special case on APIC clusters.
>
> This sounds a whole lot like something changed in the way "all_devices"
> works. And if "all_devices" does not mean "all devices" I am sure there are
> more places in our code that need similar fixes.
>
I agree.  I've never seen the problem where I couldn't run through all the
devs with all_devices.

>
>  Maybe the real problem is related to the memory corruption seen with
> printk?
>
> Not completely impossible, but I figured it's easier to ask the guy who
> rewrote the resource allocator if he knows something about how the intended
> behavior of "all_devices" ;-)
>
As far as I know I didn't change that behavior.  Does anyone have a boot log
where they can see that the tree of devices has more devices than the list
(at the same stage of enumeration)?


> - Maybe the behavior is intended, then we just need to check in Timothy's
> patch.
> - Maybe the behavior is not intended, but that's how the code works right
> now. Then we'd rather have to look at the resource allocator and decide if
> we want "all_devices" to mean "all devices", or whether we rename the
> variable, or redefine its meaning.
>
I think it should mean all devices.  I changed resource allocation to go
through the tree because children of disabled devices shouldn't have
allocated resources.

- Maybe none of the above applies, then we need to do nasty stack corruption
> debugging. In this case it would be fundamentally wrong to touch either the
> device tree code or the mp table creation code until we fix the corruption
> in order to make sure we don't create funny special cases.
>
This is my vote, but I'm happy to be proven wrong.  I don't see anywhere in
the code where devices get added to the tree but not to the list of
devices.  I also don't remember a place where devices get removed from the
all_devices list.


> So, if you have hints enlightening any of the maybes, please share!
>
My suggestion would be to traverse the tree and the list and compare them.
The problem is that printk was causing a hang for him.

Thanks,
Myles
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.coreboot.org/pipermail/coreboot/attachments/20100216/4b2af2d6/attachment.html>


More information about the coreboot mailing list