[coreboot] Fwd: s2892 + CBFS strange failure

ron minnich rminnich at gmail.com
Fri Apr 24 20:48:47 CEST 2009


Please excuse this blast. Here's the problem: CBFS is breaking something it
can't break. If you turn on CBFS, then very early startup in the Opteron
code fails. This has been verified across several mainboards. Any wild ideas
welcome. I can't even figure out where to start ...

ron


Forwarded conversation
Subject: s2892 + CBFS strange failure
------------------------

From: *Myles Watson* <mylesgw at gmail.com>
Date: Wed, Apr 22, 2009 at 9:05 AM
To: ron minnich <rminnich at gmail.com>, Stefan Reinauer <stepan at coresystems.de>,
Patrick Georgi <patrick.georgi at coresystems.de>, Marc Jones <
marcj303 at gmail.com>


On Wed, Apr 22, 2009 at 9:56 AM, ron minnich <rminnich at gmail.com> wrote:
> can I bring in patrick and stephan and marcj? This is getting too weird.

:)

Probably part of it is miscommunication on my part, but I'd be glad
for any help.

Here's the summary:

With CONFIG_CBFS = 0 it works fine
With CONFIG_CBFS = 1 I get (warm reset):

   INIT detected from  --- {  APICID = 00 NODEID = 00 COREID = 00} ---

   Issuing SOFT_RESET...

Then nothing else.  Post code 0xf0

With CONFIG_CBFS = 1 I get (cold reset):

Nothing.  Post code 0xf0

I've been inserting post codes, and it always makes it to real_main.
It just doesn't make it out of init_cpus.  On a warm reset I get the
serial output.  Otherwise there is none.

We've tried using a different compiler.  Same results.
We've tried no payload and no VGA ROM.

Thanks,
Myles

----------
From: *ron minnich* <rminnich at gmail.com>
Date: Wed, Apr 22, 2009 at 9:12 AM
To: Myles Watson <mylesgw at gmail.com>
Cc: Stefan Reinauer <stepan at coresystems.de>, Patrick Georgi <
patrick.georgi at coresystems.de>, Marc Jones <marcj303 at gmail.com>


Also, myles, this all works on serengeti, right?

ron

----------
From: *Myles Watson* <mylesgw at gmail.com>
Date: Wed, Apr 22, 2009 at 9:13 AM
To: ron minnich <rminnich at gmail.com>
Cc: Stefan Reinauer <stepan at coresystems.de>, Patrick Georgi <
patrick.georgi at coresystems.de>, Marc Jones <marcj303 at gmail.com>


I was in the middle of writing that :)

I forgot an interesting point:

The broken image works on SimNOW until it can't find the SMBUS.  But
it always gets far enough that there is some serial output.

Thanks,
Myles

----------
From: *Marc Jones* <marcj303 at gmail.com>
Date: Wed, Apr 22, 2009 at 10:28 AM
To: Myles Watson <mylesgw at gmail.com>
Cc: ron minnich <rminnich at gmail.com>, Stefan Reinauer <stepan at coresystems.de>,
Patrick Georgi <patrick.georgi at coresystems.de>


That is very strange. Can you attempt to track when it starts to fail?
Does it have to boot all the way into Linux before the reset stops
working, or does it happen before it loads any payloads?

I can't think of anything that would cause that kind of problem. Can
you narrow it down in cpu_init?

Marc

--
http://marcjonesconsulting.com

----------
From: *Myles Watson* <mylesgw at gmail.com>
Date: Wed, Apr 22, 2009 at 10:48 AM
To: Marc Jones <marcj303 at gmail.com>
Cc: ron minnich <rminnich at gmail.com>, Stefan Reinauer <stepan at coresystems.de>,
Patrick Georgi <patrick.georgi at coresystems.de>


Sorry I was unclear again.  I'll try to explain better.

When I said it happens on warm reset, I meant from a working image.

1. boot a working image
2. switch to cbfs image
3. warm reset gives some output

It seems like it hangs on the first call to printk that it reaches.  I
tried moving console_init ahead of init_cpus in real_main, but it
didn't change the behavior.

Thanks,
Myles

----------
From: *ron minnich* <rminnich at gmail.com>
Date: Wed, Apr 22, 2009 at 1:44 PM
To: Myles Watson <mylesgw at gmail.com>
Cc: Marc Jones <marcj303 at gmail.com>, Stefan Reinauer <stepan at coresystems.de>,
Patrick Georgi <patrick.georgi at coresystems.de>


so cbfs works with
qemu
kontron (yes or no? I think yes)
serengeti

and it fails with this board.

are these older CPUs? What stepping?

I have to admit I'm stumped.

ron

----------
From: *Myles Watson* <mylesgw at gmail.com>
Date: Wed, Apr 22, 2009 at 1:47 PM
To: ron minnich <rminnich at gmail.com>
Cc: Marc Jones <marcj303 at gmail.com>, Stefan Reinauer <stepan at coresystems.de>,
Patrick Georgi <patrick.georgi at coresystems.de>


I may have been chasing the wrong thing here.  When I was helping
Samuel with the dl145 he said that somewhere after 4030 cold boot
broke for him.  He's bisecting now.

Thanks,
Myles
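
For reference, bisecting between a known-good and a known-bad revision is a binary search over revision numbers. The following is a minimal sketch in Python, not anyone's actual tooling. The helper names first_bad_revision and cold_boot_fails are hypothetical, the range 4030 to 4193 is borrowed from revision numbers mentioned in this thread purely for illustration, and the breaking revision 4050 is invented. It also assumes that once the breakage appears it stays in every later revision.

```python
def first_bad_revision(good, bad, is_bad):
    """Return the lowest revision in (good, bad] for which is_bad() is true.

    Assumes revision `good` passes, revision `bad` fails, and the breakage
    persists in every revision after it first appears.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # breakage was introduced at mid or earlier
        else:
            good = mid   # breakage was introduced after mid
    return bad


# Hypothetical stand-in for "build this revision, flash it, try a cold boot".
# The threshold 4050 is invented for illustration only.
def cold_boot_fails(revision):
    return revision >= 4050


print(first_bad_revision(4030, 4193, cold_boot_fails))  # prints 4050
```

With a range of about 160 revisions this needs roughly eight build-and-flash cycles, which is why it is worth doing even when each test is slow.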

----------
From: *Myles Watson* <mylesgw at gmail.com>
Date: Thu, Apr 23, 2009 at 6:15 AM
To: ron minnich <rminnich at gmail.com>
Cc: Marc Jones <marcj303 at gmail.com>, Stefan Reinauer <stepan at coresystems.de>,
Patrick Georgi <patrick.georgi at coresystems.de>


Just to add a wrinkle: my onboard graphics died.  That's why things
were flaky yesterday.  It just stopped responding to config reads and
gets disabled by coreboot.

I added a video card and I'm back up.  Cold boot works for me with
4193 (No CBFS), so the Config changes were fine.  It's still broken
for CBFS for me.  Unless someone has an idea of how to track it down
I'm just going to not use CBFS for now, even though I like the CBFS
option much better.

Thanks,
Myles

----------
From: *ron minnich* <rminnich at gmail.com>
Date: Thu, Apr 23, 2009 at 7:57 AM
To: Myles Watson <mylesgw at gmail.com>
Cc: Marc Jones <marcj303 at gmail.com>, Stefan Reinauer <stepan at coresystems.de>,
Patrick Georgi <patrick.georgi at coresystems.de>


we really need to track this down because whatever this may be, it's
unlikely to be cbfs. Not if you're not getting any prints at all.

It would still be interesting if you could try the very first version
where cbfs was introduced.

ron

----------
From: *Myles Watson* <mylesgw at gmail.com>
Date: Thu, Apr 23, 2009 at 12:04 PM
To: ron minnich <rminnich at gmail.com>
Cc: Marc Jones <marcj303 at gmail.com>, Stefan Reinauer <stepan at coresystems.de>,
Patrick Georgi <patrick.georgi at coresystems.de>


There were some fixes put in pretty quickly.  I just tried 4113 (the
rename.)  Which one would you suggest next?

Thanks,
Myles

----------
From: *Myles Watson* <mylesgw at gmail.com>
Date: Thu, Apr 23, 2009 at 3:09 PM
To: ron minnich <rminnich at gmail.com>


4061 fails with CBFS but not without.

Thanks,
Myles

----------
From: *ron minnich* <rminnich at gmail.com>
Date: Fri, Apr 24, 2009 at 7:23 AM
To: Myles Watson <mylesgw at gmail.com>


no serial output and SPEW?

ron

----------
From: *Myles Watson* <mylesgw at gmail.com>
Date: Fri, Apr 24, 2009 at 7:38 AM
To: ron minnich <rminnich at gmail.com>


For a warm boot.  Nothing from a cold boot.  4061 no CBFS works fine.

Thanks,
Myles

----------
From: *Myles Watson* <mylesgw at gmail.com>
Date: Fri, Apr 24, 2009 at 7:40 AM
To: ron minnich <rminnich at gmail.com>


SPEW is definitely enabled for the working one.

Thanks,
Myles

----------
From: *Myles Watson* <mylesgw at gmail.com>
Date: Fri, Apr 24, 2009 at 8:18 AM
To: ron minnich <rminnich at gmail.com>
Cc: Marc Jones <marcj303 at gmail.com>, Stefan Reinauer <stepan at coresystems.de>,
Patrick Georgi <patrick.georgi at coresystems.de>


4061 on my s2892 with SPEW:
On my s2895 I am having problems with warm reset, and a cold boot
powers itself off quickly with post code 0xf0.

Thanks,
Myles

----------
From: *ron minnich* <rminnich at gmail.com>
Date: Fri, Apr 24, 2009 at 9:47 AM
To: Myles Watson <mylesgw at gmail.com>
Cc: Marc Jones <marcj303 at gmail.com>, Stefan Reinauer <stepan at coresystems.de>,
Patrick Georgi <patrick.georgi at coresystems.de>,
Ward Vandewege <ward at gnu.org>


OK, this is nuts. CBFS is in the ram code. It can't affect the ROM
code, can it? And this is really early!
Here are the only things I can think of:
1. CBFS changes layout somehow
2. Turning off ELFBOOT turned off something hidden
3. it's changing the way gcc works

I just don't know. Somehow we've got to find this. I will set up my
dbm board tonight.

Patrick, Stefan, have you tested CBFS with the kontron?

ron

----------
From: *Patrick Georgi* <patrick.georgi at coresystems.de>
Date: Fri, Apr 24, 2009 at 9:49 AM
To: ron minnich <rminnich at gmail.com>
Cc: Myles Watson <mylesgw at gmail.com>, Marc Jones <marcj303 at gmail.com>,
Stefan Reinauer <stepan at coresystems.de>, Ward Vandewege <ward at gnu.org>


On 24.04.2009 18:47, ron minnich wrote:
That's where my lzma.c patch came from. I'm debugging the bounce buffer code
right now. It seems to copy correctly into the buffer, but I'm not convinced
yet that it correctly copies back.

Patrick

----------
From: *ron minnich* <rminnich at gmail.com>
Date: Fri, Apr 24, 2009 at 9:55 AM
To: Patrick Georgi <patrick.georgi at coresystems.de>
Cc: Myles Watson <mylesgw at gmail.com>, Marc Jones <marcj303 at gmail.com>,
Stefan Reinauer <stepan at coresystems.de>, Ward Vandewege <ward at gnu.org>


but Myles's failure is WAY before any of that. His machine dies in the
very earliest C code.

I do not really like the bounce buffer ... it's just too fragile for my
taste. If anything goes wrong, well, you're in assembly code with no
way out.

ron

----------
From: *Patrick Georgi* <patrick.georgi at coresystems.de>
Date: Fri, Apr 24, 2009 at 10:35 AM
To: ron minnich <rminnich at gmail.com>
Cc: Myles Watson <mylesgw at gmail.com>, Marc Jones <marcj303 at gmail.com>,
Stefan Reinauer <stepan at coresystems.de>, Ward Vandewege <ward at gnu.org>


> OK, this is nuts. CBFS is in the ram code. It can't affect the ROM

I'm not quite sure at which point in the boot process the last message
before the reboot comes up, so this is just a guess.

Could it be that it tries to jump into the normal image? I'm not quite
certain that we get that entirely correct (and the layout might change in
that dark corner of the build system).
Replacing that "jmp __normal_image" with "jmp __fallback_image" might help
then (for testing).


Patrick

----------
From: *ron minnich* <rminnich at gmail.com>
Date: Fri, Apr 24, 2009 at 10:42 AM
To: Patrick Georgi <patrick.georgi at coresystems.de>
Cc: Myles Watson <mylesgw at gmail.com>, Marc Jones <marcj303 at gmail.com>,
Stefan Reinauer <stepan at coresystems.de>, Ward Vandewege <ward at gnu.org>


On Fri, Apr 24, 2009 at 10:35 AM, Patrick Georgi
now *that* is a pretty smart guess. Myles, were you running fallback/normal?

ron

----------
From: *Myles Watson* <mylesgw at gmail.com>
Date: Fri, Apr 24, 2009 at 10:42 AM
To: ron minnich <rminnich at gmail.com>
Cc: Patrick Georgi <patrick.georgi at coresystems.de>, Marc Jones <
marcj303 at gmail.com>, Stefan Reinauer <stepan at coresystems.de>, Ward Vandewege
<ward at gnu.org>


fallback only.  I'll try it.

Thanks,
Myles

----------
From: *ron minnich* <rminnich at gmail.com>
Date: Fri, Apr 24, 2009 at 10:57 AM
To: Myles Watson <mylesgw at gmail.com>
Cc: Patrick Georgi <patrick.georgi at coresystems.de>, Marc Jones <
marcj303 at gmail.com>, Stefan Reinauer <stepan at coresystems.de>, Ward Vandewege
<ward at gnu.org>


src/lib/cbfs.c
src/include/cbfs.h
src/devices/pci_rom.c
src/boot/selfboot.c
src/boot/hardwaremain.c

But none of these are involved in the early CAR code.

There is another possibility: are we somehow messing up the HT
configuration space? That would explain why you die after init_cpus.

ron

----------
From: *ron minnich* <rminnich at gmail.com>
Date: Fri, Apr 24, 2009 at 11:11 AM
To: Myles Watson <mylesgw at gmail.com>
Cc: Patrick Georgi <patrick.georgi at coresystems.de>, Marc Jones <
marcj303 at gmail.com>, Stefan Reinauer <stepan at coresystems.de>, Ward Vandewege
<ward at gnu.org>


well, this problem just became urgent. I've got no idea where to start
and no time right now to work on it :-(

And it doesn't break on SimNOW, right, Myles? Patrick, any progress on
kontron?

oh, !@#$@!#$@!#$!@$#

ron

----------
From: *Myles Watson* <mylesgw at gmail.com>
Date: Fri, Apr 24, 2009 at 11:12 AM
To: ron minnich <rminnich at gmail.com>
Cc: Patrick Georgi <patrick.georgi at coresystems.de>, Marc Jones <
marcj303 at gmail.com>, Stefan Reinauer <stepan at coresystems.de>, Ward Vandewege
<ward at gnu.org>


And why there's only output on a warm reset.

I don't know.  I tried removing the normal image jump in
cache_as_ram_auto.c.  I guess I should have remembered that it got
past there before init_cpus.

Myles

----------
From: *Myles Watson* <mylesgw at gmail.com>
Date: Fri, Apr 24, 2009 at 11:13 AM
To: ron minnich <rminnich at gmail.com>
Cc: Patrick Georgi <patrick.georgi at coresystems.de>, Marc Jones <
marcj303 at gmail.com>, Stefan Reinauer <stepan at coresystems.de>, Ward Vandewege
<ward at gnu.org>


Right.  The serengeti image works fine, and the s2892 image runs until
it notices it has a different chipset and dies with "SMBUS not found."

Myles

----------
From: *ron minnich* <rminnich at gmail.com>
Date: Fri, Apr 24, 2009 at 11:15 AM
To: Myles Watson <mylesgw at gmail.com>
Cc: Patrick Georgi <patrick.georgi at coresystems.de>, Marc Jones <
marcj303 at gmail.com>, Stefan Reinauer <stepan at coresystems.de>, Ward Vandewege
<ward at gnu.org>


Anyone mind if I just take this to the list?

ron

----------
From: *Ward Vandewege* <ward at gnu.org>
Date: Fri, Apr 24, 2009 at 11:16 AM
To: ron minnich <rminnich at gmail.com>
Cc: Myles Watson <mylesgw at gmail.com>, Patrick Georgi <
patrick.georgi at coresystems.de>, Marc Jones <marcj303 at gmail.com>, Stefan
Reinauer <stepan at coresystems.de>


Please do.

Thanks,
Ward.

--
Ward Vandewege <ward at fsf.org>
Free Software Foundation - Senior Systems Administrator

----------
From: *Myles Watson* <mylesgw at gmail.com>
Date: Fri, Apr 24, 2009 at 11:21 AM
To: ron minnich <rminnich at gmail.com>


No problem here.  We probably should have done it a while ago.  I just
didn't want to make too big of a stink if we could fix it quickly.

Thanks,
Myles