IDEA: Linux kernel and pcbios compatibility...

Eric W. Biederman ebiederman at lnxi.com
Sat Dec 20 03:01:01 CET 2003


Joshua Wise <joshua at joshuawise.com> writes:

> On Friday 19 December 2003 7:15 pm, Eric W. Biederman wrote:
> > Yes we are reaching the point where we can converge on some of these
> > things. LAB might be the right framework.  And if it is something good it
> > will save me the trouble of starting my own project.   But it takes more
> > than a hyperactive 2-year-old to convince me.  It might take a hyperactive
> > 2-year-old to remind me about interesting ideas though.
> Right, well then you should see it in action. If you're in the Boston area 
> sometime soon I can give you a demo on an iPAQ, perhaps.

I'm in Salt Lake City, and I'll be in Illinois with my family for Christmas.
A serial console logfile might be interesting, though.

> > > 512k is with a few ARM-specific drivers, and jffs2. It does not have
> > > networking. This is with kernel 2.6.
> > Hmm.  I am pretty certain I have gotten 2.6 down somewhat smaller.  Our
> > practical limit with LinuxBIOS etc is in the neighborhood of 384KB.
> I've done 2.4 in 256k, but it's rather useless like that. If you do not plan
> to load modules at runtime, you can shave a good bit more off of it. If you
> write bzip2 compression support (or forward-port the stuff from kernel 2.4),
> you can shave even more off. I've shaved off 50k with bzip2 (I haven't
> actually written the code; I just did a bzip2 -9 < piggy > piggy.bz2).

The problem is that the bzip2 decompressor is huge; bzip2 is usually
a net loss because of the decompressor's size.  But it may be possible to
write a tuned version.  The cases I have typically worried about are
much smaller, and I have made huge gains by switching to nrv2b from UPX,
because its decompressor is something like 100 bytes and its
compression is roughly as good as gzip.

> If you don't plan to have a framebuffer, you can shave some off of
> it. If you don't plan to have jffs2 you can shave a lot more
> off. Little tidbits here and there make the world go 'round.

Quite true.

> > Well, I think I have finally been convinced to use the MTD drivers...
> > Mostly I prefer to flash from a production kernel rather than a
> > bootloader; there are more recovery options, but anyway.

> Ah yes, the ancient problem. Instead of read/modify/erase/write, it often 
> turns into read/modify/erase/poweroff. That's Bad.
:)

> > I will see.  Does LAB restrict its kernel to a very small subset of
> > memory?  Or do you use something like kexec?
> To boot a secondary kernel I use some code I wrote called armboot, although
> it's not very ARM-specific. It does something like this:
> 
> 1) Load the new kernel into a contiguous vmalloced block.
> 2) We allocate 64k for a list of things that need to be relocated, which we
> address through a pointer of type "struct physlist *". Each struct physlist
> is 16 bytes: four ints holding the new address, the old address, the block
> size, and whether this is the last block.
> 3) In blocks of the maximum kmalloc size (each block has to be physically
> contiguous), we kmalloc space for the kernel and memcpy the kernel into
> those blocks. For each block we fill in a struct physlist and move on to the
> next one. We can do this because kmalloc memory is always physically
> contiguous, so we can always translate its address with virt_to_phys().
> 4) We set up another kmalloced block for the tagged list of boot parameters
> that you need on ARM.
> 5) We set up one more kmalloced block and copy an assembler function into it,
> to make sure we don't wipe ourselves out while relocating.
> 6) We flush our data caches.
> 7) We call the relocated assembler function, which turns off the MMU, jumps
> to its own physical address, and does the actual relocating. Then we jump
> into our newly moved zImage. Confused yet?

Nope.  Having implemented something similar, it sounds sane.

> 8) If we fail at any point, the system could be in an inconsistent state.
> You will want to panic() if you fail, because you're leaking memory like a
> sieve, and if you failed there's probably something bigger wrong.

Ouch.

> This looks more difficult than it actually is. The C segment is only about 170 
> lines, and the assembler bit is 90 lines. 
> 
> The reason that this works is that kmalloc should allocate from the top of 
> memory down. You need a fair bit of ram - say, 8MB - to prevent the tail from 
> running over other important structures, such as the list of addresses to 
> relocate. But it seems to work well enough, and it looks like it should be 
> fairly portable. The important code is in the handhelds.org CVS, module 
> linux/kernel26, files drivers/bootldr/armboot.c and 
> drivers/bootldr/armboot-asm.S.

Ok.   I need to get into that kernel tree and take a look.  But it sounds similar
to my kexec stuff, which I discuss, at least part of the time, on fastboot at osdl.org.
It sounds compatible enough that we could productively merge implementations.
That, plus the fact that my kexec stuff is still on Andrew Morton's todo list
(ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/must-fix/should-fix-7.txt),
means it has a fair shot of getting into the stock kernel.

On the practical side, I think that after I get back I can boost its
priority high enough to actually do something.

A recent version of kexec patch is at:
http://developer.osdl.org/rddunlap/kexec/

Kexec as it is currently structured is actually two system calls
callable from user space.

sys_kexec_load() loads the kernel into a linked list of pages, making
certain that when those pages are copied to their final destination
nothing will be stomped.  It also allocates a chunk of memory with kmalloc
for the bit of code that copies the kernel to its final resting
place.  This step can fail at any point and the system is left in a consistent state.

sys_reboot(LINUX_REBOOT_CMD_KEXEC) initiates the transfer to the new
kernel.

The new kernel is started in physical mode.  

sys_kexec_load() is passed an entry point to jump to and an array of
regions to load, each described by a physical destination address, a
virtual (process address space) address, and a length.  This allows us
to load arbitrary things.

The only requirement is that you have enough memory for both kernels
simultaneously.  For truly high-end machines there are some other
restrictions, because physical mode does not allow access to all of
their memory, but anyway...



More information about the coreboot mailing list