Coreboot v3

Revision as of 08:29, 3 July 2007 by Jbe

--Jbe 16:47, 30 June 2007 (CEST)

This file is licensed under Creative Commons Attribution 2.5 License.
In short: you are free to distribute and modify the file as long as you attribute its author(s) or licensor(s).

How to add a new board support

TBD

How to add a new SuperIO device

TBD

How to add a new northbridge

TBD

How to add a new southbridge

TBD

How to add a new architecture support (CPU)

Implementation specification for the CAR code

  • What's to be provided to the rest of LinuxBIOSv3

TBD

  • Exported labels to the rest of LinuxBIOSv3

TBD

  • Example implementations
    • via cache test registers TBD
    • via MTRR registers TBD
    • what else is possible?

How LinuxBIOS starts after Reset

Whenever an x86 CPU comes out of reset, it starts in Real Mode. This mode is limited to a 1MiB address space and 64KiB offsets, and the reset vector of the original 8086/88 was located at 0xFFFF0.

Nothing about this changed with current processors such as the P3: they still behave as if they started at 0xF000:0xFFF0 after a reset. But they do not. The base of the code segment register is 0xFFFF0000 after reset, so the CPU sends the physical address 0xFFFFFFF0 to the chipset, and the chipset is responsible for forwarding this area to the boot ROM. It's confusing: the CPU "thinks" it runs code at 0xF000:0xFFF0, but it actually fetches code from 0xFFFFFFF0. The developers must have been tanked up when they put this design into silicon.
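The difference between the two views can be checked with a little arithmetic. The following is only a sketch of the address calculation described above; the function names are made up for this illustration.

```python
def real_mode_base(selector: int) -> int:
    """Base that a Real Mode segment load produces: selector * 16."""
    return (selector & 0xFFFF) << 4


def physical_address(cs_base: int, ip: int) -> int:
    """Physical address = hidden base of CS + instruction pointer."""
    return (cs_base + ip) & 0xFFFFFFFF


# The classic 8086 view of the reset vector: 0xF000:0xFFF0
legacy = physical_address(real_mode_base(0xF000), 0xFFF0)

# What a newer CPU does after reset: the hidden CS base is 0xFFFF0000
actual = physical_address(0xFFFF0000, 0xFFF0)

print(hex(legacy))  # 0xffff0
print(hex(actual))  # 0xfffffff0
```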

On some chipsets there is an additional pitfall: the so-called A20 gate. It was introduced to keep the 80286 fully compatible with its predecessor, the 8086. When the old CPU accesses space "behind" the 0xFFFFF (=1MiB) limit, the address wraps around to 0x00000. On the 80286 and newer processors, accesses above 0xFFFFF natively do not wrap around. The external A20 gate forces this wrap on newer CPUs as well. On older systems it is open after reset, so the CPU cannot generate addresses with A20 set.

Gatea20.png
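The wrap-around can be modelled as masking address bit 20. This is a rough sketch, not chipset documentation: the function name and the `a20_masked` flag are invented for the illustration, and "masked" stands for the state in which the gate forces the wrap.

```python
A20_BIT = 1 << 20  # address line A20 carries the value 0x100000


def bus_address(address: int, a20_masked: bool) -> int:
    """Address seen by the chipset, with or without the A20 gate forcing A20 low."""
    if a20_masked:
        address &= ~A20_BIT
    return address


# 0xFFFF:0x0010 is the first byte "behind" the 1MiB limit
linear = (0xFFFF << 4) + 0x0010

print(hex(bus_address(linear, a20_masked=True)))   # 0x0      (wraps like an 8086)
print(hex(bus_address(linear, a20_masked=False)))  # 0x100000 (no wrap)

# Under the same model, a flat mode address near the top of 4GiB loses bit 20 too
print(hex(bus_address(0xFFFFFC00, a20_masked=True)))  # 0xffeffc00
```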

The next pitfall on some chipsets is whether they are able to forward two address spaces to the boot ROM: 0xFFFF0000 and 0xF0000. If they cannot, and the opcode at the reset vector does something like "jmp 0xF000:xxxx", the machine crashes immediately, because the far jump reloads the base address of the code segment register with 0xF0000 and so clears its upper bits. After this, the CPU really outputs addresses of the form 0xFxxxx to the chipset. If the chipset cannot forward both address spaces, the boot ROM can no longer be accessed. You are lost.

Mirror.png
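A tiny model shows why the kind of jump matters. It is a simplified sketch under the assumption that only the hidden CS base and the 16 bit instruction pointer are relevant; the class and method names are hypothetical.

```python
class RealModeCS:
    """Tracks the hidden CS base and IP of a CPU running Real Mode code."""

    def __init__(self) -> None:
        self.base = 0xFFFF0000  # hidden CS base right after reset
        self.ip = 0xFFF0        # reset vector offset

    def physical(self) -> int:
        return (self.base + self.ip) & 0xFFFFFFFF

    def near_jump(self, target_ip: int) -> None:
        # A near jump changes only IP; the hidden base is untouched.
        self.ip = target_ip & 0xFFFF

    def far_jump(self, selector: int, target_ip: int) -> None:
        # A far jump reloads CS, so the base becomes selector * 16.
        self.base = (selector & 0xFFFF) << 4
        self.ip = target_ip & 0xFFFF


near = RealModeCS()
near.near_jump(0xFFA0)
print(hex(near.physical()))  # 0xffffffa0, still in the upper ROM window

far = RealModeCS()
far.far_jump(0xF000, 0xFFA0)
print(hex(far.physical()))   # 0xfffa0, only works if the chipset mirrors the ROM here
```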

How to escape from these restrictions?

First of all, we must make sure not to touch the base address of the code segment register. This keeps us in the 0xFFFF0000 address space. We can ensure this by using only near branches (relative to the current code segment) instead of far jumps. So the opcode at the reset vector must be nothing other than a near branch! The next step depends on the chipset in use. Does it open the A20 gate after reset? If yes, we must close it prior to switching to Linear Flat Mode.

Mandatory steps

  • loading a Global Descriptor Table
  • setting the PE (protection enable) bit in the CR0 register
  • reloading of the code segment register (with a far jump)

This requires only a small amount of code, so we can confine the pain of Real Mode to a very small part of the whole program.

Everything becomes easy once we reach the Linear Flat Mode. No more hardware pain, no more toolchain pain. Why Linear Flat Mode and not Protected Mode? Most people call this operating mode Protected Mode when they switch the CPU to its native 32 bit operating mode in this way. But it's only a Linear Flat Mode, as there is no protection at all. It's more like a 32 bit real mode. There is no address translation, no paging and no protection.

The Linear Flat Mode

When all segment registers use the same base address and limit, it is called the Linear Flat Mode. Advantages:

  • no limits in the 4GiB address space
  • everything is allowed
  • no access restrictions to RAM and I/O
  • clear and easy to understand source code, no "tricks" required to access space above 1MiB

Disadvantages:

  • no protection if someone is working with a NULL or invalid pointer
  • no protection if any accessed address is invalid
  • as everything is allowed, everything could work against you
  • no protection against stack overflow
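In GDT terms, "same base address and limit" usually means descriptors with base 0 and a 4GiB limit. The sketch below packs such descriptors in the standard x86 8 byte layout. The access bytes 0x9A and 0x92 are common textbook choices and an assumption here; the entries actually used by the stage0 GDT are not shown on this page.

```python
def encode_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack a segment descriptor into its 64 bit GDT representation."""
    desc = limit & 0xFFFF                   # limit bits 0..15
    desc |= (base & 0xFFFFFF) << 16         # base bits 0..23
    desc |= (access & 0xFF) << 40           # present, DPL, type
    desc |= ((limit >> 16) & 0xF) << 48     # limit bits 16..19
    desc |= (flags & 0xF) << 52             # granularity, default size
    desc |= ((base >> 24) & 0xFF) << 56     # base bits 24..31
    return desc


FLAGS_4K_32BIT = 0xC  # G=1 (limit counted in 4KiB pages), D=1 (32 bit segment)

flat_code = encode_descriptor(0, 0xFFFFF, 0x9A, FLAGS_4K_32BIT)
flat_data = encode_descriptor(0, 0xFFFFF, 0x92, FLAGS_4K_32BIT)

print(f"{flat_code:#018x}")  # 0x00cf9a000000ffff
print(f"{flat_data:#018x}")  # 0x00cf92000000ffff

# With G=1 the 20 bit limit covers (0xFFFFF + 1) pages of 4KiB = 4GiB
print((0xFFFFF + 1) * 4096 == 1 << 32)  # True
```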

There is still one pain after entering the Linear Flat Mode: The lack of system RAM. This issue will be addressed by the Cache As RAM solution. See below.

The Toolchain Pain and how to solve it

It seems modern toolchains generate Real Mode opcodes correctly. But they do not really support Real Mode sections across all parts of the toolchain. The flaw is the linker: you can't link Real Mode (that is, 16 bit) sections if they contain unresolved symbols (the direction doesn't matter).

But you can link every Real Mode section if it does not contain any unresolved symbols! So to solve the toolchain pain we only need to avoid unresolved symbols in our Real Mode sections! As we control their content, this is a workable approach.

To avoid unresolved symbols, every instruction that would normally refer to an external symbol must use a fixed address instead. For these fixed addresses to be correct at runtime, the ROM image needs a special layout.

All we need are four fixed addresses, and the correct code at these points:

  • reset vector
  • the program code to load the GDT and switch to Linear Flat Mode
  • the Global Descriptor Table for Linear Flat Mode
  • the program code entered in Linear Flat Mode

The first two in this list are Real Mode sections that need special handling; the last two are already 32 bit sections that we can link and use as expected.

Blocklayout.png

These four fixed addresses are defined in the LinuxBIOSv3 menu when the expert mode is enabled. This menu is only important for developers, as they must define the addresses and sizes of these four areas to fit the platform's requirements.

The linker script to achieve this layout looks like this:

OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
OUTPUT_ARCH(i386)

SECTIONS
{
        .stage0_1 CONFIG_STAGE_0_PA_BASE : AT ( 0 ) {
                _stage0_1 = .;
                *(.text);
                *(.data);
                *(.bss);
                *(.rodata.*)
                *(.rodata)
                _estage0_1 = .;
        }

/* ############## Create the delicate workflow for reset ################### */

        .stage0_flat_mode CONFIG_STAGE_0_PA_FLAT_SIDE : \
                AT ( CONFIG_STAGE_0_PA_FLAT_SIDE - CONFIG_STAGE_0_PA_BASE ) {
                FILL(0xFF);
                *(.flat_mode_first);
                *(.flat_mode);
        }

        .stage0_gdt CONFIG_STAGE_0_PA_GDT : \
                AT ( CONFIG_STAGE_0_PA_GDT - CONFIG_STAGE_0_PA_BASE ) {
                FILL(0xFF);
                *(.global_gdt);
        }

        .stage0_real_mode CONFIG_STAGE_0_PA_REAL_SIDE - 0xFFF00000 : \
                AT ( CONFIG_STAGE_0_PA_REAL_SIDE - CONFIG_STAGE_0_PA_BASE ) {
                FILL(0x90);     /* fill with NOP opcodes */
                *(.real_mode_first);
                *(.real_mode);
        }

        .stage0_reset CONFIG_STAGE_0_PA_RESET_VECTOR - 0xFFF00000 : \
                AT ( CONFIG_STAGE_0_PA_RESET_VECTOR - CONFIG_STAGE_0_PA_BASE ) {
                FILL(0xFF);
                *(.reset_first);
                *(.reset);
        }

        /* fill the image up to the end */
        .stage0_filler 0x0000FFFE : AT ( 0xFFFFFFFE - CONFIG_STAGE_0_PA_BASE ) {
                FILL(0xFF);
                BYTE(0xFE);     /* Computer type (XT) */
                BYTE(0xFF);     /* Checksum byte */
        }

        /DISCARD/ : {
                *(.comment)
                *(.note)
                *(.note.GNU-stack)
        }
}

With these labels our layout now looks like this:

Llayout.png

After linking the whole code we can check the result with:

[jbe@jupiter]~/LinuxBIOSv3 > objdump -h build/stage0.o

build/stage0.o:     file format elf32-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .stage0_1     0000199d  ffffc100  00000000  00000100  2**5
                  CONTENTS, ALLOC, LOAD, CODE
  1 .stage0_flat_mode 00000178  fffffc00  00003b00  00001c00  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .stage0_gdt   00000028  ffffff00  00003e00  00001f00  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  3 .stage0_real_mode 0000002e  000fffa0  00003ea0  00001fa0  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  4 .stage0_reset 0000000e  000ffff0  00003ef0  00001ff0  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  5 .stage0_filler 00000002  0000fffe  00003efe  00001ffe  2**0
                  CONTENTS, ALLOC, LOAD, DATA

This is the layout.
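The VMA and LMA columns can be cross-checked against the linker script by hand. The sketch below redoes the arithmetic; the CONFIG_* values are not quoted from a real configuration but inferred from the objdump output above, so treat them as assumptions.

```python
# Values inferred from the objdump listing above (assumptions)
PA_BASE = 0xFFFFC100          # CONFIG_STAGE_0_PA_BASE
PHYSICAL = {
    ".stage0_flat_mode": 0xFFFFFC00,  # CONFIG_STAGE_0_PA_FLAT_SIDE
    ".stage0_gdt":       0xFFFFFF00,  # CONFIG_STAGE_0_PA_GDT
    ".stage0_real_mode": 0xFFFFFFA0,  # CONFIG_STAGE_0_PA_REAL_SIDE
    ".stage0_reset":     0xFFFFFFF0,  # CONFIG_STAGE_0_PA_RESET_VECTOR
}
REAL_MODE_SHIFT = 0xFFF00000  # subtracted from the VMA of the 16 bit sections


def lma(physical: int) -> int:
    """Offset inside the image, as given by the AT(...) expressions."""
    return physical - PA_BASE


for name, pa in PHYSICAL.items():
    print(f"{name:18s} LMA {lma(pa):#07x}")
# .stage0_flat_mode  LMA 0x03b00
# .stage0_gdt        LMA 0x03e00
# .stage0_real_mode  LMA 0x03ea0
# .stage0_reset      LMA 0x03ef0

# The 16 bit sections get a VMA that fits below 1MiB
real_mode_vma = PHYSICAL[".stage0_real_mode"] - REAL_MODE_SHIFT
reset_vma = PHYSICAL[".stage0_reset"] - REAL_MODE_SHIFT
print(hex(real_mode_vma), hex(reset_vma))  # 0xfffa0 0xffff0

# The image ends at the top of the 4GiB address space
image_size = (1 << 32) - PA_BASE
print(hex(image_size))  # 0x3f00
```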

Now we will fill the sections with code. The first one is the last one in the ROM: the reset vector. To avoid any unresolved symbols in this section we must create the branch opcode manually.

       .code16
       .section ".reset_first", "ax"
       .globl reset_entry

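reset_entry:
       .byte 0xe9      /* opcode of a near jump with a 16 bit displacement */
       /* displacement is relative to the end of this 3 byte instruction */
       .word CONFIG_STAGE_0_PA_REAL_SIDE-(CONFIG_STAGE_0_PA_RESET_VECTOR+0x3)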

We can check the result with objdump:

[jbe@jupiter]~/LinuxBIOSv3 > objdump -mi8086 -d -j .stage0_reset build/stage0.o

build/stage0.o:     file format elf32-i386

Disassembly of section .stage0_reset:

000ffff0 <reset_entry>:
   ffff0:       e9 ad ff                jmp    ffa0 <gdt_limit+0xff79>
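The three bytes in the disassembly can be reproduced by hand. The following sketch computes the displacement the same way the .word expression does; the helper name is hypothetical and the two addresses are the ones visible in the listings above.

```python
import struct


def near_jmp_rel16(source: int, target: int) -> bytes:
    """Encode 'jmp rel16' (opcode 0xE9) located at source, jumping to target."""
    rel = (target - (source + 3)) & 0xFFFF  # relative to the next instruction
    return struct.pack("<BH", 0xE9, rel)    # opcode, little-endian displacement


RESET_VECTOR = 0xFFFFFFF0  # CONFIG_STAGE_0_PA_RESET_VECTOR in this build
REAL_SIDE = 0xFFFFFFA0     # CONFIG_STAGE_0_PA_REAL_SIDE in this build

opcode = near_jmp_rel16(RESET_VECTOR, REAL_SIDE)
print(opcode.hex())  # e9adff

# What the CPU does with it: IP after the fetch plus displacement, truncated to 16 bit
rel = struct.unpack("<H", opcode[1:])[0]
next_ip = (0xFFF0 + 3 + rel) & 0xFFFF
print(hex(next_ip))  # 0xffa0
```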

The next step is to load a GDT, enable protected mode and jump to the flat mode code. This is also 16 bit code, so we cannot use any labels to load the GDT or to jump to the correct flat mode code address. But we aligned both to fixed physical addresses, so we can use these addresses now:

       .code16
       .section ".real_mode", "ax"
       .globl real_mode_fallback_entry

real_mode_fallback_entry:
       cli
       movl    %eax, %ebp;     /* save the BIST result */
       xorl    %eax, %eax
       movl    %eax, %cr3      /* Invalidate TLB */

       data32  lgdt %cs:(CONFIG_STAGE_0_PA_GDT-0xFFFF0000)

       movl    %cr0, %eax
       andl    $0x7FFAFFD1, %eax /* PG,AM,WP,NE,TS,EM,MP = 0 */
       orl     $0x60000001, %eax /* CD, NW, PE = 1 */
       movl    %eax, %cr0

       movl    %ebp, %eax      /* Restore BIST result */
       data32  ljmp $ROM_CODE_SEG, $CONFIG_STAGE_0_PA_FLAT_SIDE

We can also check the result with objdump:

[jbe@jupiter]~/LinuxBIOSv3 > objdump -mi8086 -d -j .stage0_real_mode build/stage0.o

build/stage0.o:     file format elf32-i386

000fffa0 <real_mode_fallback_entry>:
  fffa0:       fa                      cli
  fffa1:       66 89 c5                mov    %eax,%ebp
  fffa4:       66 31 c0                xor    %eax,%eax
  fffa7:       0f 22 d8                mov    %eax,%cr3
  fffaa:       66 2e 0f 01 16 00 ff    lgdtl  %cs:-256
  fffb1:       0f 20 c0                mov    %cr0,%eax
  fffb4:       66 25 d1 ff fa 7f       and    $0x7ffaffd1,%eax
  fffba:       66 0d 01 00 00 60       or     $0x60000001,%eax
  fffc0:       0f 22 c0                mov    %eax,%cr0
  fffc3:       66 89 e8                mov    %ebp,%eax
  fffc6:       66 ea 00 fc ff ff 08    ljmpl  $0x8,$0xfffffc00
  fffcd:       00
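The two CR0 masks in the listing can be decoded bit by bit to confirm the comments next to them. This is a small sketch using the architectural CR0 bit positions; the dictionary and variable names are only for illustration.

```python
CR0_BITS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}

AND_MASK = 0x7FFAFFD1  # from 'andl $0x7FFAFFD1, %eax'
OR_MASK = 0x60000001   # from 'orl  $0x60000001, %eax'

# Flags whose bit is 0 in the AND mask get cleared
cleared = [name for name, bit in CR0_BITS.items() if not (AND_MASK >> bit) & 1]
# Flags whose bit is 1 in the OR mask get set
forced = [name for name, bit in CR0_BITS.items() if (OR_MASK >> bit) & 1]

print(cleared)  # ['MP', 'EM', 'TS', 'NE', 'WP', 'AM', 'PG']
print(forced)   # ['PE', 'NW', 'CD']

# Applied to the architectural reset value of CR0 (0x60000010)
new_cr0 = (0x60000010 & AND_MASK) | OR_MASK
print(hex(new_cr0))  # 0x60000011
```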

How it works at runtime

Runtime.png

  1. After reset the CPU starts to fetch opcodes virtually from address 0xF000:0xFFF0 (=0xFFFFFFF0 = CONFIG_STAGE_0_PA_RESET_VECTOR). It fetches our branch command and continues at 0xF000:0xFFA0 (=0xFFFFFFA0 = CONFIG_STAGE_0_PA_REAL_SIDE).
  2. The real mode code loads the GDT at the real mode offset 0xFF00 relative to the CS register. This results in the address 0xF000:0xFF00 (=0xFFFFFF00 = CONFIG_STAGE_0_PA_GDT).
  3. The switch to protected mode jumps to the 32 bit offset 0xFFFFFC00, but now the base address of the CS register is 0 (from the GDT entry at offset 8), so we land at CONFIG_STAGE_0_PA_FLAT_SIDE.
  4. The flat mode code then
    1. activates the full Linear Flat Mode by loading the remaining segment registers
    2. activates CAR (CPU dependent)
  5. At the end, it's time to jump into the stage0_1 code at label stage1_main(). From now on, there is no more need for any special physical layout.
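Steps 2 and 3 can be traced numerically, which also explains why objdump prints the lgdt operand as -256. This is a sketch under the assumption that the CONFIG_* values are the ones visible in the listings above; the helper name is made up.

```python
MASK32 = 0xFFFFFFFF

RESET_CS_BASE = 0xFFFF0000  # hidden CS base while still in Real Mode
PA_GDT = 0xFFFFFF00         # CONFIG_STAGE_0_PA_GDT in this build
PA_FLAT_SIDE = 0xFFFFFC00   # CONFIG_STAGE_0_PA_FLAT_SIDE in this build


def as_signed16(value: int) -> int:
    """Interpret a 16 bit value as a signed number, the way objdump prints it."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


# Step 2: the 16 bit operand of 'lgdt %cs:(CONFIG_STAGE_0_PA_GDT-0xFFFF0000)'
lgdt_disp = (PA_GDT - RESET_CS_BASE) & 0xFFFF
gdt_fetch = (RESET_CS_BASE + lgdt_disp) & MASK32
print(hex(lgdt_disp), as_signed16(lgdt_disp))  # 0xff00 -256
print(hex(gdt_fetch))                          # 0xffffff00

# Step 3: after the far jump CS has base 0, so the offset is the physical address
flat_entry = (0 + PA_FLAT_SIDE) & MASK32
print(hex(flat_entry))                         # 0xfffffc00
```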