--Jbe 16:47, 30 June 2007 (CEST)
How LinuxBIOS starts after Reset
Whenever an x86 CPU wakes up after reset, it does so in Real Mode. This mode is limited to a 1 MiB address space and 64 KiB offsets, and the reset vector of the original 8086/88 was located at 0xFFFF0.
Nothing has changed in this respect on current processors such as the P3: these newer CPUs also behave as if they started at 0xF000:0xFFF0 after a reset. But they do not. The base of the code segment register is 0xFFFF0000 after reset, so the CPU sends the physical address 0xFFFFFFF0 to the chipset, and the chipset is responsible for forwarding this area to the boot ROM. It's confusing: the CPU "thinks" it runs code at 0xF000:0xFFF0, but it actually fetches code from 0xFFFFFFF0. The developers must have been tanked up when they put this design into silicon.
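The difference between the apparent and the actual reset address is plain arithmetic. The following is a minimal sketch in Python (not firmware code) that models it; the function and constant names are made up for this illustration, and the values are the ones given in the text.

```python
# Model of x86 address generation: physical = code segment base + offset.
RESET_CS_SELECTOR = 0xF000    # visible CS value after reset
RESET_CS_BASE = 0xFFFF0000    # hidden CS base after reset (value from the text)
RESET_IP = 0xFFF0             # instruction pointer after reset


def real_mode_address(selector, offset):
    """Classic 8086 rule: the segment base is selector * 16."""
    return (selector << 4) + offset


def physical_address(cs_base, ip):
    """What the CPU actually puts on the bus: hidden base + IP (32 bits)."""
    return (cs_base + ip) & 0xFFFFFFFF


apparent = real_mode_address(RESET_CS_SELECTOR, RESET_IP)
actual = physical_address(RESET_CS_BASE, RESET_IP)

print(hex(apparent))  # 0xffff0     (where the CPU "thinks" it runs)
print(hex(actual))    # 0xfffffff0  (what the chipset really sees)
```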
On some chipsets there is an additional pitfall: the so-called A20 gate. It was introduced to make the 80286 fully compatible with its predecessor, the 8086. When the old CPU accesses memory "behind" the 0xFFFFF (= 1 MiB) limit, the address wraps around to 0x00000. On the 80286 and newer processors, accesses above 0xFFFFF do not wrap around by themselves. The external A20 gate forces this wrap-around on the newer CPUs as well. On older systems the gate is open after reset (A20 is masked), so the CPU cannot generate addresses with A20 set.
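The effect of the gate can be modelled as forcing address bit 20 to zero. This is only a sketch: it assumes that an open gate masks A20 on every address the CPU emits, which is a simplification, and the helper names are invented for the example.

```python
A20_BIT = 1 << 20  # address line A20 carries the 1 MiB bit


def real_mode_address(selector, offset):
    """Real Mode address calculation: selector * 16 + offset."""
    return (selector << 4) + offset


def bus_address(address, a20_masked):
    """Address seen by the chipset; an open gate forces A20 to 0."""
    return address & ~A20_BIT if a20_masked else address


# 0xFFFF:0x0010 is the first byte above 1 MiB.
above_1mib = real_mode_address(0xFFFF, 0x0010)

print(hex(above_1mib))                                 # 0x100000
print(hex(bus_address(above_1mib, a20_masked=False)))  # 0x100000 (no wrap)
print(hex(bus_address(above_1mib, a20_masked=True)))   # 0x0      (8086-style wrap)

# Under the same assumption, every address with A20 set is affected,
# including the area just below 4 GiB.
print(hex(bus_address(0xFFFFFFF0, a20_masked=True)))   # 0xffeffff0
```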
The next pitfall on some chipsets is whether they can forward two address ranges to the boot ROM: 0xFFFF0000 and 0xF0000. If they cannot, and the opcode at the reset vector does something like "jmp 0xF000:xxxx", the machine crashes immediately, because the far jump reloads the base address of the code segment register with 0xF0000 (the upper address bits become 0). After this, the CPU really outputs addresses of the form 0xFxxxx to the chipset. And if the chipset cannot forward both address ranges, the boot ROM can no longer be accessed. You are lost.
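Why a far jump is fatal here while a near jump is harmless can be shown with a tiny model of the code segment base. This is an illustrative sketch only: the class name and the target offset 0xE000 are hypothetical, and only the reset values come from the text.

```python
class RealModeCpu:
    """Tracks only the hidden CS base and IP of a CPU in Real Mode."""

    def __init__(self):
        self.cs_base = 0xFFFF0000  # hidden base after reset
        self.ip = 0xFFF0

    def near_jump(self, target_ip):
        """Near jump: only IP changes, the CS base is untouched."""
        self.ip = target_ip & 0xFFFF

    def far_jump(self, selector, offset):
        """Far jump: CS is reloaded, so the base becomes selector * 16."""
        self.cs_base = (selector & 0xFFFF) << 4
        self.ip = offset & 0xFFFF

    def fetch_address(self):
        """Physical address of the next instruction fetch."""
        return (self.cs_base + self.ip) & 0xFFFFFFFF


cpu = RealModeCpu()
cpu.near_jump(0xE000)                 # hypothetical target offset
print(hex(cpu.fetch_address()))       # 0xffffe000, still below 4 GiB

cpu = RealModeCpu()
cpu.far_jump(0xF000, 0xE000)          # like "jmp 0xF000:xxxx"
print(hex(cpu.fetch_address()))       # 0xfe000, now below 1 MiB
```

After the far jump the fetch lands in the 0xFxxxx range, which only works if the chipset also forwards that range to the boot ROM.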
How to escape from these restrictions?
First of all, we must make sure not to touch the base address of the code segment register. This keeps us in the 0xFFFF0000 address space. We can ensure this by using only near (relative) branches instead of far jumps, because only a far jump reloads the code segment register. So the opcode at the reset vector must be nothing else than a near branch! The next step depends on the chipset in use. Does it leave the A20 gate open (A20 masked) after reset? If so, we must close it before switching to Linear Flat Mode. Switching to Linear Flat Mode then consists of:
- loading a Global Descriptor Table
- activating the PE (Protection Enable) bit in the CR0 register
- reloading of the code segment register (with a far jump)
This only requires a small amount of code. So we can shrink the pain of the real mode to only a very small part of our whole program.
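To make the first two steps more concrete, here is a sketch in Python of the data involved: a three-entry GDT with flat code and data segments, the 6-byte operand for the lgdt instruction, and the CR0 bit that is set. The access and flag values (0x9A, 0x92, 0xC) are the usual choice for flat 32-bit segments but are an assumption here, not taken from the LinuxBIOS sources; the GDT address 0xFFFFFFD0 is purely hypothetical.

```python
import struct

CR0_PE = 1 << 0  # Protection Enable is bit 0 of CR0


def encode_descriptor(base, limit, access, flags):
    """Pack base/limit/access/flags into an 8-byte segment descriptor."""
    desc = limit & 0xFFFF                    # limit bits 0..15
    desc |= (base & 0xFFFFFF) << 16          # base bits 0..23
    desc |= (access & 0xFF) << 40            # access byte
    desc |= ((limit >> 16) & 0xF) << 48      # limit bits 16..19
    desc |= (flags & 0xF) << 52              # flags: G, D/B, L, AVL
    desc |= ((base >> 24) & 0xFF) << 56      # base bits 24..31
    return desc


NULL_DESC = 0
# Assumed flat segments: base 0, limit 0xFFFFF in 4 KiB units, 32-bit.
FLAT_CODE = encode_descriptor(0, 0xFFFFF, 0x9A, 0xC)
FLAT_DATA = encode_descriptor(0, 0xFFFFF, 0x92, 0xC)

gdt = struct.pack("<3Q", NULL_DESC, FLAT_CODE, FLAT_DATA)


def gdtr_operand(gdt_base, gdt_bytes):
    """6-byte lgdt operand: 16-bit limit (size - 1), then 32-bit base."""
    return struct.pack("<HI", len(gdt_bytes) - 1, gdt_base)


def enter_protected_mode(cr0):
    """Step two of the list: set the PE bit, keep all other bits."""
    return cr0 | CR0_PE


print(hex(FLAT_CODE))                         # 0xcf9a000000ffff
print(hex(FLAT_DATA))                         # 0xcf92000000ffff
print(gdtr_operand(0xFFFFFFD0, gdt).hex())    # 1700d0ffffff
```

The far jump of the third step would then use selector 0x08, the byte offset of the code descriptor in this table, assuming this particular ordering of entries.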
Everything becomes easy once we reach the Linear Flat Mode. No more hardware pain, no more toolchain pain. Why Linear Flat Mode and not Protected Mode? Most people call this operation mode Protected Mode when they switch the CPU to its native 32-bit operation mode in this way. But it is only a Linear Flat Mode, as there is no protection at all. It's more like a 32-bit real mode: there is no address translation, no paging and no protection.
The Linear Flat Mode
When all segment registers use the same base address and limit, it is called the Linear Flat Mode. Advantages:
- no limits in the 4GiB address space
- everything is allowed
- no access restrictions to RAM and I/O
- clear and easy to understand source code, no "tricks" required to access space above 1MiB
Disadvantages:
- no protection if someone is working with a NULL or invalid pointer
- no protection if any accessed address is invalid
- as everything is allowed, everything could work against you
- no protection of stack overflow
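Whether a set of segment descriptors really describes a flat layout can be checked by decoding base and limit from each of them. The sketch below does this in Python; the two flat descriptor values are the commonly used encodings for base 0 with a 4 GiB limit and are an assumption, not taken from the LinuxBIOS sources. The third descriptor is a made-up counterexample.

```python
def decode_descriptor(desc):
    """Return (base, size in bytes) of an 8-byte segment descriptor."""
    base = ((desc >> 16) & 0xFFFFFF) | (((desc >> 56) & 0xFF) << 24)
    limit = (desc & 0xFFFF) | (((desc >> 48) & 0xF) << 16)
    page_granular = bool((desc >> 55) & 1)   # G bit: limit counts 4 KiB units
    size = (limit + 1) * (4096 if page_granular else 1)
    return base, size


def is_linear_flat(descriptors):
    """Flat in the sense of the text: all share one base and one limit."""
    return len({decode_descriptor(d) for d in descriptors}) == 1


FLAT_CODE = 0x00CF9A000000FFFF    # assumed flat code segment
FLAT_DATA = 0x00CF92000000FFFF    # assumed flat data segment
SHIFTED_DATA = 0x00CF92010000FFFF # hypothetical segment with base 0x10000

base, size = decode_descriptor(FLAT_CODE)
print(hex(base), hex(size))                       # 0x0 0x100000000
print(is_linear_flat([FLAT_CODE, FLAT_DATA]))     # True
print(is_linear_flat([FLAT_CODE, SHIFTED_DATA]))  # False
```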
There is still one pain after entering the Linear Flat Mode: the lack of system RAM. This issue is addressed by the Cache As RAM solution. See below.
The Toolchain Pain and how to solve it
It seems modern toolchains generate Real Mode opcodes correctly. But they do not really support Real Mode sections across all parts of the toolchain. The flaw is the linker: you can't link Real Mode (that is, 16-bit) sections if they contain unresolved symbols (the direction of the reference doesn't matter).
But you can link every Real Mode section that does not contain any unresolved symbols! So to solve the toolchain pain, we only have to avoid unresolved symbols in our Real Mode sections. As we control their content, this is feasible.
To avoid unresolved symbols, we must use fixed addresses in those instructions that would normally refer to external symbols. For these fixed addresses to be correct at runtime, our ROM image needs a special layout.
All we need are four fixed addresses, and the correct code at these points:
- reset vector
- the program code to load the GDT and switch to Linear Flat Mode
- the Global Descriptor Table for Linear Flat Mode
- the program code entered in Linear Flat Mode
The first two in this list are Real Mode sections that need special handling; the last two are already 32-bit sections, which we can link and use as expected.
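How such a fixed layout removes the need for symbol resolution can be sketched as follows. The ROM size and every address except the reset vector are hypothetical values chosen for illustration, and the sketch assumes the ROM is mapped so that it ends exactly at 4 GiB. With fixed addresses, the displacement of the near jump at the reset vector can be computed by hand (opcode 0xE9 followed by a 16-bit displacement) instead of being left to the linker.

```python
TOP_OF_4GIB = 1 << 32
ROM_SIZE = 256 * 1024  # hypothetical ROM size

# Hypothetical fixed addresses for the four pieces named in the list.
LAYOUT = {
    "reset vector": 0xFFFFFFF0,       # fixed by the CPU
    "real mode switch code": 0xFFFFFF00,
    "gdt": 0xFFFFFFD0,
    "32-bit entry": 0xFFFF0000,
}


def file_offset(physical, rom_size=ROM_SIZE):
    """Offset inside the ROM image, assuming the ROM ends at 4 GiB."""
    rom_base = TOP_OF_4GIB - rom_size
    if not rom_base <= physical < TOP_OF_4GIB:
        raise ValueError("address is outside the ROM window")
    return physical - rom_base


def near_jump_rel16(from_ip, to_ip):
    """Bytes of a 16-bit 'jmp rel16' located at from_ip, jumping to to_ip."""
    rel = (to_ip - (from_ip + 3)) & 0xFFFF  # relative to the next instruction
    return bytes([0xE9, rel & 0xFF, rel >> 8])


for name, address in LAYOUT.items():
    print(f"{name}: {address:#x} -> image offset {file_offset(address):#x}")

# Reset vector (IP 0xFFF0) jumps to the switch code (IP 0xFF00).
reset_code = near_jump_rel16(0xFFF0, 0xFF00)
print(reset_code.hex())  # e90dff
```

Because both IP values are known in advance, the three bytes at the reset vector contain no reference the linker would have to resolve.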