CBFS: Difference between revisions

From coreboot

Revision as of 04:32, 8 May 2015

The coreboot CBFS Specification was originally announced here.

See cbfs.txt for details.

Introduction

This document describes the coreboot CBFS specification (from here referred to as CBFS). CBFS is a scheme for managing independent chunks of data in a system ROM. Though not a true filesystem, the style and concepts are similar.


Architecture

The CBFS architecture looks like the following:

/---------------\ <-- Start of ROM
| /-----------\ | --|
| | Header    | |   |
| |-----------| |   |
| | Name      | |   |-- Component
| |-----------| |   |
| |Data       | |   |
| |..         | |   |
| \-----------/ | --|
|               |
| /-----------\ |
| | Header    | |
| |-----------| |
| | Name      | |
| |-----------| |
| |Data       | |
| |..         | |
| \-----------/ |
|               |
| ...           |
| /-----------\ |
| |           | |
| | Bootblock | |
| | --------- | |
| | Reset     | | <- 0xFFFFFFF0
| \-----------/ |
\---------------/


The CBFS architecture consists of a binary associated with a physical ROM disk, referred to hereafter as the ROM. A number of independent components, each with a header prepended to its data, are located within the ROM. The components are nominally arranged sequentially, though they are aligned along a pre-defined boundary.

The bootblock occupies the last 20k of the ROM. Within the bootblock is a master header containing information about the ROM including the size, alignment of the components, and the offset of the start of the first CBFS component within the ROM.

(Note that the master header is currently being removed as part of a significant redesign and modernization of the CBFS structures. See the #Redesign section for more details on the changes being made and where we are in the process.)

Master Header

The master header contains essential information about the ROM that is used both by the CBFS implementation within coreboot at runtime and by host-based utilities that create and manage the ROM. The master header will be located somewhere within the bootblock (last 20k of the ROM). A pointer to the location of the header will be located at offset -4 from the end of the ROM. This translates to address 0xFFFFFFFC on a normal x86 system. The pointer will be to physical memory somewhere between 0xFFFFB000 and 0xFFFFFFF0. This makes it easier for coreboot to locate the header at run time. Build time utilities will need to read the pointer and do the appropriate math to locate the header.
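The "appropriate math" for a build time utility can be sketched roughly as below. This is only an illustration under stated assumptions: the image is taken to be mapped so that it ends at the 4 GiB boundary (the x86 case described above), the stored pointer is assumed to be little-endian, and the names read_le32 and master_header_offset are hypothetical rather than part of cbfstool.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value; the byte order of the stored pointer
 * is an assumption based on x86 being little-endian. */
static uint32_t read_le32(const uint8_t *p)
{
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: translate the master header pointer stored in the
 * last four bytes of a ROM image into an offset within that image.
 * Returns 0 on success, -1 if the pointer lies outside the image.
 */
static int master_header_offset(const uint8_t *rom, size_t romsize,
                                size_t *offset_out)
{
        uint64_t rom_base, phys;

        if (romsize < 4 || (uint64_t)romsize > 0x100000000ull)
                return -1;

        /* Assumption: the image is mapped so that it ends at 4 GiB. */
        rom_base = 0x100000000ull - romsize;
        phys = read_le32(rom + romsize - 4);
        if (phys < rom_base)
                return -1;

        *offset_out = (size_t)(phys - rom_base);
        return 0;
}
```

For a 64 KiB image whose last four bytes hold 0xFFFFFF00, the helper would report an offset of 0xFF00 from the start of the image.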

The following is the structure of the master header:

struct cbfs_header {
        uint32_t magic;
        uint32_t version;
        uint32_t romsize;
        uint32_t bootblocksize;
        uint32_t align;
        uint32_t offset;
        uint32_t architecture;
        uint32_t pad[1];
} __attribute__((packed));

Offset | 0 1 2 3        | 4 5 6 7 | 8 9 A B      | C D E F
0x00   | magic = "ORBC" | version | romsize      | bootblocksize
0x10   | align          | offset  | architecture | padding

The meaning of each member is as follows:

  • magic is a 32 bit number that identifies the ROM as a CBFS type. The magic number is 0x4F524243, which is 'ORBC' in ASCII.
  • version is the version number of the CBFS header. The layout of struct cbfs_header may differ if the version does not match.
  • romsize is the size of the ROM in bytes. coreboot will subtract 'romsize' from 0x100000000 (the top of the 32-bit address space) to locate the beginning of the ROM in memory.
  • bootblocksize is the size of bootblock reserved in firmware image.
  • align is the number of bytes that each component is aligned to within the ROM. This is used to make sure that each component is aligned correctly with regards to the erase block sizes on the ROM - allowing one to replace a component at runtime without disturbing the others.
  • offset is the offset of the first CBFS component (from the start of the ROM). This is to allow for arbitrary space to be left at the beginning of the ROM for things like embedded controller firmware.
  • architecture describes which architecture (x86, arm, ...) this CBFS is created for.
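As a rough illustration of how a host tool might decode these members, the sketch below reads a raw 32-byte header into a host-order structure and checks the magic. It assumes the fields are stored big-endian, which is consistent with the magic appearing as the bytes "ORBC" but is not stated in this document; struct cbfs_header_host, read_be32 and parse_master_header are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define CBFS_HEADER_MAGIC 0x4F524243u /* 'ORBC' */
#define CBFS_HEADER_SIZE  32u         /* eight 32-bit fields */

/* Hypothetical host-order copy of the master header (padding omitted). */
struct cbfs_header_host {
        uint32_t magic;
        uint32_t version;
        uint32_t romsize;
        uint32_t bootblocksize;
        uint32_t align;
        uint32_t offset;
        uint32_t architecture;
};

/* Read a 32-bit big-endian value (assumed on-flash byte order). */
static uint32_t read_be32(const uint8_t *p)
{
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 if 'raw' holds a plausible master header, -1 otherwise. */
static int parse_master_header(const uint8_t *raw, size_t len,
                               struct cbfs_header_host *out)
{
        if (len < CBFS_HEADER_SIZE)
                return -1;
        if (read_be32(raw) != CBFS_HEADER_MAGIC)
                return -1;

        out->magic         = read_be32(raw + 0x00);
        out->version       = read_be32(raw + 0x04);
        out->romsize       = read_be32(raw + 0x08);
        out->bootblocksize = read_be32(raw + 0x0C);
        out->align         = read_be32(raw + 0x10);
        out->offset        = read_be32(raw + 0x14);
        out->architecture  = read_be32(raw + 0x18);

        /* Basic sanity checks before anyone divides by 'align'. */
        if (out->align == 0 || out->offset >= out->romsize)
                return -1;
        return 0;
}
```

The sanity checks at the end are a design choice of this sketch: rejecting a zero 'align' early keeps later alignment arithmetic from dividing by zero.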

(Note that the master header is currently being removed as part of a significant redesign and modernization of the CBFS structures. See the #Redesign section for more details on the changes being made and where we are in the process.)

Bootblock

The bootblock is a mandatory component in the ROM. It is located in the last 20k of the ROM space, and contains, among other things, the location of the master header and the entry point for the loader firmware. The bootblock does not have a component header attached to it.

(Note that the master header location is currently being removed as part of a significant redesign and modernization of the CBFS structures. See the #Redesign section for more details on the changes being made and where we are in the process.)

Components

CBFS components are placed in the ROM starting at 'offset' specified in the master header and ending at the bootblock. Thus the total size available for components in the ROM is (ROM size - 20k - 'offset'). Each CBFS component is to be aligned according to the 'align' value in the header. Thus, if a component of size 1052 is located at offset 0 with an 'align' value of 1024, the next component will be located at offset 2048.
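The arithmetic in the paragraph above can be sketched as follows. The helper names are hypothetical, and the code ignores 32-bit overflow for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Round 'value' up to the next multiple of 'align'. */
static uint32_t align_up(uint32_t value, uint32_t align)
{
        assert(align != 0);
        return (value + align - 1) / align * align;
}

/* Offset at which the component following one at 'offset' of 'size' starts. */
static uint32_t next_component_offset(uint32_t offset, uint32_t size,
                                      uint32_t align)
{
        return align_up(offset + size, align);
}

/* Space available for components: ROM size - bootblock size - 'offset'. */
static uint32_t component_space(uint32_t romsize, uint32_t bootblocksize,
                                uint32_t first_offset)
{
        return romsize - bootblocksize - first_offset;
}
```

With the numbers from the text, next_component_offset(0, 1052, 1024) yields 2048, and a 1 MiB ROM with a 20k bootblock and an 'offset' of 0 leaves 0xFB000 bytes for components.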

Each CBFS component will be indexed with a unique ASCII string name of unlimited size.

Each CBFS component starts with a header:

struct cbfs_file {
         char magic[8];
         uint32_t len;
         uint32_t type;
         uint32_t checksum;
         uint32_t offset;
};

Offset | 0 1 2 3          | 4 5 6 7          | 8 9 A B | C D E F
0x00   | magic = "LARCHIVE" (bytes 0-7)      | len     | type
0x10   | checksum         | offset           |         |

  • magic is a magic value used to identify the header. During runtime, coreboot will scan the ROM looking for this value. The default magic is the string 'LARCHIVE'.
  • len is the length of the data, not including the size of the header and the size of the name.
  • type is a 32 bit number indicating the type of data that is attached. The data type is used in a number of ways, as detailed in the section below.
  • checksum is a 32bit checksum of the entire component, including the header and name.
  • offset is the start of the component data, based off the start of the header. The difference between the size of the header and offset is the size of the component name.

Immediately following the header will be the name of the component, which will be null-terminated and 16-byte aligned. The following picture shows the structure of a component:

/--------\  <- start
| Header |
|--------|  <- sizeof(struct cbfs_file)
| Name   |
|--------|  <- 'offset'
| Data   |
| ...    |
\--------/  <- start + 'offset' + 'len'

(Note that there are some changes to the header format in the pipeline as part of a significant redesign and modernization of the CBFS structures. See the #Redesign section for more details on the updates and where we are in the process.)

Searching Algorithm

To locate a specific component in the ROM, one starts at the 'offset' specified in the CBFS master header. For this example, the offset will be 0.

From that offset, the code should search for the magic string on the component, jumping 'align' bytes each time. So, assuming that 'align' is 16, the code will search for the string 'LARCHIVE' at offset 0, 16, 32, etc. If the offset ever exceeds the allowable range for CBFS components, then no component was found.

Upon recognizing a component, the software then has to search for the specific name of the component. This is accomplished by comparing the desired name with the string on the component located at offset + sizeof(struct cbfs_file). If the string matches, then the component has been located; otherwise the software should add the component's 'offset' + 'len' to the current position and resume the search for the magic value.
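The search described above might look roughly like the following when run over an in-memory copy of the image. This is a sketch rather than coreboot's implementation: it assumes the 'len' and 'offset' header fields are stored big-endian, rounds the resume position up to 'align', and uses the hypothetical names read_be32 and cbfs_find.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CBFS_FILE_MAGIC       "LARCHIVE"
#define CBFS_FILE_HEADER_SIZE 24u /* sizeof(struct cbfs_file) */

/* Read a 32-bit big-endian value (assumed on-flash byte order). */
static uint32_t read_be32(const uint8_t *p)
{
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Look for the component called 'name' between image offsets 'start'
 * (inclusive) and 'end' (exclusive). Returns the offset of its header,
 * or -1 if it is not found or a header looks corrupt.
 */
static long cbfs_find(const uint8_t *rom, size_t start, size_t end,
                      size_t align, const char *name)
{
        size_t name_len = strlen(name) + 1; /* compare the NUL as well */
        size_t pos = start;

        if (align == 0)
                return -1;

        while (pos < end && end - pos >= CBFS_FILE_HEADER_SIZE) {
                const uint8_t *hdr = rom + pos;
                size_t data_off, len, next;

                if (memcmp(hdr, CBFS_FILE_MAGIC, 8) != 0) {
                        /* No component here: jump 'align' bytes ahead. */
                        if (align > end - pos)
                                break;
                        pos += align;
                        continue;
                }

                len = read_be32(hdr + 8);
                data_off = read_be32(hdr + 20);
                if (data_off < CBFS_FILE_HEADER_SIZE ||
                    data_off > end - pos || len > end - pos - data_off)
                        return -1; /* header points outside the region */

                /* The name sits between the header and the data. */
                if (name_len <= data_off - CBFS_FILE_HEADER_SIZE &&
                    memcmp(hdr + CBFS_FILE_HEADER_SIZE, name, name_len) == 0)
                        return (long)pos;

                /* Skip 'offset' + 'len', then round up to the alignment. */
                next = pos + data_off + len;
                pos = (next + align - 1) / align * align;
        }
        return -1;
}
```

Comparing the terminating NUL along with the name means that a request for "fir" does not match a component named "first".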

(Note that the first step is currently being changed such that the location of the first component is read from the image's global FMAP. See the #Redesign section for more details on the updates and where we are in the process.)

Data Types

The 'type' member of struct cbfs_file is used to identify the content of the component data, and is used by coreboot and other run-time entities to make decisions about how to handle the data.

There are three component types that are essential to coreboot, and so are defined here.

Stages

Stages are code loaded by coreboot during the boot process. They are essential to a successful boot. Stages are comprised of a single blob of binary data that is to be loaded into a particular location in memory and executed. The uncompressed header contains information about how large the data is, and where it should be placed, and what additional memory needs to be cleared.

Stages are assigned a component value of 0x10. When coreboot sees this component type, it knows that it should pass the data to a sub-function that will process the stage.

The following is the format of a stage component:

/--------\
| Header |
|--------|
| Binary |
| ..     |
\--------/

The header is defined as:

struct cbfs_stage {
         uint32_t compression;
         uint64_t entry;
         uint64_t load;
         uint32_t len;
         uint32_t memlen;
} __attribute__((packed));

Offset | 0 1 2 3     | 4 5 6 7 | 8 9 A B | C D E F
0x00   | compression | entry (bytes 4-B) | load ...
0x10   | ... load    | len     | memlen  |

  • compression is an integer defining how the data is compressed. There are three compression types defined by this version of the standard: none (0x0), lzma (0x1), and nrv2b (0x02, deprecated), though additional types may be added assuming that coreboot understands how to handle the scheme.
  • entry is a 64 bit value indicating the location where the program counter should jump following the loading of the stage. This should be an absolute physical memory address.
  • load is a 64 bit value indicating where the subsequent data should be loaded. This should be an absolute physical memory address.
  • len is the length of the compressed data in the component.
  • memlen is the amount of memory that will be used by the component when it is loaded.

The component data will start immediately following the header.

When coreboot loads a stage, it will first zero 'memlen' bytes of memory starting at 'load'. It will then decompress the component data according to the specified scheme and place it in memory starting at 'load'. Following that, it will jump execution to the address specified by 'entry'. Some components are designed to execute directly from the ROM - coreboot knows which components must do that and will act accordingly.
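The loading sequence can be sketched as below. To keep the example self-contained, physical memory is replaced by a hypothetical struct sim_memory buffer, the header is assumed to be already decoded into host byte order (struct stage_info), only uncompressed stages are handled, and the entry address is returned instead of being jumped to. None of these names come from coreboot.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CBFS_COMPRESS_NONE 0x0u

/* Hypothetical host-order copy of struct cbfs_stage. */
struct stage_info {
        uint32_t compression;
        uint64_t entry;
        uint64_t load;
        uint32_t len;
        uint32_t memlen;
};

/* Hypothetical stand-in for physical memory starting at 'base'. */
struct sim_memory {
        uint64_t base;
        size_t size;
        uint8_t *bytes;
};

/* Returns 0 and stores the entry address on success, -1 on failure. */
static int load_stage(struct sim_memory *mem, const struct stage_info *st,
                      const uint8_t *data, uint64_t *entry_out)
{
        uint64_t start;
        uint8_t *dst;

        if (st->compression != CBFS_COMPRESS_NONE)
                return -1; /* decompression is outside this sketch */
        if (st->len > st->memlen || st->load < mem->base)
                return -1;

        start = st->load - mem->base;
        if (start > mem->size || st->memlen > mem->size - start)
                return -1; /* stage would not fit in memory */

        dst = mem->bytes + start;
        memset(dst, 0, st->memlen);  /* zero 'memlen' bytes at 'load' */
        memcpy(dst, data, st->len);  /* place the data at 'load' */
        *entry_out = st->entry;      /* the caller would jump here */
        return 0;
}
```

Zeroing the whole 'memlen' range before copying means that any memory beyond the stored data, such as a BSS area, starts out cleared.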

Payloads

Payloads are loaded by coreboot following the boot process.

For more details, also see SELF.

Payloads are assigned a component value of 0x20. When coreboot sees this component type, it knows that it should pass the data to a sub-function that will process the payload. Furthermore, other run time applications such as 'bayou' may easily index all available payloads on the system by searching for the payload type.


The following is the format of a payload component:

/-----------\
| Header    |
| Segment 1 |
| Segment 2 |
| ...       |
|-----------|
| Binary    |
| ..        |
\-----------/

The header is as follows:

struct cbfs_payload {
         struct cbfs_payload_segment segments;
};

The header contains a number of segments corresponding to the segments that need to be loaded for the payload.

The following is the structure of each segment header:

struct cbfs_payload_segment {
         uint32_t type;
         uint32_t compression;
         uint32_t offset;
         uint64_t load_addr;
         uint32_t len;
         uint32_t mem_len;
} __attribute__((packed));

Offset | 0 1 2 3       | 4 5 6 7     | 8 9 A B | C D E F
0x00   | type          | compression | offset  | load_addr ...
0x10   | ... load_addr | len         | mem_len |

  • type is the type of segment, one of the following:
PAYLOAD_SEGMENT_CODE   0x45444F43   The segment contains executable code
PAYLOAD_SEGMENT_DATA   0x41544144   The segment contains data
PAYLOAD_SEGMENT_BSS    0x20535342   The memory specified by the segment
                                     should be zeroed
PAYLOAD_SEGMENT_PARAMS 0x41524150   The segment contains information for
                                     the payload
PAYLOAD_SEGMENT_ENTRY  0x52544E45   The segment contains the entry point
                                     for the payload
  • compression is the compression scheme for the segment. Each segment can be independently compressed. There are three compression types defined by this version of the standard: none (0x0), lzma (0x1), and nrv2b (0x02, deprecated), though additional types may be added assuming that coreboot understands how to handle the scheme.
  • offset is the address of the data within the component, starting from the component header.
  • load_addr is a 64 bit value indicating where the segment should be placed in memory.
  • len is a 32 bit value indicating the size of the segment within the component.
  • mem_len is the size of the data when it is placed into memory.

The data will be located immediately following the last segment.
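A loader has to walk the segment table before touching the data. The sketch below only summarizes a table instead of loading it: it totals the stored and in-memory sizes and picks up the entry point. It assumes the segment headers have already been decoded into host byte order, that the table ends with a PAYLOAD_SEGMENT_ENTRY segment, and that the entry point is carried in that segment's load_addr; struct payload_segment_host, struct payload_summary and summarize_payload are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_SEGMENT_CODE   0x45444F43u
#define PAYLOAD_SEGMENT_DATA   0x41544144u
#define PAYLOAD_SEGMENT_BSS    0x20535342u
#define PAYLOAD_SEGMENT_PARAMS 0x41524150u
#define PAYLOAD_SEGMENT_ENTRY  0x52544E45u

/* Hypothetical host-order copy of struct cbfs_payload_segment. */
struct payload_segment_host {
        uint32_t type;
        uint32_t compression;
        uint32_t offset;
        uint64_t load_addr;
        uint32_t len;
        uint32_t mem_len;
};

struct payload_summary {
        uint64_t entry;        /* taken from the ENTRY segment */
        uint64_t stored_bytes; /* sum of 'len' for CODE and DATA */
        uint64_t memory_bytes; /* sum of 'mem_len' for CODE, DATA and BSS */
};

/* Returns 0 once an ENTRY segment is reached, -1 otherwise. */
static int summarize_payload(const struct payload_segment_host *segs,
                             size_t max_segs, struct payload_summary *out)
{
        size_t i;

        out->entry = 0;
        out->stored_bytes = 0;
        out->memory_bytes = 0;

        for (i = 0; i < max_segs; i++) {
                switch (segs[i].type) {
                case PAYLOAD_SEGMENT_CODE:
                case PAYLOAD_SEGMENT_DATA:
                        out->stored_bytes += segs[i].len;
                        out->memory_bytes += segs[i].mem_len;
                        break;
                case PAYLOAD_SEGMENT_BSS:
                        /* Nothing stored; memory is only zeroed. */
                        out->memory_bytes += segs[i].mem_len;
                        break;
                case PAYLOAD_SEGMENT_PARAMS:
                        break;
                case PAYLOAD_SEGMENT_ENTRY:
                        out->entry = segs[i].load_addr;
                        return 0;
                default:
                        return -1; /* unknown segment type */
                }
        }
        return -1; /* no ENTRY segment within 'max_segs' */
}
```

Passing an explicit max_segs bound keeps the walk from running off the end of a table that lacks an ENTRY segment.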

Option ROMs

The third specified component type will be Option ROMs. Option ROMs will have component type '0x30'. They will have no additional header; the uncompressed binary data will be located in the data portion of the component.

NULL

There is a fourth component type, defined as NULL (0xFFFFFFFF). This is the "don't care" component type. It can be used when the component type is not necessary (such as when the name of the component is unique, e.g. option_table). It is recommended that all components be assigned a unique type, but NULL can be used when the type does not matter.

Redesign

The CBFS system is currently being modernized with the vision of greater adaptability to a variety of use cases. This work is being pursued as part of the Flashmap integration work; as such, it's being committed to the Chromium OS coreboot fork first, then upstreamed to the main repository. There are several important aspects to the design changes:

FMAP

One of the big goals here is to seamlessly support multiple CBFSes per firmware image. This is something that would be very useful to users such as the Chromium OS project that have complex firmware stacks with numerous components and modular update schemes, and having it in upstream coreboot would make the project more scalable to large projects, and consequently more adoptable.

The flashmap format provides the flash chip analog of disks' partition tables, and is therefore well suited to describing where the individual CBFSes live in flash. The idea is that the FMAP will eventually be a mandatory component of a coreboot firmware image, and will be consulted to find the location of the CBFS(es) before reading their structure. The coreboot binaries themselves will have compiled-in knowledge of where the FMAP begins, eliminating the need for expensive runtime flash searches while ideally reducing the number of compiled-in offsets to one.
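Consulting the FMAP to find a CBFS might look roughly like the sketch below. The binary layout used here (a 56-byte header beginning with "__FMAP__", followed by 42-byte area records, all little-endian) is an assumption taken from the flashmap project's published format rather than from this document, and fmap_find_area is a hypothetical name, not part of coreboot's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FMAP_SIGNATURE   "__FMAP__"
#define FMAP_STRLEN      32u
#define FMAP_HEADER_SIZE 56u /* signature, version, base, size, name, nareas */
#define FMAP_AREA_SIZE   42u /* offset, size, name, flags */

static uint16_t read_le16(const uint8_t *p)
{
        return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Find the area called 'name' in a raw FMAP and report where it lives
 * in flash. Returns 0 on success, -1 if the FMAP is malformed or the
 * area is absent.
 */
static int fmap_find_area(const uint8_t *fmap, size_t fmap_len,
                          const char *name, uint32_t *offset_out,
                          uint32_t *size_out)
{
        size_t name_len = strlen(name);
        uint16_t nareas, i;

        if (fmap_len < FMAP_HEADER_SIZE || name_len >= FMAP_STRLEN)
                return -1;
        if (memcmp(fmap, FMAP_SIGNATURE, 8) != 0)
                return -1;

        nareas = read_le16(fmap + 54);
        if ((fmap_len - FMAP_HEADER_SIZE) / FMAP_AREA_SIZE < nareas)
                return -1; /* area table runs past the buffer */

        for (i = 0; i < nareas; i++) {
                const uint8_t *area =
                        fmap + FMAP_HEADER_SIZE + (size_t)i * FMAP_AREA_SIZE;

                /* Area names are NUL-terminated within FMAP_STRLEN bytes. */
                if (memcmp(area + 8, name, name_len + 1) == 0) {
                        *offset_out = read_le32(area);
                        *size_out = read_le32(area + 4);
                        return 0;
                }
        }
        return -1;
}
```

A caller looking for the primary CBFS would pass "COREBOOT" as the name and then begin reading file headers at the returned offset.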

Master Header

Most of the information contained in the CBFS master header becomes redundant once a flashmap is present:

  • romsize: All uses can be replaced with reads of the chip size as stored in FMAP.
  • bootblocksize: The only reason this is needed is that the bootblock is currently jammed into space carved from that otherwise available to the CBFS. However, once we have an FMAP to describe the image's layout, we can place the bootblock in its own fixed-size FMAP section. On x86, it should always fit in 8K, and on ARM, we can let it be 128K; this space can be carved out of that currently allocated to the (primary) CBFS.
  • align: In practice, this is always 64, so we can just default to that. There's very little reason to change it and a risk of breakage, so users who really want it to be different can manually update the #defines in cbfstool and coreboot proper.
  • offset: This "points" to the first file header within the CBFS. This has been necessary on non-x86 because the bootblock occupies the lowest addresses of the CBFS section; however, with the bootblock moved to its own fixed-size FMAP section (see the bootblocksize point), the first entry can always be located at the beginning of each CBFS region, allowing one to find it simply by reading the offset from its FMAP section.
  • architecture: This is only (decreasingly) used by cbfstool, and the special cases that required it are no longer important to cbfstool once the bootblock is located in its own image region.

For this reason, the master header is being removed as FMAPs are being added. Relevant CL: http://review.coreboot.org/#/c/10135 (upstreamed from https://chromium-review.googlesource.com/#/c/265863)

Version Coding and Hashing: File Entry Headers

Because the version information was previously stored in the master header, it needs a new place to live before the new type of image comes into use. As such, the following modifications to the per-file entry header format are being implemented:

  • The checksum field has always been present with a fixed width of 32 bits, but it has never been used or set to anything besides 0. It's being converted into two unsigned 16-bit fields, version and flags. The latter is not yet used for anything, but is intended to be used as a bitfield of special properties to be defined later.
  • A new unsigned 32-bit header_len field is being added so that versions of cbfstool that later become outdated can easily skip, or even copy, headers that they are too old to completely understand.
  • The concept of a hash is being reintroduced as a variable-width field with extensible support for multiple hash algorithms. Its length is dictated by a hash_type field that can be 0 for no hash or any other value representing a hash recognizable to the CBFS driver. The hash, along with all future variable-width header fields, is to be located directly after the fixed-width fields and immediately followed by the NUL-terminated filename; within this area, their relative positions are dictated by a combination of the header_len field and their individual offset fields (e.g. hash_offset for the hash).

These changes are being made as part of the Chromium OS project, and haven't yet been upstreamed. Relevant (Chromium OS) CL: https://chromium-review.googlesource.com/#/c/268408

Progress and Future Work

In short: cbfstool support for the new format is landing/on the way, but the build system, coreboot, and libpayload changes still need to be made.

cbfstool

Support for the redesigned image format discussed herein has been implemented in cbfstool with the following user-visible interface changes:

  • Most actions now accept the -r option for specifying a comma-separated list of which regions to operate on when working with new-format images. If this flag isn't provided, they default to the primary CBFS, which will hereafter always be located in a section called COREBOOT.
  • The create action now has a second form that accepts a compiled FMAP and creates a new-format image instead of what we're now calling the legacy type. Creation is the only opportunity to initialize designated image regions as CBFSes; by default, only the COREBOOT section is initialized, but the -r switch can be used to add others as well.
  • There's a new layout action for listing the mutable regions of a new-format image. It also accepts a switch to display the read-only regions as well.
  • There are new read and write actions for working with sections containing raw data that is not part of a CBFS. Overwriting CBFS-containing regions with raw data is not permitted.
  • With the exception of top-aligned addresses, any positions specified or requested are now relative to the beginning of their image region (rather than the beginning of the entire flash image).

While support for reading from the new image format has not yet been added to coreboot itself or libpayload, this will be a much easier problem for two reasons: these components shouldn't need write support, and there's no reason for them to still support reading legacy images once we make the switch. It should be noted that the changes made to cbfstool preserve its backwards compatibility with legacy images: because invoking it in the same way as before continues to create and manipulate legacy images, it is possible for the coreboot build system to continue using the legacy format exclusively until all the necessary pieces are in place to switch over.

Build System

The following build system changes are needed before we can complete a switch to the new image format:

  • In order to create the new images, one needs to have a flashmap file. We've designed a textual language called fmd ("flashmap descriptor") for describing these files and a compiler to produce them (see Flashmap), but we still need to add template fmd files for all common architecture/flash chip size pairings. These files should be checked in under src/arch/*/, and should be named such that the build system can use $(CONFIG_COREBOOT_ROMSIZE_KB) to decide which one to use. For x86, we'll probably need two sets of fmd files, one for Intel chips that need to be built with the IFD/ME blob combination.
  • The build system then needs to actually *call* the compiler to produce the FMAP binary from the fmd file. A side effect of this process is a header file containing a #define to the FMAP's offset from the beginning of the firmware image, and since this information is needed during compile time, the FMAP will have to be compiled before the code. The user should be able to override which fmd file is used via Kconfig to allow custom configurations.
  • The CONFIG_FLASHMAP_OFFSET option needs to be replaced with the #define in the header generated when the fmd file is compiled down to an FMAP. Other unnecessary Kconfig keys should also be removed while we're at it: for instance, CONFIG_CBFS_SIZE comes to mind as one that should be read from the FMAP at runtime rather than set redundantly in Kconfig.
  • We'll need Make to actually build the new style of image. It'll need to pass in the generated FMAP, a list of the sections that will contain CBFSes (which can also be obtained when the fmd file is compiled), and explicitly add the bootblock to the appropriate individual region, choosing whether to bottom- or top-align it based on the architecture.
  • It would be nice to have a pluggable post-packaging step where the user can optionally use Kconfig to specify a script that should be run on the image to make any necessary alterations or customizations as soon as the normal build process has completed. As an example, such scripts' responsibilities might include adding binary data to raw image regions or copying stages and other files into secondary CBFSes within the same image.
  • When all the plumbing is in place (including a relevant postprocessing script), all that will need to be done to produce an image with multiple CBFSes is to modify the fmd file to annotate more section(s) as "(CBFS)."

coreboot/libpayload

The main coreboot code and libpayload need to be updated to be able to read from the new images at all:

  • In order to find the CBFS, they need to know to search the FMAP for a COREBOOT section instead of expecting a CBFS master header pointer. This should be relatively easy to implement in adurbin's new region-based FMAP/CBFS API, but getting it into libpayload (ideally without code duplication) will take additional thought. Also be warned that on some platforms (or just x86?), the first CBFS scan happens in assembly; the code for this is at src/arch/x86/lib/walkcbfs.S, and nothing apparent will happen until it's at least limping along.
  • They need to be updated to cope with the length-determination mechanism of the revised headers once those have been upstreamed.
  • The above should be (close to?) sufficient to get things *running*, but to get multiple CBFSes to work, there'll need to be additional changes to the CBFS API. When designing that new interface, it'll be important to consider use cases that involve copying stages between CBFSes: for instance, Chromium OS copies the ramstage (and romstage, on arm) from the main, permanently read-only CBFS to two updateable sections. If we want to continue allowing identical relocatable code to run from multiple parts of the image, the API will need to be stateful (i.e. remember which CBFS it's currently reading from). If this isn't something we want to continue "supporting," it might be okay to require the caller to specify the desired CBFS at every read call.

Remaining Design Work

The major remaining shortcoming is support for taking a structural hash of an entire CBFS. If such a hash were hierarchical and hashed the headers (including hashes and filenames) of all contained files, it would provide a guarantee that that entire portion of the image had been read correctly and hadn't been tampered with. Here are some design considerations, all of which are still open questions:

  • How should the hashes be concatenated? Hash extension is the easiest and most space efficient; however, we should keep in mind that a faulty (or malicious?) memory controller might not feed us the same data each time we request it, so if we proceed in that way, we'll need to rehash at least all preceding headers each time we read one back in from storage.
  • Where should the resulting hierarchical hash be stored? This has proven to be somewhat of a contentious issue, with some people believing it should be placed in a new (but completely different) "master header" for the CBFS, and others holding it should just go in a file with a specific name or its own raw section. A related question is which component should be responsible for checking the structural hash: if it's stored in a special header, the CBFS driver would be able to automatically check the whole image just as it will verify individual files based on their headers' stored hashes. Otherwise, vboot or user-specific custom code might have to do the verification, which could result in less code reuse, worse adoptability, and a messier and more poorly encapsulated CBFS read API (that, for instance, might require passing in a separately cached comparison hash every time a read is requested).
  • How do we skip some files, if that's something we want to support? Maybe null-type and deleted headers shouldn't be included in the hash? Maybe there should also be a flag that can be set to exclude certain files? (The latter would be necessary if we were to store the hierarchical hash in a file within the checksummed CBFS itself.)
  • How do we guard against malicious headers (e.g. those with really long filenames intended to make the hashing hang, overflow, or otherwise barf)?