[coreboot] ACPI S3 with coreboot-v2 on DBM690T

Feng, Libo Libo.Feng at amd.com
Thu Dec 11 10:50:26 CET 2008

Hi, All,

After a dozen days of struggling, I have implemented ACPI S3 with coreboot-v2 on DBM690T. I have written up my notes below; please help review them. Even though the work is preliminary and platform-specific, I would appreciate any comments. Thank you in advance.

Platform: DBM690T populated with 1G RAM, internal GFX
          Fedora Core 6 shipping with the 2.6.18 kernel

The basic principle of ACPI S3 with coreboot-v2 on DBM690T:
Before the payload runs, two major memory regions are used by CAR and coreboot:
region 1: start address = 0x4000, length = 0x2ca0c (I think this only includes code and constants); coreboot runs here after loading.
region 2: start address = 0x1f8000, length = 0x8000; CAR uses this region as the stack at its final stage.
Furthermore, some tables occupy other small memory regions, such as the IRQ table, MP table, ACPI tables and coreboot table. On resume, the OS data in all these regions must be protected from being overwritten by coreboot.

In post_cache_as_ram, just before the stack is switched from cache to RAM, the lowest 2M of RAM is copied into the topmost 2M of RAM. On DBM690T, 1G of RAM is populated and the last 128M is reserved as video memory for the internal GFX. FC6 Linux runs in text mode on my DBM690T, so most of the video memory is unused. The 2M of data can therefore be kept intact in the topmost 2M of RAM, and no OS memory is ruined by the move.

After all devices are initialized, coreboot checks whether it is resuming from S3. If so, the 2M of data is copied back from the topmost 2M of RAM into the lowest 2M, and coreboot hands control to the OS by jumping to the waking vector contained in the FACS table.
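
As a quick sanity check of the addresses involved, here is a minimal sketch of the arithmetic. The macro and function names are mine, not coreboot's, and the sizes are simply the ones from my board (1G RAM, 128M video memory, 2M saved); other configurations will give other addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the description above; other boards will differ. */
#define RAM_SIZE   0x40000000u  /* 1G of populated RAM */
#define UMA_SIZE   0x08000000u  /* 128M reserved for the internal GFX */
#define SAVE_SIZE  0x00200000u  /* lowest 2M, which coreboot overwrites */

/* Base of the mirror: the topmost save_size bytes of RAM. */
static uint32_t mirror_base(uint32_t ram_size, uint32_t save_size)
{
    assert(save_size <= ram_size);
    return ram_size - save_size;
}

/* Base of the video memory carved out of the top of RAM. */
static uint32_t uma_base(uint32_t ram_size, uint32_t uma_size)
{
    assert(uma_size <= ram_size);
    return ram_size - uma_size;
}

/* Non-zero if the mirror lies entirely inside the video memory,
 * i.e. the copy cannot touch memory that the OS owns. */
static int mirror_inside_uma(uint32_t ram_size, uint32_t uma_size,
                             uint32_t save_size)
{
    return mirror_base(ram_size, save_size) >= uma_base(ram_size, uma_size);
}
```

With these numbers the mirror starts at 0x3fe00000, which is the address hard-coded in the code below, and the video memory starts at 0x38000000.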

The source code is a big mess and full of tricks. Furthermore, it is very platform-specific for now: DBM690T populated with 1G RAM, the internal GFX in use, and FC6 shipping with the 2.6.18 kernel. It is so experimental that I cannot check it in. The source code and comments are listed below. I hope they are helpful to others.

STEP 1. In post_cache_as_ram, check whether the system is resuming from
sleep and, if so, copy the lowest 2M of RAM to the topmost 2M of RAM:

    /* BTDC: implement ACPI S3. */
    sleep_type = inw(0x804); /* BTDC: hard-coded, modify it later. */
    sleep_type = (sleep_type>>10) & 0x07; /* BTDC: get the sleep type. */
    print_debug_pcar("BTDC: sleep_type = \n", sleep_type);

    if(3 == sleep_type)
    { /* BTDC: resuming from S3. */
        msr_t msr_tmp;
        u32 * pt;
        u32 temp;

        print_debug("BTDC: save the memory for OS resuming.\n");

        /* BTDC: at this point, only the fixed MTRR 0x269 for stack in cache
         * and the variable MTRR 0x202 for source code in FLASH are set, so if
         * I want to move the 2M data, I have to set some MTRRs myself.
         */
        __asm__ volatile (
            /* BTDC: disable the fixed MTRRs temporarily. */
            "movl $0xC0010010, %ecx\n\t"
            "rdmsr\n\t"
            "andl $(~(3<<18)), %eax\n\t"
            "wrmsr\n\t"

            /* BTDC: enable the default MTRR, so I can access the whole RAM. */
            "movl $0x2ff, %ecx\n\t"
            "xorl %edx, %edx\n\t"
            "movl $0x00000800, %eax\n\t"
            "wrmsr\n\t"
        );

        /* BTDC: save the memory for OS resuming. */
        pt = 0;
        temp = *pt;
        print_debug_pcar("\nBTDC: the memory of 0 = ", temp);

        pt = 2*1024*1024 - 4;
        temp = *pt;
        print_debug_pcar("\nBTDC: the memory of 2M = ", temp);

        memcopy(0x3fe00000, 0, 2*1024*1024); /* BTDC: move the 2M data. */

        pt = 0x3fe00000;
        temp = *pt;
        print_debug_pcar("\nBTDC: the memory of mirror 0 = ", temp);

        pt = 2*1024*1024 - 4 + 0x3fe00000;
        temp = *pt;
        print_debug_pcar("\nBTDC: the memory of mirror 2M = ", temp);

        /* BTDC: restore the MTRR previously modified. */
        __asm__ volatile (
            /* BTDC: enable the fixed MTRRs again. */
            "movl $0xC0010010, %ecx\n\t"
            "rdmsr\n\t"
            "orl $(3<<18), %eax\n\t"
            "wrmsr\n\t"

            "movl $0x2ff, %ecx\n\t"
            "xorl %edx, %edx\n\t"
            "movl $0x00000c00, %eax\n\t"
            "wrmsr\n\t"
        );
        print_debug("BTDC: saving finished\n");
    }

    /* BTDC: just before the 2M data is cleared. */
    set_init_ram_access(); /* So we can access RAM from [1M, CONFIG_LB_MEM_TOPK) */

STEP 2. If the system is resuming from S3, hardwaremain copies the 2M of data from the topmost 2M of RAM back into the lowest 2M and jumps to the waking vector in the FACS table instead of loading the payload.

First, I set up a GDT and two pseudo-descriptors, for the GDT and IDT respectively, at 0x3fdfffe8, 0x3fdfffe0 and 0x3fdfffd8 for the switch from 32-bit to 16-bit code.
	memcpy((void *)(0x3fe00000-sizeof(real_mode_gdt_entries)), real_mode_gdt_entries, sizeof(real_mode_gdt_entries));
	memcpy((void *)0x3fdfffe0, real_mode_gdt, sizeof(real_mode_gdt));
	memcpy((void *)0x3fdfffd8, real_mode_idt, sizeof(real_mode_idt));

The three tables are as below, thanks to Rudolf:
	unsigned long long real_mode_gdt_entries [3] = {
		0x0000000000000000ULL,	/* Null descriptor */
		0x008f9b000000ffffULL,	/* 16-bit code, base 0x00000000, 4K-granular limit 0xfffff */
		0x008f93000000ffffULL	/* 16-bit data, base 0x00000000, 4K-granular limit 0xfffff */
	};
	struct Xgt_desc_struct {
		unsigned short size;
		unsigned long address __attribute__((packed));
		unsigned short pad;
	} __attribute__ ((packed));

	struct Xgt_desc_struct real_mode_gdt = { sizeof (real_mode_gdt_entries) - 1, (long)real_mode_gdt_entries };
	struct Xgt_desc_struct real_mode_idt = { 0x3ff, 0 };
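
To double-check what the two non-null GDT entries describe, the 64-bit values can be decoded by hand or with a small helper like the sketch below. The struct and function names are hypothetical; the bit positions are the standard x86 segment descriptor layout.

```c
#include <stdint.h>

/* Decoded view of one 8-byte x86 segment descriptor. */
struct seg_desc {
    uint32_t base;         /* 32-bit segment base */
    uint32_t limit;        /* raw 20-bit limit field */
    uint8_t  access;       /* P, DPL, S and type bits */
    int      granular_4k;  /* G bit: limit counted in 4K pages */
    int      default_32;   /* D/B bit: 0 means a 16-bit segment */
};

static struct seg_desc decode_desc(uint64_t d)
{
    struct seg_desc s;

    /* limit[15:0] in bits 0..15, limit[19:16] in bits 48..51 */
    s.limit = (uint32_t)(d & 0xffffu) | (uint32_t)((d >> 32) & 0x000f0000u);
    /* base[23:0] in bits 16..39, base[31:24] in bits 56..63 */
    s.base = (uint32_t)((d >> 16) & 0x00ffffffu)
           | (uint32_t)((d >> 32) & 0xff000000u);
    s.access = (uint8_t)((d >> 40) & 0xffu);
    s.granular_4k = (int)((d >> 55) & 1u);
    s.default_32 = (int)((d >> 54) & 1u);
    return s;
}

/* Highest valid offset in the segment, in bytes. */
static uint32_t desc_limit_bytes(const struct seg_desc *s)
{
    return s->granular_4k ? ((s->limit << 12) | 0xfffu) : s->limit;
}
```

Decoded this way, both 0x008f9b000000ffff and 0x008f93000000ffff come out as 16-bit segments with base 0 and a byte limit of 0xffffffff, which is what lets the 16-bit code keep running from 0x3fdfffxx after the far jump.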

Then, a small piece of machine code is copied to 0x3fdfff00:
	unsigned char bincode[] = {
		/* BTDC: move 2M data back into the lowest 2M RAM.*/
		0xbe, 0x00, 0x00, 0xe0, 0x3f, /* mov esi, 3fe00000h */
		0xbf, 0x00, 0x00, 0x00, 0x00, /* mov edi, 00000000h */
		0xb9, 0x00, 0x00, 0x08, 0x00, /* mov ecx, 00080000h */
		0xfc,                         /* cld */
		0xf3, 0xa5,                   /* rep movsd es:[edi], ds:[esi] */

		/* BTDC: load new GDT and IDT for 32bit-16bit opcode switching. */
		0xfa,                         /* cli */
		0x0f, 0x01, 0x1d, 0xd8, 0xff, 0xdf, 0x3f, /* lidt [3fdfffd8h] */
		0x0f, 0x01, 0x15, 0xe0, 0xff, 0xdf, 0x3f, /* lgdt [3fdfffe0h] */
		0xb8, 0x10, 0x00, 0x00, 0x00, /* mov eax, 00000010h */
		0x8e, 0xd8,                   /* mov ds, ax */
		0x8e, 0xc0,                   /* mov es, ax */
		0x8e, 0xe0,                   /* mov fs, ax */
		0x8e, 0xe8,                   /* mov gs, ax */
		0x8e, 0xd0,                   /* mov ss, ax */
		0xea, 0x37, 0xff, 0xdf, 0x3f, 0x08, 0x00, /* jmp $0x0008:$0x3fdfff37 */
		0x90,                         /* nop */
		0x90,                         /* nop */

		/* BTDC: enter the real mode. */
		0x0f, 0x20, 0xc0,             /* movl %cr0, %eax */
		0x24, 0xfe,                   /* andb $0xfe, %al */
		0x0f, 0x22, 0xc0,             /* movl %eax, %cr0 */
		/* BTDC: jump into the waking vector in FACS. */
		/* BTDC: hard-coded and platform-specific: FC6 with 2.6.18. modify it later. */
		0xea, 0x00, 0x00, 0x00, 0x02  /* jmp 0x200:0x00 */
	};

In my case, the FACS table is located at 0xf0620 and the waking vector is
0x2000; I hard-coded the jump target just to make life easier. The OS then
takes control from coreboot, and Linux resumes from S3.
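
The hard-coded final jump could instead be generated from the FACS table. The following is only a sketch with hypothetical helper names, not what my code does today. It assumes the 32-bit firmware waking vector sits at byte offset 12 of the FACS (after the signature, length and hardware signature fields) and that the vector is below 1M, so that it can be expressed as a real-mode segment:offset pair with segment = address >> 4 and offset = address & 0xf.

```c
#include <stdint.h>
#include <string.h>

#define FACS_WAKING_VECTOR_OFFSET 12  /* 32-bit firmware waking vector */

/* Read the firmware waking vector from a FACS image.
 * Returns 0 if the "FACS" signature is missing. */
static uint32_t facs_waking_vector(const uint8_t *facs)
{
    const uint8_t *p = facs + FACS_WAKING_VECTOR_OFFSET;

    if (memcmp(facs, "FACS", 4) != 0)
        return 0;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Encode a 16-bit "jmp far seg:off" (opcode 0xEA) to the given
 * physical address. Returns 0 on success, -1 if the address is
 * not reachable from real mode (1M or above). */
static int encode_real_mode_jump(uint32_t vector, uint8_t out[5])
{
    uint16_t seg, off;

    if (vector >= 0x100000u)
        return -1;
    seg = (uint16_t)(vector >> 4);
    off = (uint16_t)(vector & 0xfu);
    out[0] = 0xea;
    out[1] = (uint8_t)(off & 0xffu);  /* offset, little endian */
    out[2] = (uint8_t)(off >> 8);
    out[3] = (uint8_t)(seg & 0xffu);  /* segment, little endian */
    out[4] = (uint8_t)(seg >> 8);
    return 0;
}
```

For a waking vector of 0x2000 this yields the bytes 0xea, 0x00, 0x00, 0x00, 0x02, the same as the last line of bincode above.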

Now, two questions occur to me:

1. Is it really necessary for CAR to move the stack from 0xc8000 in cache to 0x1f8000 in RAM at its final stage? Given that the stack works well in cache, why does CAR move it into RAM? Is it for verifying RAM, or for something else?

2. When resuming from S3, I initialize RAM again instead of exiting self-refresh. Luckily, the RAM content still stays intact this way. I will try exiting self-refresh later.

My first attempt was to jump to the waking vector from post_cache_as_ram. At that point RAM is accessible, so I can read the waking vector. However, many devices are not yet initialized and the system is very unstable; I got a different trace every time, and the best one is shown below. So, after being stuck for a couple of days, I gave up and followed Rudolf's way as described above.
 usbdev5.1_ep81: PM: suspend 0->2, parent 5-0:1.0 already 1
 usbdev4.1_ep81: PM: suspend 0->2, parent 4-0:1.0 already 1
 usbdev3.1_ep81: PM: suspend 0->2, parent 3-0:1.0 already 1
 usbdev2.1_ep81: PM: suspend 0->2, parent 2-0:1.0 already 1
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1

Call Trace:
 [<ffffffff8026929b>] show_trace+0x34/0x47
 [<ffffffff802692c0>] dump_stack+0x12/0x17
 [<ffffffff8029dc68>] down_read+0x15/0x23
 [<ffffffff80296254>] blocking_notifier_call_chain+0x13/0x36
 [<ffffffff803fde58>] cpufreq_resume+0x129/0x14c
DWARF2 unwinder stuck at cpufreq_resume+0x129/0x14c
Leftover inexact backtrace:
 [<ffffffff803a4350>] __sysdev_resume+0x2a/0x66
 [<ffffffff803a44fe>] sysdev_resume+0x1d/0x63
 [<ffffffff803a8efc>] device_power_up+0x9/0xf
 [<ffffffff802a5f42>] suspend_enter+0x3e/0x47
 [<ffffffff802a608e>] enter_state+0x143/0x19b
 [<ffffffff802a6155>] state_store+0x5e/0x79
 [<ffffffff802fb1ac>] sysfs_write_file+0xca/0xf9
 [<ffffffff802162b6>] vfs_write+0xce/0x174
 [<ffffffff80216b26>] sys_write+0x45/0x6e
 [<ffffffff8025c00e>] system_call+0x7e/0x83

PCI: Enabling device 0000:00:13.0 (0000 -> 0002)
PCI: Enabling device 0000:00:13.1 (0000 -> 0002)
PCI: Enabling device 0000:00:13.2 (0000 -> 0002)
PCI: Enabling device 0000:00:13.3 (0000 -> 0002)
PCI: Enabling device 0000:00:13.4 (0000 -> 0002)
PCI: Enabling device 0000:00:13.5 (0000 -> 0002)
PCI: Enabling device 0000:00:14.2 (0000 -> 0002)
hda_intel: azx_get_response timeout, switching to single_cmd mode...
Restarting tasks... done
Enabling non-boot CPUs ...

By the way, if you are curious, the Linux (2.6.18) resume sequence is as follows:
The waking vector points to wakeup_code in arch/x86_64/kernel/acpi/wakeup.S.
Then restore_processor_state in arch/x86_64/kernel/suspend.c is called.
Then __restore_processor_state in the same file is called.
After that I got lost; I could no longer match the assembly with the C code.

Best Regards

丰立波 Feng Libo @ AMD  Ext: 20906
Mobile Phone: 13683249071
Office Phone: 0086-010-62801406
