[coreboot] rfc - gcc builtins and memset memcpy memmove memcmp
scott at notabs.org
Sat Sep 11 17:34:03 CEST 2010
]From: coreboot-bounces at coreboot.org [mailto:coreboot-bounces at coreboot.org] On Behalf Of Arne Georg Gleditsch
]Sent: Saturday, September 11, 2010 06:01 AM
]To: Scott Duplichan
]Cc: 'Marc Jones'; 'Carl-Daniel Hailfinger'; 'Coreboot'
]Subject: Re: [coreboot] rfc - gcc builtins and memset memcpy memmove memcmp
]"Scott Duplichan" <scott at notabs.org> writes:
]> In this report:
]> Arne may have been encountering the ClLinesToNbDis issue
]> (assuming the memset code was running from flash). Switching
]> to rep movs would greatly improve performance because unlike
]> a byte loop, rep movs loops in microcode which does not cause
]> continuous flash memory accesses.
]This was my assumption as well. After fixing the ClLinesToNbDis
]setting, I have removed the rep stosb code from my tree, and so far I've
]not observed the pathological memset behaviour that caused me to put it
]in in the first place. (As mentioned earlier this was never altogether
]deterministic, I'm assuming some critical part of the original memset
]loop needed to straddle cache lines or something for it to manifest.)
Interesting point about the memset loop straddling a cache line boundary. It got
me thinking about what the DediProg em100 trace function shows when
booting from SPI flash. With SPI, the southbridge (SB) initially reads a dword at a
time. If the processor is not caching code, a byte loop memcpy would
trigger multiple dword reads from the flash chip for every byte copied.
If the BIOS sets the SB option PrefetchEnSPIFromHost, the SB switches
to cache line reads and caches the last line read. Since a byte loop
memcpy fits in a cache line, it seems conceivable that memcpy performance
would be good unless the function straddles a cache line boundary. I am
not sure what the situation is with LPC flash.
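The effect of a loop straddling a cache line boundary can be illustrated with a simplified counting model. This is a sketch under stated assumptions, not a description of the actual SB logic. It assumes that without prefetch every fetch of the loop code costs one flash read per aligned dword it touches, that with PrefetchEnSPIFromHost set the SB reads whole lines and keeps exactly one line buffered, that the line size is 64 bytes, and that nothing else (data reads, CPU prefetch) disturbs that buffer. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: flash reads to fetch a loop of `len` bytes at
 * `start` `iters` times, when the SB reads one aligned dword per request. */
static unsigned long dword_mode_reads(uint32_t start, uint32_t len,
                                      unsigned long iters)
{
    uint32_t first = start / 4, last = (start + len - 1) / 4;
    return (unsigned long)(last - first + 1) * iters;
}

/* Hypothetical model: same loop, but the SB reads whole lines of
 * `line` bytes and caches only the most recent one. */
static unsigned long line_mode_reads(uint32_t start, uint32_t len,
                                     unsigned long iters, uint32_t line)
{
    uint32_t first = start / line, last = (start + len - 1) / line;
    uint32_t lines = last - first + 1;

    if (iters == 0)
        return 0;
    if (lines == 1)
        return 1;   /* one fill, then every later fetch hits the buffer */
    /* Straddling: the lines evict each other on every iteration. */
    return (unsigned long)lines * iters;
}
```

Under these assumptions, a 16-byte loop at offset 0x20 run 1000 times costs 4000 reads in dword mode but a single line read with prefetch enabled, while the same loop moved to offset 0x38 straddles two 64-byte lines and costs 2000 line reads.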
Anyway, I noticed coreboot is not setting the AMD SB bit PrefetchEnSPIFromHost.
For big payloads, setting this bit could cut boot time by reducing the
per-transaction overhead when reading large chunks from SPI flash memory.
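A rough way to see why fewer, larger reads help is to count SPI bus clocks. This sketch assumes the plain SPI READ command (opcode 0x03) on single-bit I/O, where each transaction clocks out an 8-bit opcode and a 24-bit address before any data. It ignores chip-select gaps, fast-read dummy cycles and host-side latency, so it only bounds the bus-level saving. The constants and the spi_read_clocks name are hypothetical.

```c
#include <stdint.h>

/* Assumed: plain SPI READ (0x03), single-bit I/O.
 * Per transaction: 8-bit opcode + 24-bit address = 32 clocks. */
#define SPI_CMD_ADDR_CLOCKS 32u
#define SPI_CLOCKS_PER_BYTE 8u

/* SPI clocks to read `total` bytes using transactions of at most
 * `chunk` bytes each (`chunk` must be nonzero). */
static uint64_t spi_read_clocks(uint64_t total, uint64_t chunk)
{
    uint64_t transactions = (total + chunk - 1) / chunk;
    return transactions * SPI_CMD_ADDR_CLOCKS + total * SPI_CLOCKS_PER_BYTE;
}
```

For a 1 MiB payload this model gives 16,777,216 clocks with dword reads versus 8,912,896 clocks with 64-byte line reads, so the command and address overhead alone roughly doubles the bus time in dword mode.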