[coreboot] [PATCH] use memset

Myles Watson mylesgw at gmail.com
Thu Mar 18 14:28:11 CET 2010


> Myles Watson <mylesgw at gmail.com> writes:
> > I was having trouble with stack corruption.  Using memset (C) instead of
> > clear_memory(asm) speeds it up by almost a factor of 2 for a 1M region.
> >
> > TSC difference with clear_memory 0xFA884D
> > TSC difference with memset 0x826742
> 
> That's odd.  I just recently sent a patch to the list ("ulzma delay")
> that did pretty much the opposite, as I was seeing really bad
> performance for the C memset function on my Opteron (Istanbul) boxes.
> memset would take minutes to do what ran in a handful of ms using "rep
> stosb", by all accounts because of instruction cache thrashing.
> 
> I see clear_memory was using "stosl", but apart from that it looks
> very similar to the variant I ended up with to improve performance.
Once caching works correctly on fam10, maybe you'll see performance numbers
similar to mine?
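
For reference, here is the arithmetic behind "almost a factor of 2", as a
small sketch. The two constants are the TSC differences quoted above; the
helper name speedup_x100 is made up for illustration and is not coreboot code.

```c
#include <stdint.h>

/* TSC deltas copied from the measurements quoted above (1M region). */
static const uint64_t tsc_clear_memory = 0xFA884D; /* asm clear_memory */
static const uint64_t tsc_memset = 0x826742;       /* C memset */

/*
 * Hypothetical helper: ratio of two cycle counts, scaled by 100 so that
 * no floating point is needed. The result is truncated, not rounded.
 */
static uint64_t speedup_x100(uint64_t slow, uint64_t fast)
{
	return (slow * 100) / fast;
}
```

speedup_x100(tsc_clear_memory, tsc_memset) comes out at 192, i.e. memset was
about 1.92 times faster in that one measurement.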

> Could you see if you experience stack corruption with the "rep stosb"
> patch I posted for memset as well?  
It's hard to tell whether you have stack corruption unless it bites you.
There are a lot of places on the stack where corruption won't matter, so I
don't have a good way to test for it.
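
One rough idea, as an untested sketch only: paint a region that is supposed to
stay unused with a known pattern and count the words that changed afterwards.
The names and the fill value below are made up for illustration. It only
catches writes that land in the painted region and that differ from the
pattern, so it says nothing about damage to live stack frames.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary fill value chosen for this sketch. */
#define SKETCH_FILL 0xDEADBEEFu

/* Paint a region that is expected to stay unused with a known pattern. */
void sketch_paint(uint32_t *region, size_t words)
{
	for (size_t i = 0; i < words; i++)
		region[i] = SKETCH_FILL;
}

/* Return how many words no longer hold the pattern. */
size_t sketch_count_damaged(const uint32_t *region, size_t words)
{
	size_t damaged = 0;

	for (size_t i = 0; i < words; i++)
		if (region[i] != SKETCH_FILL)
			damaged++;
	return damaged;
}
```

You would call sketch_paint() before the code under suspicion and
sketch_count_damaged() after it. A result of 0 does not prove there was no
corruption, which is the same problem I described above.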

> I'd like to see that go in, but of
> course it's a problem if it results in a performance degradation on
> other platforms. Perhaps we could enable it only for the platforms where
> instruction footprint/fetches is known to be an issue, ie fam10?
The best thing would be to fix caching on fam10.  Of course if that's not
feasible for some reason, then adding asm just for that architecture could
be the way to go.  In general, the more we can keep it straight C, the
better for me.
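
To make that concrete, here is a minimal sketch of what "C by default, asm
only where needed" could look like. USE_ASM_MEMSET and sketch_memset are
hypothetical names for this mail, not existing coreboot options or functions,
and the asm path assumes GCC-style inline asm on x86 with the direction flag
clear, as the ABI requires at function entry.

```c
#include <stddef.h>

/* Hypothetical switch; a fam10 build would set it to 1. */
#ifndef USE_ASM_MEMSET
#define USE_ASM_MEMSET 0
#endif

void *sketch_memset(void *s, int c, size_t n)
{
#if USE_ASM_MEMSET && (defined(__i386__) || defined(__x86_64__))
	void *d = s;

	/* rep stosb stores AL at [(E/R)DI] and repeats (E/R)CX times. */
	__asm__ volatile("rep stosb"
			 : "+D"(d), "+c"(n)
			 : "a"(c)
			 : "memory");
#else
	/* Plain C path, used everywhere else. */
	unsigned char *p = s;

	while (n--)
		*p++ = (unsigned char)c;
#endif
	return s;
}
```

Both paths have the same interface, so platforms that don't need the asm
never see it.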

Thanks,
Myles




