[coreboot] MTRR setup strategy

Sun Jan 25 01:37:22 CET 2009

On 25.01.2009 00:43, Stefan Reinauer wrote:
> On 24.01.2009 23:21 Uhr, Carl-Daniel Hailfinger wrote:
>   
>> On 24.01.2009 20:58, Stefan Reinauer wrote:
>>   
>>     
>>> Carl-Daniel Hailfinger wrote:
>>>   
>>>     
>>>       
>>>> Example:
>>>> We want to cache 0MB - (2G-64M-64k).
>>>>   
>>>>     
>>>>       
>>>>         
>>> Where do the 64k come from?
>>>   
>>>     
>>>       
>> That was specific to Jason's setup. IIRC the 64k were ACPI memory or
>> somesuch.
>>   
>>     
>
> Any reason why that shouldn't be cacheable?
>
> From a memory perspective, it's just normal memory, not graphics memory
> or some such.
>
> This might even be caused by my high tables patch from recently, but it
> looks like a bug to me.
>
>   
>>>> However, a subtractive setup is not always more efficient. 
>>>>     
>>>>       
>>>>         
>>> Is it not? It sounds like at least if we have 2^x bytes of memory and
>>> subtract a small chunk or two, we would be quite well off with it.
>>>   
>>>     
>>>       
>> Assuming you don't have anything you want to cache near 4 GB (like flash):
>> Both strategies are equally efficient if the contiguous cacheable area
>> has a size of 2^n+2^(n-1).
>> The additive strategy is more efficient if the size is 2^n+2^(n-k) and k>1.
>> The subtractive strategy is more efficient if the size is 2^n-2^(n-k)
>> and k>1.
>>
>> I hope that you accept this without a detailed mathematical proof. ;-)
>>     

I should have pointed out that the bit counting algorithm at the end of
my mail is the definitive answer. The explanation above only covers some
very common special cases. Please note that the explanation above
explicitly does not cover the "2 equally sized DIMMs or only one DIMM
and no UMA" scenario because 2^n cannot be zero. (If you ever encounter
DIMMs with non-power-of-2 sizes, ignore my last sentence.)

> So in a setup with 2 equally sized DIMMs or only one DIMM,

With only one DIMM or two equally sized DIMMs, one MTRR is enough
(provided you don't want a hole in there because they are >= 4 GB in
total). Feel free to call this either additive or subtractive.

> and possibly
> UMA the subtractive method will always be the way to go.
>   

If UMA is the top part of RAM and TOPMEM is 2^n, you're right for most
scenarios.
There are cases where that assumption is slightly off, though. Consider
total RAM 1024 MB, normal memory 640 MB and UMA 384 MB (possible on AMD
690G). You either need two MTRRs for additive setup (512 MB + 128 MB) or
three MTRRs for subtractive setup (1024 MB cached, minus 256 MB and
128 MB uncached).
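
To make the counting concrete, here is a rough Python sketch of the bit
counting applied to this case. The function names are mine and not
anything in coreboot. I assume the cached range starts at address 0 and
that a subtractive setup costs one covering MTRR plus one uncached MTRR
per power-of-2 piece of the hole.

```python
MB = 1 << 20

def bitcount(x):
    """Number of set bits in x."""
    return bin(x).count("1")

def next_pow2(x):
    """Smallest power of two >= x (x must be > 0)."""
    return 1 << (x - 1).bit_length()

def mtrr_counts(cached_size):
    """Return (additive, subtractive) MTRR counts for the range 0..cached_size-1."""
    # Additive: one MTRR per set bit of the cached size.
    additive = bitcount(cached_size)
    # Subtractive (assumption): one MTRR covering the enclosing power-of-2
    # area, plus one uncached MTRR per set bit of the hole at the top.
    hole = next_pow2(cached_size) - cached_size
    subtractive = 1 + bitcount(hole)
    return additive, subtractive

print(mtrr_counts(640 * MB))  # -> (2, 3)
```

For a cached size that is an exact power of 2 both counts come out as 1,
which matches the "one MTRR is enough" case above.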

>>>> That means we
>>>> have to select the best setup type. I devised a slightly tricky
>>>> algorithm to do that:
>>>> 1. Check if there are multiple disjoint cached areas in a given
>>>> power-of-two sized area.
>>>> 1a. If no, go to step 2
>>>> 1b. If yes, stop here. Need advanced setup not described here.
>>>>   
>>>>     
>>>>       
>>>>         
>>> please describe ;)
>>>   
>>>     
>>>       
>> 1b. Take the largest contiguous power-of-2 sized naturally aligned chunk.
>> Use additive setup for that chunk. Look at the remaining area. Does it
>> still contain two disjoint chunks? If yes, go to 1b. If no, go to 2.
>>
>>
>>   
>>     
>>>> 2. additive_count=bitcount(top_cached_addr+1)
>>>> 3. subtractive_count=bitcount(rounduptonextpowerof2(top_cached_addr)-(top_cached_addr+1))
>>>> 4. if (additive_count>subtractive_count) go to subtractive_method else
>>>> go to additive_method
>>>>   
>>>>     
>>>>       
>>>>         
>>> Yes, sounds good.
>>>   
>>>     
>>>       
>> Glad to hear that. I hope the rest of the algorithm is OK for you as well.
>>   
>>     
>
> Thinking about it again I'm not sure we're ever going to see multiple
> disjoint cached areas in a given 2^x sized area. That would be... UMA in
> the middle of the memory or something? Uh.. or more than 4G of memory?
>   

It happens if you want to cache the ROM (slightly below 4 GB), but not
the IOMEM areas before the ROM. My reply to Corey should have an example.
Fortunately, we try to have no RAM between 3 GB and 4 GB, so for
effective RAM sizes >=3 GB (and no UMA) we need 2 MTRRs below 4 GB for
RAM (and 1 MTRR for ROM) and 1 MTRR above 4 GB if you're willing to
waste address space (not RAM) to save on MTRRs.
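
As an illustration of where those numbers come from, here is a small
Python sketch that splits a range into naturally aligned power-of-2
chunks, one MTRR each. It is only a sketch under my own assumptions:
additive_chunks is a made-up name, and the 1 MB ROM size in the usage
example is picked purely for illustration.

```python
MB = 1 << 20
GB = 1 << 30

def additive_chunks(start, end):
    """Split [start, end) into naturally aligned power-of-2 (base, size) chunks."""
    chunks = []
    while start < end:
        # Largest power of two that fits into the remaining length.
        size = 1 << ((end - start).bit_length() - 1)
        # Natural alignment: the chunk size must also divide the base address.
        # start & -start isolates the lowest set bit of a nonzero start.
        if start:
            size = min(size, start & -start)
        chunks.append((start, size))
        start += size
    return chunks

ram = additive_chunks(0, 3 * GB)             # 2 GB + 1 GB: two MTRRs
rom = additive_chunks(4 * GB - MB, 4 * GB)   # hypothetical 1 MB ROM: one MTRR
print(len(ram), len(rom))
```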

The bit counting algorithm will solve these problems in an optimal way.
If you want the "waste address space to save MTRRs" optimization, the
bit count algorithm can handle it. Just change the input: round
top_cached_addr+1 up to a multiple of the biggest possible power of 2
which is still in a "don't care" area.
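
One possible reading of that rounding step, as a Python sketch. The
helper name and the dont_care_end parameter are hypothetical. I assume
the "don't care" area starts directly at top_cached_addr+1 and that
dont_care_end is the first address which must not be cached.

```python
MB = 1 << 20

def round_up_into_dont_care(cached_end, dont_care_end):
    """Round cached_end (= top_cached_addr + 1) up to a multiple of the
    biggest power of 2 whose rounded result is still <= dont_care_end."""
    if cached_end <= 0:
        raise ValueError("cached_end must be positive")
    best = cached_end
    step = 1
    while True:
        # Round cached_end up to the next multiple of step.
        rounded = (cached_end + step - 1) & ~(step - 1)
        if rounded > dont_care_end:
            break
        best = rounded
        step <<= 1
    return best

# 5376 MB cached, nothing that matters up to 8192 MB:
new_end = round_up_into_dont_care(5376 * MB, 8192 * MB)
print(new_end // MB, bin(new_end).count("1"))
```

The result is then fed to the bit counting in place of
top_cached_addr+1.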

Of course, if you are really really trying to clamp down on MTRR usage,
you can do MTRR setup in two steps: cached ROM during coreboot execution
and a change to uncached ROM (and thus avoidance of disjoint cached
areas) directly before passing execution to the payload.

(And anyone implementing this should probably add all mails explaining
the algorithm as code comments. Nobody is going to understand such code
in a few months if it was written without enough comments. ;-)

Regards,
Carl-Daniel

-- 
http://www.hailfinger.org/