[coreboot] AMD Family 0Fh CAR and L1 cache tags

Carl-Daniel Hailfinger c-d.hailfinger.devel.2006 at gmx.net
Wed Jan 16 14:03:54 CET 2008

On 15.01.2008 18:54, Marc Jones wrote:
> Carl-Daniel Hailfinger wrote:
>> The BKDG rev. 3.08 for AMD Family 0Fh states that it is possible to use
>> a CAR area with a size of 64K in section 13.16 "Cache Initialization For
>> General Storage During Boot". It also says that during DRAM training CAR
>> size must be reduced. For DDR training, 256 cache lines with L1 cache
>> tag indexes 00h-FFh are reserved and must not be used as CAR. The text
>> then refers to the AMD64 Arch Programmers Manual Vol. 2 for more details
>> on L1 function. However, I couldn't find any explanation why L1 cache
>> tag indexes 00h-FFh correspond to address space C0000h-C3FFFh when fixed
>> size MTRRs are active.
> I may be misunderstanding your question but I don't think that tag
> indexes 00h-ffh have to correspond to C0000h-C3FFFh. I'm also not
> positive that they must be tag indexes 00h-ffh. I think that they
> could be on the end as long as the tags are contiguous.

Good to know. Can you make sure such a sentence gets added to the BKDG
in its various versions?
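
For reference, the arithmetic behind the numbers in my question can be
sketched as below. This is only an illustration under an assumed L1 data
cache geometry (64 KiB, 2-way set associative, 64-byte lines, index taken
from the address bits directly above the line offset). The helper names
are hypothetical and appear in neither the BKDG nor coreboot.

```c
#include <stdint.h>

/* ASSUMPTION: L1 data cache geometry of 64 KiB, 2-way, 64-byte lines. */
enum {
	LINE_SIZE  = 64,
	CACHE_SIZE = 64 * 1024,
	WAYS       = 2,
	NUM_SETS   = CACHE_SIZE / (WAYS * LINE_SIZE)	/* 512 sets */
};

/* Hypothetical helper: set index selected by an address, assuming the
 * index is simply the line number modulo the number of sets. */
static unsigned l1_set_index(uint32_t addr)
{
	return (addr / LINE_SIZE) % NUM_SETS;
}

/* Hypothetical helper: number of cache lines touched by the inclusive
 * address range [start, end]. */
static unsigned lines_in_range(uint32_t start, uint32_t end)
{
	return (end / LINE_SIZE) - (start / LINE_SIZE) + 1;
}
```

Under these assumptions C0000h-C3FFFh covers 256 lines and maps to
indexes 00h-FFh, but so would any other 16K window that starts on a 32K
boundary, which fits Marc's remark that the address range is not special.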

> This comment refers to DDR training needing the space to hold test
> patterns for dqs eye finding during memory training. See
> northbridge\amd\amdk8\raminit_f_dqs.c TrainDQSRdWrPos().

Thanks. It seems I have to reread the code a few times to fully
understand its structure.
But I have spotted something peculiar in the code of TrainDQSRdWrPos()
in src/northbridge/amd/amdk8/raminit_f_dqs.c:

Errors = 0;
channel = 0;
while( (channel<2) && (!Errors)) {
	print_debug_dqs("\tTrainDQSRdWrPos: 1 channel ",channel, 1); 
	for(DQSWrDelay = 0; DQSWrDelay < 48; DQSWrDelay++) {
		unsigned err;
		SetDQSDelayAllCSR(ctrl, channel, DQS_WRITEDIR, DQSWrDelay);
		print_debug_dqs("\t\tTrainDQSRdWrPos: 21 DQSWrDelay ", DQSWrDelay, 2); 
		err= TrainReadDQS(ctrl, channel, pattern, buf_a, dqs_delay_a, sysinfo);
		print_debug_dqs("\t\tTrainDQSRdWrPos: 22 err ",err, 2); 
		if(err == 0) break;
-------------> Now we set "Errors"
		Errors |= err;
	}
	print_debug_dqs("\tTrainDQSRdWrPos: 3 DQSWrDelay ", DQSWrDelay, 1); 
	if(DQSWrDelay < 48) {
-------------> Now we overwrite "Errors" in case the for loop above ever had err == 0.
		Errors = TrainWriteDQS(ctrl, channel, pattern, buf_a, dqs_delay_a, sysinfo);
		print_debug_dqs("\tTrainDQSRdWrPos: 4 Errors ", Errors, 1); 
		//FIXME: 64MuxMode??	
		channel++; // skip channel if 64-bit mode

As I understand the logic of the snippet above, we look for the first
DQSWrDelay for which TrainReadDQS reports no errors. Once we find one, we
discard the errors accumulated for the earlier values of DQSWrDelay and
run TrainWriteDQS with the current value of DQSWrDelay.
If TrainReadDQS failed for all 48 values of DQSWrDelay, we return the
bitwise OR of the error conditions from every value of DQSWrDelay.
Does that really make sense?
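
To make the behaviour concrete, here is a minimal stand-alone model of
the per-channel logic as I read it. It is a sketch, not the coreboot
code: struct sim and sim_train_read() are hypothetical stand-ins for
TrainReadDQS() and TrainWriteDQS(), and the hardware accesses and debug
prints are left out.

```c
/* Hypothetical simulation parameters (not part of coreboot). */
struct sim {
	unsigned read_ok_delay;	/* only delay at which the read pass succeeds;
				 * a value >= 48 means it never succeeds */
	unsigned read_err;	/* error bits returned by a failing read pass */
	unsigned write_err;	/* error bits returned by the write pass */
};

/* Stand-in for TrainReadDQS(): 0 on success, error bits otherwise. */
static unsigned sim_train_read(const struct sim *s, unsigned delay)
{
	return (delay == s->read_ok_delay) ? 0 : s->read_err;
}

/* Mirrors the control flow of the excerpt for a single channel. */
static unsigned train_one_channel(const struct sim *s)
{
	unsigned Errors = 0;
	unsigned DQSWrDelay;

	for (DQSWrDelay = 0; DQSWrDelay < 48; DQSWrDelay++) {
		unsigned err = sim_train_read(s, DQSWrDelay);
		if (err == 0)
			break;
		Errors |= err;	/* accumulate read errors */
	}
	if (DQSWrDelay < 48) {
		/* Overwrites whatever the loop above accumulated. */
		Errors = s->write_err;
	}
	return Errors;
}
```

With read_ok_delay = 10, the read errors seen for delays 0 to 9 are
dropped and only the write pass result survives. With read_ok_delay = 48
the write pass never runs and the accumulated read errors are returned.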

> For coreboot, it looks like the test patterns are just pushed onto the
> stack.

Indeed. So we are completely free to place CAR anywhere we want with any
size we want (subject to L2 size restrictions).

> For AMD BIOS code, this is not the case and they are put into the
> cache at a set location. (I think that this is easier for the AGESA
> asm code to handle that way).

I see.

Thanks for pointing me to the code. I shall add good comments to that
code snippet once I have more time.

