According to the AMD Family 15h BKDG (page 588), the hardware prefetcher can be disabled by setting certain bits of MSRC001_1022:
MSRC001_1022 Data Cache Configuration (DC_CFG)
Bits --> Description
63:16 --> Reserved.
15 --> DisPfHwForSw. Read-write. Reset: 0. 1=Disable hardware prefetches for software prefetches.
14 --> Reserved.
13 --> DisHwPf. Read-write. Reset: 0. 1=Disable the DC hardware prefetcher.
12:10 --> Reserved.
9:5 --> Reserved.
4 --> DisSpecTlbRld. Read-write. Reset: 0. 1=Disable speculative TLB reloads.
3:0 --> Reserved.
To disable all the prefetch settings, I have to write 0xA008 to that MSR. I did this for all 32 cores using:
[root@tiger exe]# wrmsr -a 0xc0011022 0xA008
[root@tiger exe]# rdmsr -a -x -0 0xc0011022
000000000000a008
...
However, when I run the command under perf, the prefetch statistics are non-zero!
[root@tiger exe]# perf stat -e L1-dcache-loads:uk,L1-dcache-prefetches:uk,L1-dcache-prefetch-misses:uk ./bzip2_base.amd64-m64-gcc44-nn
spec_init
Tested 64MB buffer: OK!
Performance counter stats for './bzip2_base.amd64-m64-gcc44-nn':
55,341,597,193 L1-dcache-loads:uk
1,047,662,614 L1-dcache-prefetches:uk
0 L1-dcache-prefetch-misses:uk
35.921618464 seconds time elapsed
I expected L1-dcache-prefetches to be 0. Am I wrong?
How can I debug the counters to find out how they map to the MSRs?
Answer (score: 0)
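The mapping from perf's synthetic names for hardware counters (as listed by perf list) to real hardware events is defined, for many CPUs, in the kernel source of the perf_events subsystem. For AMD it lives in the file arch/x86/events/amd/core.c. In the 4.8 kernel, the AMD cache events are mapped to CPU-specific constants that are written to the PMC MSRs: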
http://elixir.free-electrons.com/linux/v4.8/source/arch/x86/events/amd/core.c
static __initconst const u64 amd_hw_cache_event_ids
                ... = {
 [ C(L1D) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses */
        [ C(RESULT_MISS) ] = 0x0141, /* Data Cache Misses */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS) ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x0267, /* Data Prefetcher :attempts */
        [ C(RESULT_MISS) ] = 0x0167, /* Data Prefetcher :cancelled */
    },
 },
 [ C(L1I ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0080, /* Instruction cache fetches */
        [ C(RESULT_MISS) ] = 0x0081, /* Instruction cache misses */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = -1,
        [ C(RESULT_MISS) ] = -1,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x014B, /* Prefetch Instructions :Load */
        [ C(RESULT_MISS) ] = 0,
    },
 },
 [ C(LL ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x037D, /* Requests to L2 Cache :IC+DC */
        [ C(RESULT_MISS) ] = 0x037E, /* L2 Cache Misses : IC+DC */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0x017F, /* L2 Fill/Writeback */
        [ C(RESULT_MISS) ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS) ] = 0,
    },
 },
 ...
__init int amd_pmu_init(void)
{
    ...
    /* Performance-monitoring supported from K7 and later: */
    if (boot_cpu_data.x86 < 6)
        return -ENODEV;
    x86_pmu = amd_pmu;
    ret = amd_core_pmu_init();
    ...
    /* Events are common for all AMDs */
    memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
           sizeof(hw_cache_event_ids));
    return 0;
}