perf stat on AMD family 15h

Date: 2015-06-06 10:28:38

Tags: linux performance amd perf

According to the AMD family 15h BKDG (page 588), the hardware prefetcher can be disabled by setting certain bits of MSRC001_1022:

MSRC001_1022 Data Cache Configuration (DC_CFG)
Bits    -->  Description
63:16   -->  Reserved.
15      -->  DisPfHwForSw. Read-write. Reset: 0. 1=Disable hardware prefetches for software prefetches.
14      -->  Reserved.
13      -->  DisHwPf. Read-write. Reset: 0. 1=Disable the DC hardware prefetcher. 
12:10   -->  Reserved.
9:5     -->  Reserved.
4       -->  DisSpecTlbRld. Read-write. Reset: 0. 1=Disable speculative TLB reloads. 
3:0     -->  Reserved.

To disable all prefetching, I have to write 0xA008 to that MSR. I did this for all 32 cores with:
[root@tiger exe]# wrmsr -a 0xc0011022 0xA008
[root@tiger exe]# rdmsr -a -x -0 0xc0011022
000000000000a008
...

However, when I run the program under perf, the prefetch counts are non-zero!

[root@tiger exe]# perf stat -e L1-dcache-loads:uk,L1-dcache-prefetches:uk,L1-dcache-prefetch-misses:uk ./bzip2_base.amd64-m64-gcc44-nn
spec_init
Tested 64MB buffer: OK!
 Performance counter stats for './bzip2_base.amd64-m64-gcc44-nn':
    55,341,597,193 L1-dcache-loads:uk
     1,047,662,614 L1-dcache-prefetches:uk
                 0 L1-dcache-prefetch-misses:uk
      35.921618464 seconds time elapsed

I expected to see 0 for L1-dcache-prefetches. Shouldn't I?

How can I debug the counters to find out how they map to MSRs?

1 answer:

Answer 0 (score: 0)

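The mapping from perf's generic (synthetic) hardware counter names, as listed by perf list, to real hardware events is defined for many CPUs in the kernel source of the perf_events subsystem. For AMD it lives in arch/x86/events/amd/core.c. In the 4.8 kernel, the AMD cache events are mapped to CPU-specific constants, which are then written into the PMC event-select MSRs: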

http://elixir.free-electrons.com/linux/v4.8/source/arch/x86/events/amd/core.c

static __initconst const u64 amd_hw_cache_event_ids
 ... =  {
 [ C(L1D) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses        */
        [ C(RESULT_MISS)   ] = 0x0141, /* Data Cache Misses          */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS)   ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x0267, /* Data Prefetcher :attempts  */
        [ C(RESULT_MISS)   ] = 0x0167, /* Data Prefetcher :cancelled */
    },
 },
 [ C(L1I ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0080, /* Instruction cache fetches  */
        [ C(RESULT_MISS)   ] = 0x0081, /* Instruction cache misses   */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = -1,
        [ C(RESULT_MISS)   ] = -1,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x014B, /* Prefetch Instructions :Load */
        [ C(RESULT_MISS)   ] = 0,
    },
 },
 [ C(LL  ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x037D, /* Requests to L2 Cache :IC+DC */
        [ C(RESULT_MISS)   ] = 0x037E, /* L2 Cache Misses : IC+DC     */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0x017F, /* L2 Fill/Writeback           */
        [ C(RESULT_MISS)   ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS)   ] = 0,
    },
 },

...
__init int amd_pmu_init(void)
{ ...
    /* Performance-monitoring supported from K7 and later: */
    if (boot_cpu_data.x86 < 6)
        return -ENODEV;

    x86_pmu = amd_pmu;

    ret = amd_core_pmu_init();
    ...

    /* Events are common for all AMDs */
    memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
           sizeof(hw_cache_event_ids));
    return 0;
}