PMC (performance monitoring counter) value for L3 cache misses is too high

Date: 2018-03-10 10:25:22

Tags: x86 cpu-architecture performancecounter msr intel-pmu

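I am looking for a way to estimate the number of L3 cache misses on my Linux PC with an Intel CPU (Intel i7 6700, Skylake), using the 'IA32_PERFEVTSELx' and 'IA32_PMCx' MSR pairs. To do this, I installed a timer in the kernel that reports the PMC value periodically (every 1 second). In the code, I write 0x41412E to the IA32_PERFEVTSEL1 MSR (address 0x187) and then read the IA32_PMC1 MSR (address 0xC2). In 0x41412E, the Event Select field (bits 7:0) is 0x2E, the UMask field (bits 15:8) is 0x41, bit 16 is the USR (user mode) flag, and bit 22 is the EN (enable) flag of IA32_PERFEVTSEL1: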

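uint64_t val = 0x41412E; // EN (bit 22) | USR (bit 16) | UMask 0x41 (bits 15:8) | Event Select 0x2E (bits 7:0)
uint64_t ret = 0x0;

// 0x187 is IA32_PERFEVTSEL1; reprogram it only if it does not already hold val.
// rdmsrl_safe()/wrmsrl_safe() (from <asm/msr.h>) read/write the full 64-bit
// MSR value and return non-zero if the MSR access faults.
if ( rdmsrl_safe(0x187, &ret) || ret != val ) {
    if ( wrmsrl_safe(0x187, val) ) {
        TEMP_DEBUG("failed to write msr!!!");
    }
}

// 0xC2 is IA32_PMC1, the counter controlled by IA32_PERFEVTSEL1.
// It is never cleared here, so each reading is a running total.
if ( rdmsrl_safe(0xC2, &ret) ) {
    TEMP_DEBUG("failed to read msr!!!");
} else {
    TEMP_DEBUG("rdmsr: %llu", (unsigned long long)ret); // u64 needs %llu
}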
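As a quick cross-check of the value written to IA32_PERFEVTSEL1, the fields can be packed in user space. This is a minimal sketch; the macro and function names are hypothetical, and the bit positions are the architectural IA32_PERFEVTSELx layout from the Intel SDM Vol. 3B (event select in bits 7:0, unit mask in bits 15:8, USR bit 16, OS bit 17, EN bit 22).

```c
#include <stdint.h>

/* Hypothetical names for the architectural IA32_PERFEVTSELx fields. */
#define EVTSEL_EVENT_SHIFT 0           /* bits 7:0   event select */
#define EVTSEL_UMASK_SHIFT 8           /* bits 15:8  unit mask */
#define EVTSEL_USR (1ULL << 16)        /* count while CPL > 0 (user mode) */
#define EVTSEL_OS  (1ULL << 17)        /* count while CPL == 0 (kernel mode) */
#define EVTSEL_EN  (1ULL << 22)        /* enable the counter */

/* Pack event select, unit mask and mode/enable flags into one MSR value. */
uint64_t perfevtsel_encode(uint8_t event, uint8_t umask,
                           int usr, int os, int en)
{
    uint64_t v = ((uint64_t)event << EVTSEL_EVENT_SHIFT) |
                 ((uint64_t)umask << EVTSEL_UMASK_SHIFT);
    if (usr)
        v |= EVTSEL_USR;
    if (os)
        v |= EVTSEL_OS;
    if (en)
        v |= EVTSEL_EN;
    return v;
}
```

perfevtsel_encode(0x2E, 0x41, 1, 0, 1) yields 0x41412E, the value used above. Because the OS bit (17) is clear in that value, the counter only counts while the logical processor runs in user mode; setting it as well gives 0x43412E.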

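I expected this value to represent the number of L3 cache misses, but it looks strange. The value seems far too high, so I suspect it is not the L3 miss count, and I could not find what it means in the manual (Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide). The values I observed are as follows: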

rdmsr: 0 at start_shscan(56) in mcsched.c
rdmsr: 0 at start_shscan(56) in mcsched.c
rdmsr: 8595908 at start_shscan(56) in mcsched.c
rdmsr: 17274482 at start_shscan(56) in mcsched.c
rdmsr: 21449216 at start_shscan(56) in mcsched.c
rdmsr: 26305745 at start_shscan(56) in mcsched.c
rdmsr: 26511242 at start_shscan(56) in mcsched.c
rdmsr: 33316291 at start_shscan(56) in mcsched.c
rdmsr: 34736360 at start_shscan(56) in mcsched.c
rdmsr: 35151932 at start_shscan(56) in mcsched.c
rdmsr: 43806356 at start_shscan(56) in mcsched.c
rdmsr: 51132302 at start_shscan(56) in mcsched.c
rdmsr: 59797757 at start_shscan(56) in mcsched.c
rdmsr: 0 at start_shscan(56) in mcsched.c
rdmsr: 0 at start_shscan(56) in mcsched.c
rdmsr: 6820029 at start_shscan(56) in mcsched.c
rdmsr: 8322078 at start_shscan(56) in mcsched.c
rdmsr: 63313471 at start_shscan(56) in mcsched.c
rdmsr: 397962 at start_shscan(56) in mcsched.c
rdmsr: 9429026 at start_shscan(56) in mcsched.c
rdmsr: 18124455 at start_shscan(56) in mcsched.c
rdmsr: 23706367 at start_shscan(56) in mcsched.c
rdmsr: 27087960 at start_shscan(56) in mcsched.c
rdmsr: 68769660 at start_shscan(56) in mcsched.c
rdmsr: 69110424 at start_shscan(56) in mcsched.c
rdmsr: 78216541 at start_shscan(56) in mcsched.c
rdmsr: 87385467 at start_shscan(56) in mcsched.c
rdmsr: 95083478 at start_shscan(56) in mcsched.c
rdmsr: 101347654 at start_shscan(56) in mcsched.c
rdmsr: 8327692 at start_shscan(56) in mcsched.c
rdmsr: 27377092 at start_shscan(56) in mcsched.c
rdmsr: 36316258 at start_shscan(56) in mcsched.c
rdmsr: 45323291 at start_shscan(56) in mcsched.c
rdmsr: 54366010 at start_shscan(56) in mcsched.c
rdmsr: 63135801 at start_shscan(56) in mcsched.c
rdmsr: 72037000 at start_shscan(56) in mcsched.c
rdmsr: 81032798 at start_shscan(56) in mcsched.c
rdmsr: 89975340 at start_shscan(56) in mcsched.c
rdmsr: 98661287 at start_shscan(56) in mcsched.c
rdmsr: 107482921 at start_shscan(56) in mcsched.c
rdmsr: 116290561 at start_shscan(56) in mcsched.c
rdmsr: 125135979 at start_shscan(56) in mcsched.c
rdmsr: 133920103 at start_shscan(56) in mcsched.c
rdmsr: 142695638 at start_shscan(56) in mcsched.c
rdmsr: 151456156 at start_shscan(56) in mcsched.c
rdmsr: 160171239 at start_shscan(56) in mcsched.c
rdmsr: 168879495 at start_shscan(56) in mcsched.c
rdmsr: 177788861 at start_shscan(56) in mcsched.c
rdmsr: 186589920 at start_shscan(56) in mcsched.c
rdmsr: 195331675 at start_shscan(56) in mcsched.c
rdmsr: 204166715 at start_shscan(56) in mcsched.c
rdmsr: 213045449 at start_shscan(56) in mcsched.c
rdmsr: 221942627 at start_shscan(56) in mcsched.c
rdmsr: 231073520 at start_shscan(56) in mcsched.c
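Since IA32_PMC1 is only read (never zeroed) in the code above, each logged number is a running total, and the count for one timer interval is the difference between consecutive readings. The sketch below shows that arithmetic; interval_counts and DISCONTINUITY are hypothetical names, and it assumes every reading in a run comes from the same logical processor's counter (PMCs are per logical processor). A reading smaller than its predecessor cannot be an ordinary increment at these magnitudes, so the sketch marks it rather than guessing why it happened.

```c
#include <stddef.h>
#include <stdint.h>

/* Marker for "counter went backwards" (hypothetical convention). */
#define DISCONTINUITY UINT64_MAX

/*
 * Convert n cumulative counter readings (one per timer tick) into
 * n - 1 per-interval counts written to out.
 */
void interval_counts(const uint64_t *readings, size_t n, uint64_t *out)
{
    for (size_t i = 1; i < n; i++) {
        if (readings[i] >= readings[i - 1])
            out[i - 1] = readings[i] - readings[i - 1];  /* normal increment */
        else
            out[i - 1] = DISCONTINUITY;                  /* reset or other CPU? */
    }
}
```

Applied to the last run of the log (27377092 through 231073520), the differences fall between roughly 8.7 and 9.1 million per interval; for example, 45323291 - 36316258 = 9007033.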

Is there a mistake in my code? Otherwise, could you give me some advice on how to interpret these values?

======================= Edit: added the following =======================

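@Peter Cordes, I referred to the Intel manual (Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide). I intend to use "LLC Misses", one of the pre-defined architectural performance events in the following table: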

Table 18-1. UMask and Event Select Encodings for Pre-Defined Architectural Performance Events in the Intel manual
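Table 18-1 lists LLC Misses as event select 2EH with UMask 41H. Whether a particular CPU actually supports each architectural event, and how wide its general-purpose counters are, is reported by CPUID leaf 0x0A. The sketch below decodes that leaf; it assumes GCC or Clang on x86 (for the __get_cpuid helper in <cpuid.h>), and the struct and function names are hypothetical.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Selected fields of CPUID leaf 0x0A (architectural performance monitoring). */
struct pmu_info {
    unsigned version;      /* EAX[7:0]   architectural PMU version ID */
    unsigned num_gp;       /* EAX[15:8]  general-purpose counters per logical CPU */
    unsigned gp_width;     /* EAX[23:16] bit width of those counters */
    int llc_misses_avail;  /* 1 if the "LLC Misses" event is enumerated as available */
};

/* Decode raw EAX/EBX values of leaf 0x0A. */
struct pmu_info decode_leaf_0a(uint32_t eax, uint32_t ebx)
{
    struct pmu_info p;
    unsigned ebx_len = (eax >> 24) & 0xff;  /* EAX[31:24]: length of EBX bit vector */

    p.version  = eax & 0xff;
    p.num_gp   = (eax >> 8) & 0xff;
    p.gp_width = (eax >> 16) & 0xff;
    /* LLC Misses is EBX bit 4: unavailable if that bit is 1 or the vector is shorter than 5. */
    p.llc_misses_avail = ebx_len > 4 && !((ebx >> 4) & 1);
    return p;
}

/* Query the running CPU; returns 0 on success, -1 if leaf 0x0A is unavailable. */
int query_pmu(struct pmu_info *out)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0x0a, &eax, &ebx, &ecx, &edx))
        return -1;
    *out = decode_leaf_0a(eax, ebx);
    return 0;
#else
    (void)out;
    return -1;
#endif
}
```

On a real machine, query_pmu() fills in these fields for the running CPU; gp_width tells how many bits of IA32_PMCx are meaningful, which bounds how large a running total can grow before it wraps.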

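I think an example with perf will make this clearer: I can run "perf stat -e r412e ls" to estimate the L3 cache misses during the "ls" command. "r412e" breaks down into 'r' + '41' + '2e': 'r' stands for '[Raw hardware event descriptor]', 41 is the UMask (0x41), and 2e is the Event Select (0x2e). You can see this descriptor listed by 'perf list'.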
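perf's raw rUUEE syntax places the unit mask in bits 15:8 and the event select in bits 7:0, the same low 16 bits as IA32_PERFEVTSELx; the kernel's perf subsystem then sets the USR/OS/enable bits itself from the exclude_* flags. The sketch below counts the same raw event from user space through the perf_event_open(2) system call. The helper names are hypothetical, and exclude_kernel = 1 is an assumption chosen to mirror the USR-only value in the question.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* perf raw config "rUUEE": unit mask in bits 15:8, event select in bits 7:0. */
uint64_t perf_raw_config(uint8_t umask, uint8_t event)
{
    return ((uint64_t)umask << 8) | event;
}

/* Open a disabled, user-mode-only counter for a raw event on the calling
 * thread (any CPU). Returns a file descriptor, or -1 with errno set. */
int open_raw_counter(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_RAW;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;        /* start stopped; enabled via ioctl below */
    attr.exclude_kernel = 1;  /* roughly: USR set, OS clear */
    attr.exclude_hv = 1;
    return (int)syscall(SYS_perf_event_open, &attr, 0 /* this thread */,
                        -1 /* any CPU */, -1 /* no group */, 0UL);
}

/* Count the raw event while work() runs; returns -1 on failure. */
long long count_raw_event(uint64_t config, void (*work)(void))
{
    long long count = -1;
    int fd = open_raw_counter(config);
    if (fd < 0)
        return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        count = -1;
    close(fd);
    return count;
}
```

For example, count_raw_event(perf_raw_config(0x41, 0x2e), work) counts user-mode events of this type while work() runs, comparable to perf stat -e r412e:u. Opening the counter can fail (returning -1), for instance inside a virtual machine without a virtual PMU, or when kernel.perf_event_paranoid forbids it.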

No answers.