Question

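I want to measure the L1, L2, and L3 cache hit/miss ratios of certain parts of my C++ code. I am not interested in running perf on the whole application. Can perf be used as a library from C++?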

int main() {
    ...
    ...
    start_profiling();
    // The part I'm interested in
    ...
    end_profiling();
    ...
    ...
}

I gave Intel PCM a shot, but I ran into two problems. First, it gave me some strange numbers. Second, it does not support L1 cache profiling.

If this is not possible with perf, what is the easiest way to get this information?

Answer 1

It sounds like all you are trying to do is read a few performance counters, which is exactly what the PAPI library is good at.

Example.

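The full list of supported counters is quite long, but it sounds like you are most interested in PAPI_L1_TCM, PAPI_L1_TCA, and their L2 and L3 counterparts. Note that you can also break accesses down into reads and writes, and you can distinguish between the instruction and data caches.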
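Once a total-miss counter (such as PAPI_L1_TCM) and a total-access counter (such as PAPI_L1_TCA) have been read for the region, the ratios asked about in the question follow from simple arithmetic. The sketch below is only an illustration: the `CacheCounts` struct and the `miss_ratio`/`hit_ratio` helpers are hypothetical names, not part of PAPI, and the counter values are assumed to have been collected already.

```cpp
// Hypothetical helper types, not part of PAPI.
// PAPI reports counter values as long long, so the same type is used here.
struct CacheCounts {
    long long misses;    // e.g. the value read for PAPI_L1_TCM
    long long accesses;  // e.g. the value read for PAPI_L1_TCA
};

// Fraction of accesses that missed; returns -1.0 when the ratio is undefined
// (no accesses recorded, or a negative reading).
inline double miss_ratio(const CacheCounts& c) {
    if (c.accesses <= 0 || c.misses < 0) {
        return -1.0;
    }
    return static_cast<double>(c.misses) / static_cast<double>(c.accesses);
}

// Fraction of accesses that hit; returns -1.0 when the ratio is undefined.
inline double hit_ratio(const CacheCounts& c) {
    const double miss = miss_ratio(c);
    return miss < 0.0 ? -1.0 : 1.0 - miss;
}
```

The same helpers apply unchanged to the L2 and L3 counter pairs.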

Answer 2

Yes, there is special per-thread monitoring that allows perf counters to be read from within user space. See the manual page for perf_event_open(2).

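Because perf's generic events only cover L1i, L1d, and the last-level cache, you will need to use PERF_TYPE_RAW mode and take the raw event numbers from the manual for your CPU.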
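As a rough sketch of counting around a region with perf_event_open(2), the code below opens a counter for the calling thread, enables it only around the code of interest, and reads the total afterwards. The helper names (`hw_cache_config`, `open_counter`, `start_counting`, `stop_counting`) are hypothetical. It uses the generic PERF_TYPE_HW_CACHE encoding for L1d read misses; for L2 events you would presumably pass PERF_TYPE_RAW with a CPU-specific code instead. Opening the counter can fail, for example when perf_event_paranoid is restrictive or the machine exposes no PMU, so every helper tolerates an fd of -1.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Encode a PERF_TYPE_HW_CACHE event: cache id | (op << 8) | (result << 16).
inline std::uint64_t hw_cache_config(unsigned cache, unsigned op, unsigned result) {
    return static_cast<std::uint64_t>(cache) |
           (static_cast<std::uint64_t>(op) << 8) |
           (static_cast<std::uint64_t>(result) << 16);
}

// Open a counter for the calling thread on any CPU.
// Returns a file descriptor, or -1 on failure (errno is set by the kernel).
inline int open_counter(std::uint32_t type, std::uint64_t config) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;
    attr.config = config;
    attr.disabled = 1;        // stay off until start_counting()
    attr.exclude_kernel = 1;  // count user-space activity only
    attr.exclude_hv = 1;
    // glibc has no wrapper for perf_event_open, so call it via syscall().
    return static_cast<int>(syscall(SYS_perf_event_open, &attr,
                                    0 /* this thread */, -1 /* any CPU */,
                                    -1 /* no group */, 0 /* flags */));
}

// Reset and enable the counter just before the region of interest.
inline void start_counting(int fd) {
    if (fd < 0) return;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}

// Disable the counter and return its value, or -1 if it could not be read.
inline long long stop_counting(int fd) {
    if (fd < 0) return -1;
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    long long count = 0;
    if (read(fd, &count, sizeof(count)) != static_cast<ssize_t>(sizeof(count))) {
        return -1;
    }
    return count;
}
```

A caller would open the counter once with `open_counter(PERF_TYPE_HW_CACHE, hw_cache_config(PERF_COUNT_HW_CACHE_L1D, PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_RESULT_MISS))`, wrap the region in `start_counting(fd)` and `stop_counting(fd)`, and `close(fd)` when done.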

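To implement profiling, you need to set a sampling interval (the sample_period field of perf_event_attr), then poll/select on the fd or wait for the SIGIO signal; when it fires, read the sample and its instruction pointer. You could then try using a debugger such as GDB to resolve the instruction pointers back to function names.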

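Another option is to use SystemTap. You need empty implementations of start_profiling() and end_profiling(), just so SystemTap has something to hook in order to switch profiling on and off: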

global traceme, prof;

probe process("/path/to/your/executable").function("start_profiling") {
    traceme = 1;
}

probe process("/path/to/your/executable").function("end_profiling") {
    traceme = 0;
}

probe perf.type(4).config(/* RAW value of perf event */).sample(10000) {
    // Only count samples taken between start_profiling() and end_profiling().
    if (traceme) prof[usymname(uaddr())] <<< 1;
}

probe end {
    foreach([sym+] in prof) {
        printf("%16s %d\n", sym, @count(prof[sym]));
    }
}
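The empty start_profiling() and end_profiling() hooks that the script probes could be sketched as follows. This assumes GCC or Clang: `noinline` keeps each function as a real symbol, and the empty `asm` statement is the GCC-documented way to stop calls to a side-effect-free function from being optimized away. The `extern "C"` linkage is an extra assumption here, used only to keep the symbol names unmangled.

```cpp
// Empty marker functions: they do no work and exist only as probe points.
extern "C" {

__attribute__((noinline)) void start_profiling() {
    asm volatile("");  // keeps the call from being optimized away
}

__attribute__((noinline)) void end_profiling() {
    asm volatile("");  // keeps the call from being optimized away
}

}  // extern "C"
```

Building the executable with debug information (`-g`) should help SystemTap resolve both the function probes and the symbol names printed at the end.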

Is it possible to use the Linux Perf profiler in C++ code?
