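I'm trying to use perf and ocperf to find the bottleneck in my code. When I do a "detailed stat" run on my binary, two statistics are reported in red text, which I assume means the values are too high.
L1-dcache-load-misses is in red at 28.60%
iTLB-load-misses is in red at 425.89%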
# ~bram/src/pmu-tools/ocperf.py stat -d -d -d -d -d ./bench ray
perf stat -d -d -d -d -d ./bench ray
Loaded 455 primitives.
Testing ray against 455 primitives.
Performance counter stats for './bench ray':
9031.444612 task-clock (msec) # 1.000 CPUs utilized
15 context-switches # 0.002 K/sec
0 cpu-migrations # 0.000 K/sec
292 page-faults # 0.032 K/sec
28,786,063,163 cycles # 3.187 GHz (61.47%)
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
55,742,952,563 instructions # 1.94 insns per cycle (69.18%)
3,717,242,560 branches # 411.589 M/sec (69.18%)
18,097,580 branch-misses # 0.49% of all branches (69.18%)
10,230,376,136 L1-dcache-loads # 1132.751 M/sec (69.17%)
2,926,349,754 L1-dcache-load-misses # 28.60% of all L1-dcache hits (69.21%)
145,843,523 LLC-loads # 16.148 M/sec (69.32%)
49,512 LLC-load-misses # 0.07% of all LL-cache hits (69.33%)
<not supported> L1-icache-loads
260,144 L1-icache-load-misses # 0.029 M/sec (69.34%)
10,230,376,830 dTLB-loads # 1132.751 M/sec (69.34%)
1,197 dTLB-load-misses # 0.00% of all dTLB cache hits (61.59%)
2,294 iTLB-loads # 0.254 K/sec (61.55%)
9,770 iTLB-load-misses # 425.89% of all iTLB cache hits (61.51%)
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
9.032234014 seconds time elapsed
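For what it's worth, the two red percentages look like plain ratios of each miss counter to its matching load counter, using the counts printed above. Here is a minimal Python sketch of that arithmetic; the helper name `miss_ratio_percent` is my own and not part of perf, and the inputs are copied by hand from the output.

```python
def miss_ratio_percent(misses: int, loads: int) -> float:
    """Return misses as a percentage of loads (hypothetical helper, not a perf API)."""
    return 100.0 * misses / loads


# Counts copied from the perf stat output above.
l1d = miss_ratio_percent(2_926_349_754, 10_230_376_136)
itlb = miss_ratio_percent(9_770, 2_294)

print(f"L1-dcache-load-misses: {l1d:.2f}%")  # prints 28.60%
print(f"iTLB-load-misses: {itlb:.2f}%")      # prints 425.89%
```

The iTLB figure is above 100% only because more misses (9,770) than loads (2,294) were counted, and both counts are tiny compared with the 55 billion instructions executed.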
My questions:
Also, my machine has a Haswell CPU. Shouldn't I expect the stalled-cycles counters to be included?