Question

When I want to generate performance reports with perf-stat and perf-report from the Linux perf tool suite, I run:

$ perf record -o my.perf.data myCmd
$ perf report -i my.perf.data

and

$ perf stat myCmd

But this means I run 'myCmd' a second time, which takes several minutes. Instead, I was hoping for:

$ perf stat -i my.perf.data

But unlike most tools in the perf suite, perf-stat does not seem to have an -i option. Is there another tool, or a way to get perf-report to generate output similar to perf-stat?

Answer 1

I dug through the source code on kernel.org, and it looks like there is no way to make perf stat parse perf.data:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=tools/perf/builtin-stat.c;h=c70d72003557f17f29345b0f219dc5ca9f572d75;hb=refs/heads/linux-2.6.33.y

If you look at line 245, you will see the function "run_perf_stat"; the lines around 308-320 appear to be what actually records and collates the counts.

I haven't dug deep enough to determine whether the functionality you want could be enabled.

It doesn't look like perf report has many extra formatting capabilities either. If you want, you can look further here:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=tools/perf/builtin-report.c;h=860f1eeeea7dbf8e43779308eaaffb1dbcf79d10;hb=refs/heads/linux-2.6.33.y

Answer 2

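perf stat can't be used to parse a perf.data file, but you can ask perf report to print the header, which includes event count estimates, with perf report --header |egrep Event\|Samples. Only events that were recorded into the perf.data file are estimated.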
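As a sketch of turning that header into something closer to perf stat's layout, here is a small POSIX shell function. The name perf_header_counts is hypothetical (it is not part of perf); it only relies on the "# Samples: ... of event '...'" and "# Event count (approx.): ..." lines that perf report --header prints in the outputs shown in this answer, and reads them on stdin.

```shell
#!/bin/sh
# Hypothetical helper (not part of perf): print one "count  event" line per
# event found in `perf report --header` output read from stdin.
perf_header_counts() {
    awk '
        /^# Samples:/ {
            # last field is the event name wrapped in single quotes
            name = $NF
            gsub(/'\''/, "", name)
        }
        /^# Event count \(approx\.\):/ {
            # pair the estimated count with the most recent event name
            printf "%15s  %s\n", $NF, name
        }
    '
}

# Typical use (needs perf and a recorded file):
#   perf report -i my.perf.data --header | perf_header_counts
```

The counts it prints are still the sampling-based approximations, not exact perf stat values.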

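perf stat uses the hardware performance monitoring unit in counting mode, while perf record/perf report with a perf.data file use the same hardware unit configured in periodic overflow mode (sampling profiling). In both modes the hardware performance counters are programmed, through their control registers, to count a set of performance events (for example CPU cycles or instructions executed), and the hardware increments a counter on every event.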

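In counting mode, perf stat uses counters that are set to zero when the program starts; the hardware increments them, and perf reads the final counter values when the program exits (in practice the OS splits the counting into several segments, with a similar end result: a single value for the full program run).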

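In profiling mode, perf record sets every hardware counter to some negative value, for example -200000, and registers and enables an overflow handler (the OS kernel autotunes the actual value to reach some sampling frequency). After every 200000 counted events, the counter overflows from -1 to zero and generates an overflow interrupt. The perf_events interrupt handler records a "sample" (current time, pid, instruction pointer, and optionally the call stack in -g mode) into a ring buffer (mmapped by perf), from which the data is saved into perf.data. The handler also resets the counter to -200000. So after a long enough run, many samples are stored in perf.data. This sample set can be used to generate a statistical profile of the program (which parts of the program ran more often). But if every sample was generated after 200000 events, we can also estimate the total number of events. Because the kernel autotunes the value (it tries to generate samples at 4000 Hz), the estimate is harder; use something like -c 1000000 to disable autotuning of the sample period.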
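The overflow mechanism described above can be modeled with a toy loop. This is not kernel code, and simulate_sampling is a hypothetical name; it only illustrates why, with a fixed period, samples × period estimates the event total, and why up to one period of events at the end of the run is never sampled.

```shell
#!/bin/sh
# Toy model (not kernel code) of sampling by counter overflow: the counter
# starts at -PERIOD, each event increments it, and when it reaches zero a
# "sample" is taken and the counter is re-armed to -PERIOD.
simulate_sampling() {
    events=$1 period=$2
    counter=$((-period))
    samples=0
    i=0
    while [ "$i" -lt "$events" ]; do
        counter=$((counter + 1))          # hardware increments on each event
        if [ "$counter" -eq 0 ]; then     # overflow from -1 to 0 -> interrupt
            samples=$((samples + 1))      # handler records one sample
            counter=$((-period))          # and re-arms the counter
        fi
        i=$((i + 1))
    done
    # With a fixed period, samples * period estimates the total event count
    echo "samples=$samples estimate=$((samples * period)) actual=$events"
}

simulate_sampling 1050 200   # prints: samples=5 estimate=1000 actual=1050
```

The last 50 events never reach an overflow, so the estimate undercounts by less than one period; with a long run and a fixed period that error becomes small relative to the total.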

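What does perf stat show in default mode? On one x86_64 CPU I get: the running time of the program (task-clock and elapsed time), 3 software events (context switches, CPU migrations, page faults), and 4 hardware counters: cycles, instructions, branches, branch-misses: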

$ echo '3^123456%3' | perf stat bc
0

 Performance counter stats for 'bc':

        325.604672      task-clock (msec)         #    0.998 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               181      page-faults               #    0.556 K/sec
       828,234,675      cycles                    #    2.544 GHz
     1,840,146,399      instructions              #    2.22  insn per cycle
       348,965,282      branches                  # 1071.745 M/sec
        15,385,371      branch-misses             #    4.41% of all branches

       0.326152702 seconds time elapsed

What does perf record record in default mode? When hardware events are available, it records the cycles event. In a single wakeup (ring buffer overflow), perf saved 1293 samples into perf.data:

$ echo '3^123456%3' | perf record bc
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.049 MB perf.data (1293 samples) ]

With perf report --header|less, perf script and perf script -D you can look into the contents of perf.data:

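$ perf report --header |grep event
# event : name = cycles:uppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD ...
# Samples: 1K of event 'cycles:uppp'
$ perf script 2>/dev/null |grep cycles|wc -l
1293

There are some timestamps in perf.data and some additional events for program start and exit (perf script -D |egrep exec\|EXIT), but the default perf.data does not contain enough information to fully reconstruct the perf stat output. The running time is recorded only as timestamps of the start, the exit, and each event sample; software events are not recorded; and only a single hardware event was used (cycles, but not instructions, branches or branch-misses). The used hardware counter can be approximated, but not exactly (the real cycle count was around 820-825 million):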

$ perf report --header |grep Event
# Event count (approx.): 836622729

With a non-default recording of perf.data, perf report can estimate more events:

$ echo '3^123456%3' | perf record -e cycles,instructions,branches,branch-misses bc
[ perf record: Captured and wrote 0.238 MB perf.data (5164 samples) ]
$ perf report --header |egrep Event\|Samples
# Samples: 1K of event 'cycles'
# Event count (approx.): 834809036
# Samples: 1K of event 'instructions'
# Event count (approx.): 1834083643
# Samples: 1K of event 'branches'
# Event count (approx.): 347750459
# Samples: 1K of event 'branch-misses'
# Event count (approx.): 15382047

A fixed period can be used, but the kernel may throttle some events if the value of the -c option is too low (samples should be generated no more often than 1000-4000 times per second).
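To pick a -c value that stays under that rate, divide the event rate by the sample period. A small sketch, assuming the event rate is known in advance: the 2.544e9 cycles/s used here is the clock rate perf stat printed for the bc run, and samples_per_sec and min_period are hypothetical helper names.

```shell
#!/bin/sh
# Samples generated per second for a given event rate and fixed -c period.
samples_per_sec() {
    # $1 = events per second, $2 = sample period passed to -c
    awk -v rate="$1" -v period="$2" 'BEGIN { printf "%.0f\n", rate / period }'
}

# Smallest integer period that keeps sampling at or below $2 samples/s.
min_period() {
    # $1 = events per second, $2 = maximum samples per second
    awk -v rate="$1" -v maxhz="$2" 'BEGIN {
        p = rate / maxhz
        if (p > int(p)) p = int(p) + 1   # round up so the limit is not exceeded
        printf "%d\n", p
    }'
}

samples_per_sec 2544000000 1000000   # 2544 samples/s: within the guideline
samples_per_sec 2544000000 100000    # 25440 samples/s: the kernel may throttle
min_period 2544000000 4000           # 636000
```

Events that fire much less often than cycles (branch-misses, for example) would yield far fewer samples at the same -c value, so a single period is a compromise across events.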

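As a rough check of how good the estimates from the four-event recording (cycles, instructions, branches, branch-misses) are, the sketch below compares each perf report estimate with the exact count perf stat printed. compare_estimate is a hypothetical helper; the numbers are copied from the outputs in this answer, which come from separate runs of bc, so part of each difference is ordinary run-to-run variation.

```shell
#!/bin/sh
# Relative difference between a perf report estimate and a perf stat count.
compare_estimate() {
    # $1 = event name, $2 = perf report estimate, $3 = perf stat count
    awk -v name="$1" -v est="$2" -v exact="$3" 'BEGIN {
        printf "%-14s %+.2f%%\n", name, (est - exact) * 100 / exact
    }'
}

compare_estimate cycles        834809036  828234675    # +0.79%
compare_estimate instructions  1834083643 1840146399   # -0.33%
compare_estimate branches      347750459  348965282    # -0.35%
compare_estimate branch-misses 15382047   15385371     # -0.02%
```

For these runs all four estimates land within about 1% of the perf stat counts.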

Can perf-stat results be generated from a perf.data file?
