Question

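I have a Raspberry Pi 3 running Linux. The RPi3 has a quad-core Cortex-A53 with a Performance Monitoring Unit (PMU) v3. I run the cyclictest program to do some real-time tests. Cyclictest is an application where you set the period and the number of iterations, and it calculates the latency. So it executes briefly, then goes to sleep until the next period, when the system wakes it up.

I want to read the cache counters for every cyclictest execution, to see how many cache misses it incurs while it is executing (I don't want the misses that occur while the task is asleep).

I have tried running perf stat: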

perf stat -o result1.txt -r 10 -i -e armv8_pmuv3/l1d_cache/ \
    -e armv8_pmuv3/l1d_cache_refill/ \
    -e armv8_pmuv3/l1d_cache_wb/ \
    -e armv8_pmuv3/l1d_tlb_refill/ \
    -e armv8_pmuv3/l1i_cache/ \
    -e armv8_pmuv3/l1i_cache_refill/ \
    -e armv8_pmuv3/l1i_tlb_refill/ \
    -e armv8_pmuv3/l2d_cache/ \
    -e armv8_pmuv3/l2d_cache_refill/ \
    -e armv8_pmuv3/l2d_cache_wb/ \
    -e armv8_pmuv3/mem_access/ \
    cyclictest -l57600 -m -n -t1 -p80 -i50000 -h300 -q --histfile=666_data_50

However, it only reports information for roughly 50% of the execution:

Performance counter stats for 'cyclictest -l57600 -m -n -t1 -p80 -i50000 -h300 -q --histfile=666_data_50' (10 runs):

     937729229      armv8_pmuv3/l1d_cache/                                        ( +-  2.41% )  (54.50%)
      44736600      armv8_pmuv3/l1d_cache_refill/                                     ( +-  2.33% )  (54.39%)
      44784430      armv8_pmuv3/l1d_cache_wb/                                     ( +-  2.11% )  (54.33%)
        294033      armv8_pmuv3/l1d_tlb_refill/                                     ( +- 13.82% )  (54.21%)
    1924752301      armv8_pmuv3/l1i_cache/                                        ( +-  2.37% )  (54.41%)
     120581610      armv8_pmuv3/l1i_cache_refill/                                     ( +-  2.41% )  (54.46%)
        761651      armv8_pmuv3/l1i_tlb_refill/                                     ( +-  4.87% )  (54.70%)
     215103404      armv8_pmuv3/l2d_cache/                                        ( +-  2.28% )  (54.69%)
      30884575      armv8_pmuv3/l2d_cache_refill/                                     ( +-  1.44% )  (54.83%)
      11424917      armv8_pmuv3/l2d_cache_wb/                                     ( +-  2.03% )  (54.76%)
     943041718      armv8_pmuv3/mem_access/                                       ( +-  2.41% )  (54.74%)

2904.940283006 seconds time elapsed                                          ( +-  0.07% )

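I don't know whether these counters count cache events for this task only while it is running, or also while it is sleeping. Does anyone know? I also run other applications; can they modify the values of the counters I specified in perf stat?

If it is not possible to read the exact counter values for the task while it runs this way, could it be done with a kernel module or a custom user-space application?

Thanks!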

Answer 1

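Every performance monitoring hardware is limited in its number of channels, that is, in how many events can be counted at the same time. For example, many modern x86 / x86_64 CPUs may have 4 flexible channels per CPU core plus 3 fixed channels. When you ask the profiler for more events than that, it multiplexes them (as VTune and PAPI do). When multiplexing is active, some event e1 is measured for only part of the running time, for example 55%, and perf stat (but not perf record?) will extrapolate the counts to the full running time ("C. Multiplexing"). This extrapolation may introduce some error.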
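The extrapolation described above can be sketched in a few lines. This is only an illustration of the usual scaling rule (raw count multiplied by the ratio of enabled time to running time), not perf's actual source; the function name `scale_count` and the sample numbers are made up for the example.

```python
def scale_count(raw_count: int, time_enabled_ns: int, time_running_ns: int) -> float:
    """Extrapolate a multiplexed counter value to the whole enabled time.

    raw_count       -- events actually counted while the event was on the PMU
    time_enabled_ns -- how long the event was enabled (requested) in total
    time_running_ns -- how long the event really occupied a hardware counter
    """
    if time_running_ns == 0:
        raise ValueError("event was never scheduled on the PMU")
    return raw_count * time_enabled_ns / time_running_ns


# Hypothetical numbers: the event held a counter for 54.5% of the run.
raw = 545_000
enabled = 1_000_000_000
running = 545_000_000
estimate = scale_count(raw, enabled, running)
print(f"estimated count over the full run: {estimate:.0f}")
```

The estimate is only as good as the assumption that the event rate was the same while the event was not being counted, which is why a periodic workload such as cyclictest can show noticeable error under multiplexing.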

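Your Cortex-A53 with PMU v3 has only six channels: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0500d/BIIDADBH.html PMEVCNTR0_EL0 - PMEVCNTR5_EL0 and PMEVTYPER0_EL0 - PMEVTYPER5_EL0. Try starting perf stat with no more than 6 events per run of the test, so that event multiplexing is turned off: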

perf stat -o result1.txt -r 10 -i \
    -e armv8_pmuv3/l1d_cache/  \
    -e armv8_pmuv3/l1d_cache_refill/  \
    -e armv8_pmuv3/l1d_cache_wb/  \
    -e armv8_pmuv3/l1d_tlb_refill/  \
    -e armv8_pmuv3/l1i_cache/  \
    -e armv8_pmuv3/l1i_cache_refill/  \
    cyclictest -l57600 -m -n -t1 -p80 -i50000 -h300 -q --histfile=666_data_50

perf stat -o result2.txt -r 10 -i \
    -e armv8_pmuv3/l1i_tlb_refill/  \
    -e armv8_pmuv3/l2d_cache/  \
    -e armv8_pmuv3/l2d_cache_refill/  \
    -e armv8_pmuv3/l2d_cache_wb/  \
    -e armv8_pmuv3/mem_access/  \
    cyclictest -l57600 -m -n -t1 -p80 -i50000 -h300 -q --histfile=666_data_50

You can also try grouping the events into sets: -e \{event1,event2...,event6\} (https://stackoverflow.com/a/48448876); each set will then be multiplexed as a whole against the other sets.
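To avoid splitting the event list by hand, the batching can be scripted. The following is a rough sketch, assuming the six-counter limit stated above; the helper names `batch_events` and `perf_stat_argv` are invented for this example, and the script only prints the command lines rather than running perf.

```python
import shlex
from typing import List

MAX_COUNTERS = 6  # event counters on the Cortex-A53 PMU, per the answer above

# Event names copied from the question.
EVENTS = [
    "armv8_pmuv3/l1d_cache/", "armv8_pmuv3/l1d_cache_refill/",
    "armv8_pmuv3/l1d_cache_wb/", "armv8_pmuv3/l1d_tlb_refill/",
    "armv8_pmuv3/l1i_cache/", "armv8_pmuv3/l1i_cache_refill/",
    "armv8_pmuv3/l1i_tlb_refill/", "armv8_pmuv3/l2d_cache/",
    "armv8_pmuv3/l2d_cache_refill/", "armv8_pmuv3/l2d_cache_wb/",
    "armv8_pmuv3/mem_access/",
]

WORKLOAD = ["cyclictest", "-l57600", "-m", "-n", "-t1", "-p80",
            "-i50000", "-h300", "-q", "--histfile=666_data_50"]


def batch_events(events: List[str], size: int = MAX_COUNTERS) -> List[List[str]]:
    """Split the event list into chunks that fit the hardware counters."""
    return [events[i:i + size] for i in range(0, len(events), size)]


def perf_stat_argv(batch: List[str], workload: List[str], out_file: str) -> List[str]:
    """Build one perf stat invocation that counts a batch as a single group."""
    group = "{" + ",".join(batch) + "}"
    return ["perf", "stat", "-o", out_file, "-r", "10", "-i",
            "-e", group, "--"] + workload


batches = batch_events(EVENTS)
commands = [perf_stat_argv(b, WORKLOAD, f"result{n}.txt")
            for n, b in enumerate(batches, start=1)]
for argv in commands:
    print(shlex.join(argv))  # quoted so the braces survive a shell
```

Each printed line can be pasted into a shell, or the argv lists can be passed to `subprocess.run` directly, in which case no brace escaping is needed.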

Reading PMU counters in ARMv8 with Linux
