Question

We are using perf top to show CPU usage. The results show two functions:

samples    pcnt    function
------     ----    ---------
...        ...     ....
12617.00   6.8%    func_outside
 8691.00   4.7%    func_inside
.....

In fact, the two functions are nested as shown here, and the nesting is always 1:1.

func_outside() {
  ....
  func_inside() 
  ... 
}

Should I conclude from the perf top output that the 4.7% is already included in the 6.8%, so that the cost of func_outside excluding func_inside is 2.1% (6.8 - 4.7)?

Answer 1

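Short answer

Each percentage reported is for that specific function only, so the func_inside samples are not counted in func_outside. In other words, the 6.8% is already the cost of func_outside excluding func_inside, and nothing needs to be subtracted.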

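Details

perf works by periodically collecting performance samples. By default, perf top simply checks which function is currently running and adds the sample to that function's count.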
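The attribution rule described above can be illustrated with a small simulation. This is only a sketch of the counting idea, not perf's actual implementation: the struct sample type, the helper functions self_samples and inclusive_samples, and the demo_samples data are all hypothetical names and made-up values introduced for illustration. Each simulated sample records the call stack at the moment it fired; a "self" count credits only the innermost frame (the behavior the answer describes), while an "inclusive" count, shown purely for contrast, credits every frame on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 4

/* One simulated sample: the call stack when the sample fired, outermost
   frame first. Unused slots are NULL; the last non-NULL entry is the
   function that was actually running. (Hypothetical type, not a perf API.) */
struct sample {
    const char *frames[MAX_DEPTH];
};

/* Self count: number of samples whose innermost (running) frame is fn. */
int self_samples(const struct sample *s, int n, const char *fn) {
    assert(s != NULL || n == 0);
    int hits = 0;
    for (int i = 0; i < n; i++) {
        const char *leaf = NULL;
        for (int d = 0; d < MAX_DEPTH && s[i].frames[d] != NULL; d++) {
            leaf = s[i].frames[d];
        }
        if (leaf != NULL && strcmp(leaf, fn) == 0) {
            hits++;
        }
    }
    return hits;
}

/* Inclusive count: number of samples where fn appears anywhere on the stack. */
int inclusive_samples(const struct sample *s, int n, const char *fn) {
    assert(s != NULL || n == 0);
    int hits = 0;
    for (int i = 0; i < n; i++) {
        for (int d = 0; d < MAX_DEPTH && s[i].frames[d] != NULL; d++) {
            if (strcmp(s[i].frames[d], fn) == 0) {
                hits++;
                break; /* count each sample at most once */
            }
        }
    }
    return hits;
}

/* Made-up data: five samples, three of which land inside func_inside. */
static const struct sample demo_samples[] = {
    {{"main", "func_outside", "func_inside"}},
    {{"main", "func_outside", "func_inside"}},
    {{"main", "func_outside"}},
    {{"main", "func_outside", "func_inside"}},
    {{"main"}},
};
enum { DEMO_COUNT = sizeof demo_samples / sizeof demo_samples[0] };
```

With this made-up data, the self count of func_outside covers only the one sample in which func_outside itself was running, even though func_outside is on the stack in four of the five samples. Under self-only attribution the caller's figure can therefore be smaller than the callee's.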

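I was fairly sure that perf top attributes samples this way, but I wanted to verify that this is how it displays its results, so I wrote a quick test program to check its behavior. The program has two functions of interest, outer and inner. The outer function calls inner in a loop, and the amount of work inner does is controlled by an argument. Be sure to compile with -O0 to avoid inlining. The command-line arguments control the ratio of work between the two functions.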

Running with the arguments ./a.out 1 1 1000000000 gives:

49.20%  a.out             [.] outer    
23.69%  a.out             [.] main    
21.32%  a.out             [.] inner

Running with the arguments ./a.out 1 10 1000000000 gives:

66.06%  a.out             [.] inner    
17.77%  a.out             [.] outer    
 9.50%  a.out             [.] main

Running with the arguments ./a.out 1 100 1000000000 gives:

88.53%  a.out             [.] inner    
 2.85%  a.out             [.] outer    
 1.09%  a.out             [.] main

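If the counts for outer included those of inner, the runtime percentage for outer would always be higher than that of inner. As these results show, that is not the case.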

The test program I used is below; it was compiled with gcc -O0 -g --std=c11 test.c.

#include <stdlib.h>
#include <stdio.h>

long inner(int count) {
  long sum = 0;
  for(int i = 0; i < count; i++) {
    sum += i;
  }
  return sum;
}

long outer(int count_out, int count_in) {
  long sum = 0;
  for(int i = 0; i < count_out; i++) {
    sum += inner(count_in);
  }
  return sum;
}

int main(int argc, char **argv)  {
  if(argc < 4) {
    printf("Usage: %s <outer_cnt> <inner_cnt> <loop>\n",argv[0]);
    exit(-1);
  }

  int outer_cnt = atoi(argv[1]);
  int inner_cnt = atoi(argv[2]);
  int loops     = atoi(argv[3]);

  long res = 0;
  for(int i = 0; i < loops; i++) {
    res += outer(outer_cnt, inner_cnt);
  }

  printf("res is %ld\n", res);
  return 0;
}
