如何跟踪修复过程的唤醒延迟?

时间:2020-03-18 02:59:03

标签: linux-kernel perf ebpf ftrace

我想使用ftrace跟踪修复过程的唤醒延迟。 但是,ftrace将仅记录最大延迟。 而且,set_ftrace_pid没用。

有人知道怎么做吗?

非常感谢您。

1 个答案:

答案 0 :(得分:2)

除了@Qeole添加的建议之外,您还可以使用perf sched实用程序来获取有关进程唤醒延迟的详细记录。尽管runqlat之类的ebpf工具可以为您提供更高层次的概述,但使用perf sched可以帮助您捕获所有调度程序事件,从而以更多的方式观察和检查进程的唤醒延迟详情。请注意,运行perf sched来监视长时间运行的计算密集型过程会带来自身的开销问题。

您首先需要运行perf sched record-

在手册页中

'perf sched record <command>' records the scheduling events of an arbitrary workload.

例如说您想跟踪命令ls的唤醒延迟。

sudo perf sched record ls

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.453 MB perf.data (562 samples) ]

您将看到在运行命令的同一目录中,将生成一个perf.data文件。该文件将包含所有原始调度程序事件,下面的命令将有助于理解所有这些调度程序事件。

  • 您可以运行perf sched latency来获取每个任务的延迟摘要,包括每个任务的上下文切换次数,平均和最大延迟的详细信息。
sudo perf sched latency

 -----------------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
 -----------------------------------------------------------------------------------------------------------------
  migration/4:35        |      0.000 ms |        1 | avg:    0.003 ms | max:    0.003 ms | max at: 231259.727951 s
  kworker/u16:0-p:6962  |      0.103 ms |       20 | avg:    0.003 ms | max:    0.035 ms | max at: 231259.729314 s
  ls:7118               |      1.752 ms |        1 | avg:    0.003 ms | max:    0.003 ms | max at: 231259.727898 s
  alsa-sink-Gener:3133  |      0.000 ms |        1 | avg:    0.003 ms | max:    0.003 ms | max at: 231259.729321 s
  Timer:5229            |      0.035 ms |        1 | avg:    0.002 ms | max:    0.002 ms | max at: 231259.729625 s
  AudioIP~ent RPC:7597  |      0.040 ms |        1 | avg:    0.002 ms | max:    0.002 ms | max at: 231259.729698 s
  MediaTimer #1:7075    |      0.025 ms |        1 | avg:    0.002 ms | max:    0.002 ms | max at: 231259.729651 s
  gnome-terminal-:4989  |      0.254 ms |       24 | avg:    0.001 ms | max:    0.003 ms | max at: 231259.729358 s
  MediaPl~back #3:7098  |      0.034 ms |        1 | avg:    0.001 ms | max:    0.001 ms | max at: 231259.729670 s
  kworker/u16:2-p:5987  |      0.144 ms |       32 | avg:    0.001 ms | max:    0.002 ms | max at: 231259.729193 s
  perf:7114             |      3.503 ms |        1 | avg:    0.001 ms | max:    0.001 ms | max at: 231259.729656 s
  kworker/u16:1-p:7112  |      0.184 ms |       52 | avg:    0.001 ms | max:    0.001 ms | max at: 231259.729201 s
  chrome:5713           |      0.067 ms |        1 | avg:    0.000 ms | max:    0.000 ms | max at:      0.000000 s
 -----------------------------------------------------------------------------------------------------------------
  TOTAL:                |      6.141 ms |      137 |
 ---------------------------------------------------

您可以看到进程ls以及进程perf在运行perf sched record命令的同时并存的所有其他进程中都存在。

  • 您可以运行perf sched timehist以获得各个调度程序事件的详细摘要。
sudo perf sched timehist

           time    cpu  task name                       wait time  sch delay   run time
                        [tid/pid]                          (msec)     (msec)     (msec)
--------------- ------  ------------------------------  ---------  ---------  ---------
  231259.726350 [0005]  <idle>                              0.000      0.000      0.000 
  231259.726465 [0005]  chrome[5713]                        0.000      0.000      0.114 
  231259.727447 [0005]  <idle>                              0.114      0.000      0.981 
  231259.727513 [0005]  chrome[5713]                        0.981      0.000      0.066 
  231259.727898 [0004]  <idle>                              0.000      0.000      0.000 
  231259.727951 [0004]  perf[7118]                          0.000      0.002      0.052 
  231259.727958 [0002]  perf[7114]                          0.000      0.000      0.000 
  231259.727960 [0000]  <idle>                              0.000      0.000      0.000 
  231259.727964 [0004]  migration/4[35]                     0.000      0.002      0.013 
  231259.729193 [0006]  <idle>                              0.000      0.000      0.000 
  231259.729201 [0002]  <idle>                              0.000      0.000      1.242 
  231259.729201 [0003]  <idle>                              0.000      0.000      0.000   231259.729216 [0002]  kworker/u16:1-p[7112]               0.006      0.001      0.005 
  231259.729219 [0002]  <idle>                              0.005      0.000      0.002 
  231259.729222 [0002]  kworker/u16:1-p[7112]               0.002      0.000      0.002 
  231259.729222 [0006]  <idle>                              0.001      0.000      0.007 

wait time是指任务等待被唤醒的时间,sch delay是调度器在任务被唤醒后将其实际调度到运行队列中所花费的时间。上。

您可以通过pid过滤时间历史命令,并且由于pid命令的ls 7118 (您可以在{{1 }}输出。

perf sched latency

现在,为了观察此过程的唤醒事件,您可以运行将命令行开关sudo perf sched timehist -p 7118 Samples do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 231259.727951 [0004] perf[7118] 0.000 0.002 0.052 231259.729657 [0000] ls[7118] 0.009 0.000 1.697 添加到上一个命令-

-w

sudo perf sched timehist -p 7118 -w Samples do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 231259.727895 [0002] perf[7114] awakened: perf[7118] 231259.727948 [0004] perf[7118] awakened: migration/4[35] 231259.727951 [0004] perf[7118] 0.000 0.002 0.052 231259.729190 [0000] ls[7118] awakened: kworker/u16:2-p[5987] 231259.729199 [0000] ls[7118] awakened: kworker/u16:1-p[7112] 231259.729207 [0000] ls[7118] awakened: kworker/u16:2-p[5987] 231259.729209 [0000] ls[7118] awakened: kworker/u16:1-p[7112] 231259.729212 [0000] ls[7118] awakened: kworker/u16:2-p[5987] 231259.729218 [0000] ls[7118] awakened: kworker/u16:1-p[7112] 231259.729221 [0000] ls[7118] awakened: kworker/u16:2-p[5987] 231259.729223 [0000] ls[7118] awakened: kworker/u16:1-p[7112] 231259.729226 [0000] ls[7118] awakened: kworker/u16:1-p[7112] 231259.729231 [0000] ls[7118] awakened: kworker/u16:1-p[7112] 231259.729233 [0000] ls[7118] awakened: kworker/u16:2-p[5987] 231259.729237 [0000] ls[7118] awakened: kworker/u16:2-p[5987] 231259.729240 [0000] ls[7118] awakened: kworker/u16:1-p[7112] 231259.729242 [0000] ls[7118] awakened: kworker/u16:2-p[5987] -------------------------------------- # some other events here 231259.729548 [0000] ls[7118] awakened: kworker/u16:0-p[6962] 231259.729553 [0000] ls[7118] awakened: kworker/u16:1-p[7112] 231259.729555 [0000] ls[7118] awakened: kworker/u16:0-p[6962] 231259.729557 [0000] ls[7118] awakened: kworker/u16:1-p[7112] 231259.729562 [0000] ls[7118] awakened: kworker/u16:0-p[6962] 231259.729564 [0000] ls[7118] awakened: kworker/u16:1-p[7112] 231259.729655 [0000] ls[7118] awakened: perf[7114] 231259.729657 [0000] ls[7118] 0.009 0.000 1.697 线程在 231259.729190 ms 处中断kworker的初始执行及其子进程perf。您可以看到ls进程最终被唤醒,要在所有内核工作线程都完成某些工作之后,在 231259.729655 ms 上实际执行。您可以使用下面的命令-

对上述perf的详细信息进行更详细的CPU可视化
timehist

“ CPU可视化”列(“ 012345678”)的上下文切换事件带有“ s”,表示首先 CPU 4 ,然后 CPU 0 切换到sudo perf sched timehist -p 7118 -wV Samples do not have callchains. time cpu 012345678 task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ --------- ------------------------------ --------- --------- --------- 231259.727895 [0002] perf[7114] awakened: perf[7118] 231259.727948 [0004] perf[7118] awakened: migration/4[35] 231259.727951 [0004] s perf[7118] 0.000 0.002 0.052 231259.729190 [0000] ls[7118] awakened: kworker/u16:2-p[5987] 231259.729199 [0000] ls[7118] awakened: kworker/u16:1-p[7112] 231259.729207 [0000] ls[7118] awakened: kworker/u16:2-p[5987] 231259.729209 [0000] ls[7118] awakened: kworker/u16:1-p[7112] 231259.729212 [0000] ls[7118] awakened: kworker/u16:2-p[5987] -------------------------------------------------- # some other events here 231259.729562 [0000] ls[7118] awakened: kworker/u16:0-p[6962] 231259.729564 [0000] ls[7118] awakened: kworker/u16:1-p[7112] 231259.729655 [0000] ls[7118] awakened: perf[7114] 231259.729657 [0000] s ls[7118] 0.009 0.000 1.697 进程。

注意:您可以用ls其余命令perf schedperf sched script的输出补充上述信息。