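Inconsistent results when profiling my code with gprof

Date: 2016-01-26 09:50:59

Tags: fortran profiler intel-fortran gprof

I am using a relatively simple code, parallelized with OpenMP, to get familiar with gprof.

My code mainly collects data from input files, performs some array manipulations, and writes the new data to different output files. I added some calls to the intrinsic subroutine CPU_TIME to check whether gprof is accurate: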

PROGRAM main
    USE global_variables
    USE fileio, ONLY: read_old_restart, write_new_restart, output_slice, write_solution
    USE change_vars
    IMPLICIT NONE
    REAL(dp) :: t0, t1

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CALL CPU_TIME(t0)
    CALL allocate_data
    CALL CPU_TIME(t1)
    PRINT*, "Allocate data =", t1 - t0

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CALL CPU_TIME(t0)
    CALL build_grid
    CALL CPU_TIME(t1)
    PRINT*, "Build grid    =", t1 - t0

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CALL CPU_TIME(t0)
    CALL read_old_restart
    CALL CPU_TIME(t1)
    PRINT*, "Read restart  =", t1 - t0


    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CALL CPU_TIME(t0)
    CALL regroup_all
    CALL CPU_TIME(t1)
    PRINT*, "Regroup all   =", t1 - t0

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CALL CPU_TIME(t0)
    CALL redistribute_all
    CALL CPU_TIME(t1)
    PRINT*, "Redistribute  =", t1 - t0

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CALL CPU_TIME(t0)
    CALL write_new_restart
    CALL CPU_TIME(t1)
    PRINT*, "Write restart =", t1 - t0
END PROGRAM main

Here is the output:

 Allocate data =  1.000000000000000E-003
 Build grid    =  0.000000000000000E+000
 Read restart  =   10.7963590000000
 Regroup all   =   6.65998700000000
 Redistribute  =   14.3518180000000
 Write restart =   53.5218640000000

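So the write_new_restart subroutine is the most time-consuming, taking about 62% of the total run time. According to gprof, however, the subroutine redistribute_vars, which is called many times by redistribute_all, is the most time-consuming, with about 70% of the total time. Here is the gprof output: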

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 74.40      8.95     8.95       61     0.15     0.15  change_vars_mp_redistribute_vars_
 19.12     11.25     2.30       60     0.04     0.04  change_vars_mp_regroup_vars_
  6.23     12.00     0.75       63     0.01     0.01  change_vars_mp_fill_last_blocks_
  0.08     12.01     0.01        1     0.01     2.31  change_vars_mp_regroup_all_
  0.08     12.02     0.01                             __intel_ssse3_rep_memcpy
  0.08     12.03     0.01                             for_open
  0.00     12.03     0.00        1     0.00    12.01  MAIN__
  0.00     12.03     0.00        1     0.00     0.00  change_vars_mp_build_grid_
  0.00     12.03     0.00        1     0.00     9.70  change_vars_mp_redistribute_all_
  0.00     12.03     0.00        1     0.00     0.00  fileio_mp_read_old_restart_
  0.00     12.03     0.00        1     0.00     0.00  fileio_mp_write_new_restart_
  0.00     12.03     0.00        1     0.00     0.00  global_variables_mp_allocate_data_


index % time    self  children    called     name
                0.00   12.01       1/1           main [2]
[1]     99.8    0.00   12.01       1         MAIN__ [1]
                0.00    9.70       1/1           change_vars_mp_redistribute_all_ [3]
                0.01    2.30       1/1           change_vars_mp_regroup_all_ [5]
                0.00    0.00       1/1           global_variables_mp_allocate_data_ [13]
                0.00    0.00       1/1           change_vars_mp_build_grid_ [10]
                0.00    0.00       1/1           fileio_mp_read_old_restart_ [11]
                0.00    0.00       1/1           fileio_mp_write_new_restart_ [12]
-----------------------------------------------
                                                 <spontaneous>
[2]     99.8    0.00   12.01                 main [2]
                0.00   12.01       1/1           MAIN__ [1]
-----------------------------------------------
                0.00    9.70       1/1           MAIN__ [1]
[3]     80.6    0.00    9.70       1         change_vars_mp_redistribute_all_ [3]
                8.95    0.00      61/61          change_vars_mp_redistribute_vars_ [4]
                0.75    0.00      63/63          change_vars_mp_fill_last_blocks_ [7]
-----------------------------------------------
                8.95    0.00      61/61          change_vars_mp_redistribute_all_ [3]
[4]     74.4    8.95    0.00      61         change_vars_mp_redistribute_vars_ [4]
-----------------------------------------------
                0.01    2.30       1/1           MAIN__ [1]
[5]     19.2    0.01    2.30       1         change_vars_mp_regroup_all_ [5]
                2.30    0.00      60/60          change_vars_mp_regroup_vars_ [6]
-----------------------------------------------
                2.30    0.00      60/60          change_vars_mp_regroup_all_ [5]
[6]     19.1    2.30    0.00      60         change_vars_mp_regroup_vars_ [6]
-----------------------------------------------
                0.75    0.00      63/63          change_vars_mp_redistribute_all_ [3]
[7]      6.2    0.75    0.00      63         change_vars_mp_fill_last_blocks_ [7]
-----------------------------------------------
                                                 <spontaneous>
[8]      0.1    0.01    0.00                 for_open [8]
-----------------------------------------------
                                                 <spontaneous>
[9]      0.1    0.01    0.00                 __intel_ssse3_rep_memcpy [9]
-----------------------------------------------
                0.00    0.00       1/1           MAIN__ [1]
[10]     0.0    0.00    0.00       1         change_vars_mp_build_grid_ [10]
-----------------------------------------------
                0.00    0.00       1/1           MAIN__ [1]
[11]     0.0    0.00    0.00       1         fileio_mp_read_old_restart_ [11]
-----------------------------------------------
                0.00    0.00       1/1           MAIN__ [1]
[12]     0.0    0.00    0.00       1         fileio_mp_write_new_restart_ [12]
-----------------------------------------------
                0.00    0.00       1/1           MAIN__ [1]
[13]     0.0    0.00    0.00       1         global_variables_mp_allocate_data_ [13]
-----------------------------------------------

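For your information, regroup_all calls regroup_vars many times, and redistribute_all calls redistribute_vars and fill_last_blocks many times.

I am compiling my code with ifort using the options -openmp -O2 -pg.

Question:

Why doesn't gprof see the time taken by my file I/O subroutines (read_old_restart and write_new_restart)?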

1 answer:

Answer 0 (score: 1)

gprof specifically does not include I/O time. It only tries to measure CPU time.

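That is because it does only two things: 1) it samples the program counter on a 1/100-second clock, and the program counter is meaningless during I/O; and 2) it counts the number of times any function B is called by any function A.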
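One way to see how much of a routine's elapsed time is not CPU time is to time it with both CPU_TIME and SYSTEM_CLOCK. The following is only a minimal stand-alone sketch, not the code from the question: the program name wall_vs_cpu, the file name scratch_demo.dat, and the write loop are made up as a stand-in for an I/O-heavy routine such as write_new_restart.

```fortran
PROGRAM wall_vs_cpu
    ! Hypothetical stand-alone sketch; names and file are illustrative only.
    USE, INTRINSIC :: iso_fortran_env, ONLY: int64, real64
    IMPLICIT NONE
    INTEGER(int64) :: c0, c1, rate
    REAL(real64)   :: t0, t1
    INTEGER        :: u, i

    CALL SYSTEM_CLOCK(c0, rate)   ! wall-clock tick count and ticks per second
    CALL CPU_TIME(t0)             ! processor time used so far

    ! Stand-in for an I/O-heavy routine (assumption, not the original code).
    OPEN(NEWUNIT=u, FILE="scratch_demo.dat", FORM="unformatted", STATUS="replace")
    DO i = 1, 1000
        WRITE(u) REAL(i, real64)
    END DO
    CLOSE(u, STATUS="delete")     ! remove the scratch file again

    CALL CPU_TIME(t1)
    CALL SYSTEM_CLOCK(c1)

    PRINT *, "CPU time  (s) =", t1 - t0
    PRINT *, "Wall time (s) =", REAL(c1 - c0, real64) / REAL(rate, real64)
END PROGRAM wall_vs_cpu
```

If the wall time for a section is clearly larger than its CPU time, the difference is time spent waiting, which a profiler that only samples CPU time would not be expected to attribute to that section.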

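From the call counts, it tries to guess how much of each function's CPU time can be attributed to each caller. That was its whole advance over pre-existing profilers.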
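The attribution can be illustrated with the numbers from the call graph in the question. This is a simplified sketch of the idea, assuming each child's time is shared among its callers in proportion to call counts; it is not gprof's actual implementation, and the program and variable names are hypothetical.

```fortran
PROGRAM attribution_sketch
    ! Hypothetical illustration of call-count-based attribution.
    ! All numbers are copied from entry [3] of the call graph in the question.
    IMPLICIT NONE
    INTEGER, PARAMETER :: dp = KIND(1.0d0)
    REAL(dp), PARAMETER :: total_run   = 12.03_dp   ! total sampled seconds
    REAL(dp), PARAMETER :: self_parent = 0.00_dp    ! redistribute_all self time
    ! Children: redistribute_vars, fill_last_blocks
    REAL(dp), PARAMETER :: child_time(2)        = [8.95_dp, 0.75_dp]
    INTEGER,  PARAMETER :: calls_from_parent(2) = [61, 63]
    INTEGER,  PARAMETER :: calls_total(2)       = [61, 63]
    REAL(dp) :: children
    INTEGER  :: k

    children = 0.0_dp
    DO k = 1, 2
        ! Share of the child's time charged to this caller.
        children = children + child_time(k) * &
                   REAL(calls_from_parent(k), dp) / REAL(calls_total(k), dp)
    END DO

    PRINT "(A,F6.2)", "children seconds = ", children
    PRINT "(A,F6.1)", "percent of run   = ", &
          100.0_dp * (self_parent + children) / total_run
END PROGRAM attribution_sketch
```

Here every call to both children comes from redistribute_all, so the full 8.95 + 0.75 = 9.70 seconds is charged to it, which corresponds to the 9.70 children seconds and 80.6% reported for entry [3]. When a child has several callers, the split by call count is only an estimate, because individual calls need not take equal time.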

When you use gprof, you should understand what it does and what it doesn't do.