Using the Intel VTune profiler

Time: 2017-08-01 12:04:45

Tags: optimization fortran performance-testing intel vtune

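I am working with a Fortran project that simulates vegetation dynamics. The code is slow, so I am always looking for ways to optimise it. I have read that there is a "rule" saying that typically 90% of the time is spent in 10% of the code. To find those bottlenecks I started using the Intel VTune performance profiler.

Profiling a simulation shows that one specific part of the code takes a large share of the time, as shown in Figure 1. The most time-consuming part of leaftw_derivs is shown in Figure 2.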

The code that the profile points to is shown below.

   !---- Update soil moisture and energy from transpiration/root uptake. ------------------!
   if (rk4aux(ibuff)%any_resolvable) then
      do k1 = klsl, mzg    ! loop over extracted water
         do k2=k1,mzg
            if (rk4site%ntext_soil(k2) /= 13) then
               !---------------------------------------------------------------------------!
               !     Transpiration happens only when there is some water left down to this !
               ! layer.                                                                    !
               !---------------------------------------------------------------------------!
               if (rk4aux(ibuff)%avail_h2o_int(k1) > 0.d0) then
                  !------------------------------------------------------------------------!
                  !    Find the contribution of layer k2 for the transpiration from        !
                  ! cohorts that reach layer k1.                                           !
                  !------------------------------------------------------------------------!
                  ext_weight = rk4aux(ibuff)%avail_h2o_lyr(k2) / rk4aux(ibuff)%avail_h2o_int(k1)

                  !------------------------------------------------------------------------!
                  wloss_tot      = 0.d0
                  qloss_tot      = 0.d0
                  wvlmeloss_tot  = 0.d0
                  qvlmeloss_tot  = 0.d0

                  do ico=1,cpatch%ncohorts
                     !----- Find the loss from this cohort. -------------------------------!
                     wloss         = rk4aux(ibuff)%extracted_water(ico,k1) * ext_weight
                     qloss         = wloss * tl2uint8(initp%soil_tempk(k2),1.d0)
                     wvlmeloss     = wloss * wdnsi8 * dslzi8(k2)
                     qvlmeloss     = qloss * dslzi8(k2)
                     !---------------------------------------------------------------------!


                     !---------------------------------------------------------------------!
                     !      Add the internal energy to the cohort.  This energy will be    !
                     ! eventually lost to the canopy air space because of transpiration,   !
                     ! but we will do it in two steps so we ensure energy is conserved.    !
                     !---------------------------------------------------------------------!
                     dinitp%leaf_energy(ico) = dinitp%leaf_energy(ico)  + qloss
                     dinitp%veg_energy(ico)  = dinitp%veg_energy(ico)   + qloss
                     initp%hflx_lrsti(ico) = initp%hflx_lrsti(ico)      + qloss
                     !---------------------------------------------------------------------!

                     !----- Integrate the total to be removed from this layer. ------------!
                     wloss_tot     = wloss_tot     + wloss
                     qloss_tot     = qloss_tot     + qloss
                     wvlmeloss_tot = wvlmeloss_tot + wvlmeloss
                     qvlmeloss_tot = qvlmeloss_tot + qvlmeloss
                     !---------------------------------------------------------------------!
                  end do
                  !------------------------------------------------------------------------!



                  !----- Update derivatives of water, energy, and transpiration. ----------!
                  dinitp%soil_water   (k2) = dinitp%soil_water(k2)    - wvlmeloss_tot
                  dinitp%soil_energy  (k2) = dinitp%soil_energy(k2)   - qvlmeloss_tot
                  dinitp%avg_transloss(k2) = dinitp%avg_transloss(k2) - wloss_tot
                  !------------------------------------------------------------------------!
               end if
               !---------------------------------------------------------------------------!
            end if
            !------------------------------------------------------------------------------!
         end do
         !---------------------------------------------------------------------------------!
      end do
      !------------------------------------------------------------------------------------!
   end if
   !---------------------------------------------------------------------------------------!

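I have only a very basic understanding of optimisation, and I do not know what could be done here to improve the code. In particular, I do not understand what "Instructions Retired" means or how to act on it. Is there a way to speed up this calculation?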

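EDIT

Thinking about it a bit more, I realised that there are some simple optimisations here. For example, the condition if (rk4aux(ibuff)%avail_h2o_int(k1) > 0.d0) then depends only on k1 and can be moved out of the k2 loop, and tl2uint8(initp%soil_tempk(k2),1.d0) depends only on k2 and can be moved out of the innermost loop.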
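A minimal, self-contained sketch of that kind of hoisting is given here. It does not use the model's data structures: the array names, the sizes, and the function fake_uint (a stand-in for tl2uint8) are all hypothetical, and ext_weight is left out to keep the example short. It only illustrates that the test on k1 and the call that depends on k2 can leave the inner loops without changing the result.

```fortran
program hoist_invariants
   implicit none
   integer, parameter :: nlyr = 4, ncoh = 3          ! hypothetical sizes
   real(kind=8) :: avail_int(nlyr), temp(nlyr), extracted(ncoh,nlyr)
   real(kind=8) :: energy_a(ncoh), energy_b(ncoh), uint_here
   integer :: k1, k2, ico

   ! Made-up inputs; layer 2 has no available water.
   avail_int = (/ 1.d0, 0.d0, 2.d0, 0.5d0 /)
   temp      = (/ 280.d0, 282.d0, 284.d0, 286.d0 /)
   do k1 = 1, nlyr
      do ico = 1, ncoh
         extracted(ico,k1) = 1.d-3 * dble(ico + k1)
      end do
   end do

   ! Original ordering: the k1 test and the function call sit inside the loops.
   energy_a = 0.d0
   do k1 = 1, nlyr
      do k2 = k1, nlyr
         if (avail_int(k1) > 0.d0) then
            do ico = 1, ncoh
               energy_a(ico) = energy_a(ico) + extracted(ico,k1) * fake_uint(temp(k2))
            end do
         end if
      end do
   end do

   ! Hoisted ordering: test once per k1, call the function once per (k1,k2).
   energy_b = 0.d0
   do k1 = 1, nlyr
      if (avail_int(k1) <= 0.d0) cycle
      do k2 = k1, nlyr
         uint_here = fake_uint(temp(k2))
         do ico = 1, ncoh
            energy_b(ico) = energy_b(ico) + extracted(ico,k1) * uint_here
         end do
      end do
   end do

   ! Both versions perform the same arithmetic, so the results should agree.
   if (maxval(abs(energy_a - energy_b)) > 1.d-12 * maxval(abs(energy_a))) error stop 'mismatch'
   print '(a)', 'hoisted and original loops agree'

contains

   ! Hypothetical stand-in for tl2uint8: any pure function of temperature will do.
   pure function fake_uint(tk) result(u)
      real(kind=8), intent(in) :: tk
      real(kind=8) :: u
      u = 4186.d0 * (tk - 273.15d0)
   end function fake_uint

end program hoist_invariants
```

An optimising compiler may already perform part of this hoisting by itself, so the gain in the real code has to be measured rather than assumed.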

However, I cannot understand why VTune attributes so much time to these three lines:

             dinitp%leaf_energy(ico) = dinitp%leaf_energy(ico)  + qloss
             dinitp%veg_energy(ico)  = dinitp%veg_energy(ico)   + qloss
             initp%hflx_lrsti(ico) = initp%hflx_lrsti(ico)      + qloss

They only perform additions. That should be very fast, yet the profiler says a lot of time is spent there. Why is that?
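One contributing factor that can be checked directly is how often those three lines run. They sit inside a triangular k1/k2 loop nest and a cohort loop, so even a cheap statement is executed many times per call. The sketch below counts the executions for assumed sizes (mzg = 16 soil layers, klsl = 1 and 50 cohorts are illustrative values, not taken from the model) and compares the count with the closed form. It ignores the two if tests, so it is an upper bound. Sampling profilers can also attribute samples to an instruction next to the one that actually stalled, so per-line timings are best read as approximate.

```fortran
program count_inner_trips
   implicit none
   ! Assumed, illustrative sizes; the real values come from the model setup.
   integer, parameter :: klsl = 1, mzg = 16, ncohorts = 50
   integer :: k1, k2, ico, ntrips, expected

   ! Same loop structure as the profiled code, with the body replaced by a counter.
   ntrips = 0
   do k1 = klsl, mzg
      do k2 = k1, mzg
         do ico = 1, ncohorts
            ntrips = ntrips + 1
         end do
      end do
   end do

   ! Triangular nest: n*(n+1)/2 (k1,k2) pairs, each visiting every cohort.
   expected = ncohorts * (mzg - klsl + 1) * (mzg - klsl + 2) / 2
   if (ntrips /= expected) error stop 'count mismatch'

   print '(a,i0)', 'inner-body executions per call: ', ntrips
end program count_inner_trips
```

With these sizes the inner body runs 6800 times per call, and the derivative routine is itself called several times per integration step.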

EDIT2

I rewrote the whole loop, trying to optimise it as much as possible. This is the code I came up with:

   !---- Update soil moisture and energy from transpiration/root uptake. ------------------!
   if (rk4aux(ibuff)%any_resolvable) then
      do k1 = klsl, mzg    ! loop over extracted water
         !---------------------------------------------------------------------------------!
         !     Transpiration happens only when there is some water left down to this       !
         ! layer.                                                                          !
         !---------------------------------------------------------------------------------!
         if (rk4aux(ibuff)%avail_h2o_int(k1) > 0.d0) then

            !----- Total water extracted by all cohorts that reach layer k1. --------------!
            wloss_tot_k1 = 0.d0
            do ico=1,cpatch%ncohorts
               wloss_tot_k1 = wloss_tot_k1 + rk4aux(ibuff)%extracted_water(ico,k1)
            end do
            !------------------------------------------------------------------------------!

            do k2=k1,mzg
               if (rk4site%ntext_soil(k2) /= 13) then
                  !----- Quantities that depend on k1 and k2 only, not on the cohort. -----!
                  ext_weight = rk4aux(ibuff)%avail_h2o_lyr(k2) / rk4aux(ibuff)%avail_h2o_int(k1)
                  uint_here  = tl2uint8(initp%soil_tempk(k2),1.d0)
                  !------------------------------------------------------------------------!

                  !----- Add the internal energy to each cohort. --------------------------!
                  do ico=1,cpatch%ncohorts
                     wloss      = rk4aux(ibuff)%extracted_water(ico,k1) * ext_weight
                     uint_here1 = wloss * uint_here

                     dinitp%leaf_energy(ico) = dinitp%leaf_energy(ico) + uint_here1
                     dinitp%veg_energy(ico)  = dinitp%veg_energy(ico)  + uint_here1
                     initp%hflx_lrsti(ico)   = initp%hflx_lrsti(ico)   + uint_here1
                  end do
                  !------------------------------------------------------------------------!

                  !----- Totals to be removed from layer k2. ------------------------------!
                  wloss_tot     = wloss_tot_k1 * ext_weight
                  wvlmeloss_tot = wloss_tot * dslzi8(k2) * wdnsi8
                  qvlmeloss_tot = wloss_tot * dslzi8(k2) * uint_here

                  !----- Update derivatives of water, energy, and transpiration. ----------!
                  dinitp%soil_water   (k2) = dinitp%soil_water(k2)    - wvlmeloss_tot
                  dinitp%soil_energy  (k2) = dinitp%soil_energy(k2)   - qvlmeloss_tot
                  dinitp%avg_transloss(k2) = dinitp%avg_transloss(k2) - wloss_tot
                  !------------------------------------------------------------------------!
               end if
               !---------------------------------------------------------------------------!
            end do
            !------------------------------------------------------------------------------!
         end if
         !---------------------------------------------------------------------------------!
      end do
      !------------------------------------------------------------------------------------!
   end if
   !---------------------------------------------------------------------------------------!

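It is a bit long, so I do not expect people to go through it. If I run the profiler now, I get a considerable reduction in time (from 290 s to 185 s, although in an actual simulation the speed-up seems to be somewhat smaller). See the "New times" screenshot.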
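The main algebraic change in the rewrite is that the per-cohort totals are no longer accumulated inside the k2 loop: the sum of extracted_water over cohorts is computed once per k1 and then scaled. The sketch below checks that identity for one (k1,k2) pair with made-up numbers. The variable names mirror the code above, but the values and the scalar dslzi are hypothetical. The two forms agree to rounding error, not bit for bit, because the order of the floating-point operations differs.

```fortran
program factor_sum
   implicit none
   integer, parameter :: ncoh = 5                    ! hypothetical cohort count
   real(kind=8) :: extracted(ncoh), ext_weight, uint_here, dslzi
   real(kind=8) :: wloss, qloss, qv_loop, qv_fact
   integer :: ico

   ! Made-up values for a single (k1,k2) pair.
   extracted  = (/ 1.d-4, 2.d-4, 5.d-5, 3.d-4, 1.5d-4 /)
   ext_weight = 0.25d0
   uint_here  = 5.0d4
   dslzi      = 10.d0

   ! Original form: accumulate the energy loss cohort by cohort.
   qv_loop = 0.d0
   do ico = 1, ncoh
      wloss   = extracted(ico) * ext_weight
      qloss   = wloss * uint_here
      qv_loop = qv_loop + qloss * dslzi
   end do

   ! Factored form: sum over cohorts once, then scale.
   qv_fact = sum(extracted) * ext_weight * dslzi * uint_here

   ! The results should match up to floating-point rounding.
   if (abs(qv_loop - qv_fact) > 1.d-12 * abs(qv_loop)) error stop 'forms disagree'
   print '(a,es12.4)', 'relative difference: ', abs(qv_loop - qv_fact) / abs(qv_loop)
end program factor_sum
```

Small rounding differences of this kind can accumulate over a long integration, so comparing model output before and after the rewrite within a tolerance is more meaningful than expecting identical numbers.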

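However, when looking at the sampling, quite a lot of time is still spent on operations that I would not expect to be "expensive". I still cannot work out what "Instructions Retired" means or how to act on it. For now I think this is good enough, and I believe the right way to speed things up further is to make use of OpenMP, as suggested by Holmz.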
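As a rough sketch of what an OpenMP version of the cohort loop could look like, the program below parallelises a simplified per-cohort update with a reduction for the running total. It is not the model's code: the arrays are plain local arrays instead of derived-type components, the sizes and values are invented, and whether threading a loop this small pays off is an open question that only measurement can answer. Without an OpenMP compiler flag (for example -fopenmp with gfortran or -qopenmp with the Intel compiler) the directives are treated as comments and the loop runs serially.

```fortran
program omp_cohort_update
   implicit none
   integer, parameter :: ncoh = 1000                 ! hypothetical cohort count
   real(kind=8) :: extracted(ncoh), leaf_energy(ncoh), veg_energy(ncoh)
   real(kind=8) :: ext_weight, uint_here, wloss, qloss, wloss_tot, expected
   integer :: ico

   ! Made-up inputs.
   do ico = 1, ncoh
      extracted(ico) = 1.d-6 * dble(ico)
   end do
   leaf_energy = 0.d0
   veg_energy  = 0.d0
   ext_weight  = 0.5d0
   uint_here   = 5.d4
   wloss_tot   = 0.d0

   ! Each iteration touches only its own cohort, so iterations are independent.
   ! The scalars wloss and qloss must be private; the total needs a reduction.
   !$omp parallel do default(shared) private(wloss, qloss) reduction(+:wloss_tot)
   do ico = 1, ncoh
      wloss = extracted(ico) * ext_weight
      qloss = wloss * uint_here
      leaf_energy(ico) = leaf_energy(ico) + qloss
      veg_energy(ico)  = veg_energy(ico)  + qloss
      wloss_tot        = wloss_tot + wloss
   end do
   !$omp end parallel do

   ! Check the reduced total against the factored serial form.
   expected = sum(extracted) * ext_weight
   if (abs(wloss_tot - expected) > 1.d-10 * abs(expected)) error stop 'reduction mismatch'
   print '(a,es14.6)', 'total water loss: ', wloss_tot
end program omp_cohort_update
```

Starting a parallel region has a fixed cost, so when the number of cohorts per patch is small it may be more effective to place the directive on an outer loop, if the surrounding code has one whose iterations are independent.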


0 answers:

No answers yet.