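I am working on a Fortran project that simulates vegetation dynamics. The code is slow, so I am always looking for ways to optimize it.
I have been reading that there is a "rule" saying that typically 90% of the run time is spent in 10% of the code. To find these bottlenecks I started using the Intel VTune profiler. Profiling a simulation showed that specific parts of the code take a lot of time, as shown in the image.
The most time-consuming parts within leaftw_derivs are shown in the image below.
The code referred to in the analysis is shown below.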
!---- Update soil moisture and energy from transpiration/root uptake. ------------------!
if (rk4aux(ibuff)%any_resolvable) then
   do k1 = klsl, mzg ! loop over extracted water
      do k2=k1,mzg
         if (rk4site%ntext_soil(k2) /= 13) then
            !---------------------------------------------------------------------------!
            ! Transpiration happens only when there is some water left down to this !
            ! layer. !
            !---------------------------------------------------------------------------!
            if (rk4aux(ibuff)%avail_h2o_int(k1) > 0.d0) then
               !------------------------------------------------------------------------!
               ! Find the contribution of layer k2 for the transpiration from !
               ! cohorts that reach layer k1. !
               !------------------------------------------------------------------------!
               ext_weight = rk4aux(ibuff)%avail_h2o_lyr(k2) / rk4aux(ibuff)%avail_h2o_int(k1)
               !------------------------------------------------------------------------!
               wloss_tot = 0.d0
               qloss_tot = 0.d0
               wvlmeloss_tot = 0.d0
               qvlmeloss_tot = 0.d0
               do ico=1,cpatch%ncohorts
                  !----- Find the loss from this cohort. -------------------------------!
                  wloss = rk4aux(ibuff)%extracted_water(ico,k1) * ext_weight
                  qloss = wloss * tl2uint8(initp%soil_tempk(k2),1.d0)
                  wvlmeloss = wloss * wdnsi8 * dslzi8(k2)
                  qvlmeloss = qloss * dslzi8(k2)
                  !---------------------------------------------------------------------!
                  !---------------------------------------------------------------------!
                  ! Add the internal energy to the cohort. This energy will be !
                  ! eventually lost to the canopy air space because of transpiration, !
                  ! but we will do it in two steps so we ensure energy is conserved. !
                  !---------------------------------------------------------------------!
                  dinitp%leaf_energy(ico) = dinitp%leaf_energy(ico) + qloss
                  dinitp%veg_energy(ico) = dinitp%veg_energy(ico) + qloss
                  initp%hflx_lrsti(ico) = initp%hflx_lrsti(ico) + qloss
                  !---------------------------------------------------------------------!
                  !----- Integrate the total to be removed from this layer. ------------!
                  wloss_tot = wloss_tot + wloss
                  qloss_tot = qloss_tot + qloss
                  wvlmeloss_tot = wvlmeloss_tot + wvlmeloss
                  qvlmeloss_tot = qvlmeloss_tot + qvlmeloss
                  !---------------------------------------------------------------------!
               end do
               !------------------------------------------------------------------------!
               !----- Update derivatives of water, energy, and transpiration. ----------!
               dinitp%soil_water (k2) = dinitp%soil_water(k2) - wvlmeloss_tot
               dinitp%soil_energy (k2) = dinitp%soil_energy(k2) - qvlmeloss_tot
               dinitp%avg_transloss(k2) = dinitp%avg_transloss(k2) - wloss_tot
               !------------------------------------------------------------------------!
            end if
            !---------------------------------------------------------------------------!
         end if
         !------------------------------------------------------------------------------!
      end do
      !---------------------------------------------------------------------------------!
   end do
   !------------------------------------------------------------------------------------!
end if
!---------------------------------------------------------------------------------------!
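I have only a very basic understanding of optimization, and I don't see what could be done here to improve the code. In particular, I don't understand what "instructions retired" means or how to act on it. Is there a way to speed up this computation?
EDIT
Thinking about it some more, I realized there are a few simple optimizations available here. For example, moving the conditional
if (rk4aux(ibuff)%avail_h2o_int(k1) > 0.d0) then
outside the loop, and moving the call
tl2uint8(initp%soil_tempk(k2),1.d0)
out of the innermost loop.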
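A minimal sketch of those two hoists, written as a standalone program so it can be compiled and checked on its own. Everything here is a stand-in: the arrays `extracted`, `avail_lyr`, `avail_int`, and `tempk` mimic the shapes of the real fields, the sizes and values are arbitrary, and `uint_of` is a hypothetical placeholder rather than the real `tl2uint8`.

```fortran
program hoist_sketch
   implicit none
   integer, parameter :: dp = kind(1.d0)
   integer, parameter :: nlyr = 4, ncoh = 3      ! hypothetical sizes
   real(dp) :: extracted(ncoh,nlyr)              ! stand-in for extracted_water
   real(dp) :: avail_lyr(nlyr), avail_int(nlyr)  ! stand-ins for avail_h2o_lyr/int
   real(dp) :: tempk(nlyr)                       ! stand-in for soil_tempk
   real(dp) :: e_naive(ncoh), e_hoist(ncoh)
   real(dp) :: ext_weight, uint_here
   integer  :: k1, k2, ico

   ! Arbitrary test data, not physical values.
   do k1 = 1, nlyr
      avail_lyr(k1) = real(k1,dp)
      tempk(k1)     = 280.d0 + real(k1,dp)
      do ico = 1, ncoh
         extracted(ico,k1) = 0.1d0 * real(ico+k1,dp)
      end do
   end do
   do k1 = 1, nlyr
      avail_int(k1) = sum(avail_lyr(k1:nlyr))
   end do
   avail_int(2) = 0.d0   ! force one layer through the "no water" branch

   ! Original structure: test and function call sit inside the inner loops.
   e_naive = 0.d0
   do k1 = 1, nlyr
      do k2 = k1, nlyr
         if (avail_int(k1) > 0.d0) then
            ext_weight = avail_lyr(k2) / avail_int(k1)
            do ico = 1, ncoh
               e_naive(ico) = e_naive(ico)                                &
                            + extracted(ico,k1) * ext_weight * uint_of(tempk(k2))
            end do
         end if
      end do
   end do

   ! Hoisted structure: the k1-only test is done once per k1, and the
   ! k2-only function call is done once per k2 instead of once per cohort.
   e_hoist = 0.d0
   do k1 = 1, nlyr
      if (avail_int(k1) <= 0.d0) cycle
      do k2 = k1, nlyr
         ext_weight = avail_lyr(k2) / avail_int(k1)
         uint_here  = uint_of(tempk(k2))
         do ico = 1, ncoh
            e_hoist(ico) = e_hoist(ico)                                   &
                         + extracted(ico,k1) * ext_weight * uint_here
         end do
      end do
   end do

   ! Both versions must agree (up to rounding).
   if (maxval(abs(e_naive - e_hoist)) > 1.d-12 * maxval(abs(e_naive))) then
      error stop 'hoisted loop does not match the naive loop'
   end if
   print *, 'hoisted loop matches the naive loop'

contains

   ! Hypothetical placeholder for tl2uint8; NOT the real formula.
   pure function uint_of(t) result(u)
      real(dp), intent(in) :: t
      real(dp) :: u
      u = 4186.d0 * (t - 273.15d0)
   end function uint_of

end program hoist_sketch
```

An optimizing compiler may already perform some of this hoisting itself, so the gain from doing it by hand should be measured rather than assumed.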
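However, I cannot understand why VTune reports so much time for these 3 lines:
dinitp%leaf_energy(ico) = dinitp%leaf_energy(ico) + qloss
dinitp%veg_energy(ico) = dinitp%veg_energy(ico) + qloss
initp%hflx_lrsti(ico) = initp%hflx_lrsti(ico) + qloss
They only perform additions. This should be very fast, yet the profiler says a lot of time is spent there. Why is that?
EDIT2
I rewrote the whole loop, trying to optimize it as much as possible. This is the code I came up with: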
!---- Update soil moisture and energy from transpiration/root uptake. ------------------!
if (rk4aux(ibuff)%any_resolvable) then
   do k1 = klsl, mzg ! loop over extracted water
      !---------------------------------------------------------------------------!
      ! Transpiration happens only when there is some water left down to this !
      ! layer. !
      !---------------------------------------------------------------------------!
      if (rk4aux(ibuff)%avail_h2o_int(k1) > 0.d0) then
         wloss_tot_k1 = 0.d0
         do ico=1,cpatch%ncohorts
            !----- Integrate the total to be removed from this layer. ------------!
            wloss_tot_k1 = wloss_tot_k1 + rk4aux(ibuff)%extracted_water(ico,k1)
            !---------------------------------------------------------------------!
         end do
         !------------------------------------------------------------------------!
         do k2=k1,mzg
            if (rk4site%ntext_soil(k2) /= 13) then
               !----- Layer weight and internal energy, same expressions as in ----!
               !      the original loop, evaluated once per (k1,k2) pair.          !
               ext_weight = rk4aux(ibuff)%avail_h2o_lyr(k2) / rk4aux(ibuff)%avail_h2o_int(k1)
               uint_here = tl2uint8(initp%soil_tempk(k2),1.d0)
               do ico=1,cpatch%ncohorts
                  wloss = rk4aux(ibuff)%extracted_water(ico,k1) * ext_weight
                  uint_here1 = wloss * uint_here
                  dinitp%leaf_energy(ico) = dinitp%leaf_energy(ico) + uint_here1
                  dinitp%veg_energy(ico) = dinitp%veg_energy(ico) + uint_here1
                  initp%hflx_lrsti(ico) = initp%hflx_lrsti(ico) + uint_here1
               end do
               !------------------------------------------------------------------------!
               wloss_tot = wloss_tot_k1 * ext_weight
               wvlmeloss_tot = wloss_tot * dslzi8(k2) * wdnsi8
               qvlmeloss_tot = wloss_tot * dslzi8(k2) * uint_here
               !----- Update derivatives of water, energy, and transpiration. ----------!
               dinitp%soil_water (k2) = dinitp%soil_water(k2) - wvlmeloss_tot
               dinitp%soil_energy (k2) = dinitp%soil_energy(k2) - qvlmeloss_tot
               dinitp%avg_transloss(k2) = dinitp%avg_transloss(k2) - wloss_tot
               !------------------------------------------------------------------------!
            end if
            !---------------------------------------------------------------------------!
         end do
         !------------------------------------------------------------------------------!
      end if
      !---------------------------------------------------------------------------------!
   end do
   !------------------------------------------------------------------------------------!
end if
!---------------------------------------------------------------------------------------!
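It is a bit long, so I don't expect people to go through it. If I run the profiler now, the time is considerably reduced (from 290 s to 185 s, although in actual simulations the speed-up seems to be somewhat smaller).
However, looking at the sampling, quite a lot of time is still spent on operations that I would not expect to be expensive. I still don't understand what "instructions retired" means or how to act on it. For now I think this is good enough; I believe the right way to speed things up further is to exploit OpenMP, as Holmz suggested.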
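A minimal sketch of what an OpenMP version of the cohort loop might look like, assuming the iterations over `ico` are independent (each cohort only updates its own array element) and that any running total is handled with a `reduction` clause. The array names, sizes, and values below are hypothetical stand-ins, not the model's data structures.

```fortran
program omp_sketch
   implicit none
   integer, parameter :: dp = kind(1.d0)
   integer, parameter :: ncoh = 1000             ! hypothetical cohort count
   real(dp) :: leaf_energy(ncoh), extracted(ncoh)
   real(dp) :: ext_weight, uint_here, wloss_tot, wloss_ref
   integer  :: ico

   ! Arbitrary test data, not physical values.
   ext_weight = 0.25d0
   uint_here  = 3.0d4
   do ico = 1, ncoh
      extracted(ico) = 1.d-3 * real(ico,dp)
   end do
   leaf_energy = 0.d0

   ! Each iteration writes only leaf_energy(ico), so iterations do not
   ! conflict; the scalar total is combined safely by the reduction clause.
   wloss_tot = 0.d0
   !$omp parallel do default(none) private(ico) &
   !$omp shared(leaf_energy, extracted, ext_weight, uint_here) &
   !$omp reduction(+:wloss_tot)
   do ico = 1, ncoh
      leaf_energy(ico) = leaf_energy(ico) + extracted(ico) * ext_weight * uint_here
      wloss_tot        = wloss_tot + extracted(ico) * ext_weight
   end do
   !$omp end parallel do

   ! Serial reference; the parallel sum may differ by rounding only.
   wloss_ref = sum(extracted) * ext_weight
   if (abs(wloss_tot - wloss_ref) > 1.d-10 * abs(wloss_ref)) then
      error stop 'parallel total does not match the serial total'
   end if
   print *, 'parallel total matches the serial total'
end program omp_sketch
```

The directives only take effect when OpenMP is enabled at compile time (for example `-fopenmp` with gfortran or `-qopenmp` with the Intel compilers); otherwise they are treated as comments and the loop runs serially. With only a few cohorts per patch, the cost of starting the threads could outweigh the gain, so parallelizing an outer loop may be the better choice; that would need to be measured.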