Question

I work with geophysical models, and a common situation is needing to combine (multiply, add, etc.) 2D data with 3D data. Below is an example.

module benchmarks
  implicit none
  integer, parameter :: n=500
  integer :: k
  real :: d2(n,n)
  real :: d3(n,n,n)
  contains
  ! Iteration
  subroutine benchmark_a(res)
    real, intent(out) :: res(n,n,n)
    do k = 1, size(d3,3)
      res(:,:,k) = d2*d3(:,:,k)
    end do
  end subroutine
  ! Spread
  subroutine benchmark_b(res)
    real, intent(out) :: res(n,n,n)
    res = d3*spread(d2, 3, size(d3,3))
  end subroutine
end module

program main
  use benchmarks
  real :: t, tarray(2)
  real :: res(n,n,n)
  call random_number(d2)
  call random_number(d3)
  ! Iteration
  call dtime(tarray, t)
  call benchmark_a(res)
  call dtime(tarray, t)
  write(*,*) 'Iteration', t
  ! Spread
  call dtime(tarray, t)
  call benchmark_b(res)
  call dtime(tarray, t)
  write(*,*) 'Spread', t
end program

When I run this with varying dimension size n, I generally find that spread is much slower; for example:

Spread   2.09942889
Iteration  0.458283991

Does anyone know why the spread approach is so much slower than an explicit do loop (which I thought was, generally, to be avoided at all costs)?

Answer 1

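The basic answer here is "it isn't". Maybe with a specific compiler and in a specific circumstance the intrinsic isn't optimized as well as an explicit DO loop, but it need not be that way. I tested with ifort 19, and even at the default optimization level the SPREAD intrinsic and the explicit loop generated similar code, with the intrinsic being faster once I corrected the program to use the result. In the output below, the first number on each line is the elapsed CPU time and the second is res(10,10,10):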

Iteration  0.2187500  0.1376885
Spread  9.3750000E-02  0.1376885
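To picture the "specific compiler" case, here is a minimal sketch of what an unfused handling of SPREAD could look like, written out by hand. This is an assumption for illustration, not the code any particular compiler is known to emit: the spread result is first materialized as a full 3D temporary and only then multiplied, which costs an extra pass over memory compared with the fused loop. The program name `spread_lowering`, the array `tmp`, and the small `n` are hypothetical.

```fortran
program spread_lowering
  implicit none
  integer, parameter :: n = 4          ! tiny size, for illustration only
  real :: d2(n,n), d3(n,n,n)
  real :: res_tmp(n,n,n), res_loop(n,n,n)
  real, allocatable :: tmp(:,:,:)      ! hypothetical compiler temporary
  integer :: k

  call random_number(d2)
  call random_number(d3)

  ! Assumed unfused form: build the full spread(d2, 3, n) array first,
  ! then do the elementwise multiply in a second pass.
  allocate(tmp(n,n,n))
  do k = 1, n
    tmp(:,:,k) = d2
  end do
  res_tmp = d3*tmp

  ! Fused form: same arithmetic, no 3D temporary.
  do k = 1, n
    res_loop(:,:,k) = d2*d3(:,:,k)
  end do

  ! Both forms compute the same products.
  write(*,*) 'results agree: ', all(res_tmp == res_loop)
end program spread_lowering
```

A compiler that recognizes the intrinsic can skip the temporary entirely, which is consistent with the two versions generating similar code.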

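I would also caution (as I did in comments on the question) that simplistic benchmark programs often don't measure what the author thinks they do. The most common error, which both your original and revised examples exhibit, is that the result of the work under test is never used, so a sufficiently smart compiler can simply evaporate the entire operation. Indeed, when I build both of your test cases with ifort 19, the compiler completely removes all of the work, leaving just the timing code. Needless to say, that runs pretty fast. The corrected program, which prints an element of the result so the work cannot be discarded, is: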

module benchmarks
  implicit none
  integer, parameter :: n=500
  integer :: k
  real :: d2(n,n)
  real :: d3(n,n,n)
  contains
  ! Iteration
  subroutine benchmark_a(res)
    real, intent(out) :: res(n,n,n)
    do k = 1, size(d3,3)
      res(:,:,k) = d2*d3(:,:,k)
    end do
  end subroutine
  ! Spread
  subroutine benchmark_b(res)
    real, intent(out) :: res(n,n,n)
    res = d3*spread(d2, 3, size(d3,3))
  end subroutine
end module

program main
  use benchmarks
  real :: tstart,tend
  real :: res(n,n,n)
  call random_number(d2)
  call random_number(d3)
  ! Iteration
  call cpu_time(tstart)
  call benchmark_a(res)
  call cpu_time(tend)
  write(*,*) 'Iteration', tend-tstart, res(10,10,10)
  ! Spread
  call cpu_time(tstart)
  call benchmark_b(res)
  call cpu_time(tend)
  write(*,*) 'Spread', tend-tstart, res(10,10,10)
end program
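Printing a single element is enough to keep the work alive in the test above. A slightly more defensive variant, sketched below under stated assumptions, consumes the whole result through a checksum, measures wall-clock time with SYSTEM_CLOCK, and allocates the result on the heap. The names `benchmarks_checked`, `main_checked`, and `checksum`, and the reduced size `n = 200`, are hypothetical choices for illustration; no timings are claimed for it.

```fortran
module benchmarks_checked
  implicit none
  integer, parameter :: n = 200   ! assumption: smaller than 500 to keep memory modest
  real :: d2(n,n)
  real :: d3(n,n,n)
contains
  ! Iteration
  subroutine benchmark_a(res)
    real, intent(out) :: res(n,n,n)
    integer :: k                  ! loop index kept local to the routine
    do k = 1, size(d3,3)
      res(:,:,k) = d2*d3(:,:,k)
    end do
  end subroutine
  ! Spread
  subroutine benchmark_b(res)
    real, intent(out) :: res(n,n,n)
    res = d3*spread(d2, 3, size(d3,3))
  end subroutine
end module

program main_checked
  use benchmarks_checked
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  real, allocatable :: res(:,:,:)
  integer(int64) :: c0, c1, rate
  real(real64) :: checksum

  allocate(res(n,n,n))            ! heap allocation avoids a large stack array
  call random_number(d2)
  call random_number(d3)

  ! Iteration
  call system_clock(count=c0, count_rate=rate)
  call benchmark_a(res)
  call system_clock(count=c1)
  checksum = sum(real(res, real64))   ! uses every element of the result
  write(*,*) 'Iteration', real(c1 - c0, real64)/real(rate, real64), checksum

  ! Spread
  call system_clock(count=c0)
  call benchmark_b(res)
  call system_clock(count=c1)
  checksum = sum(real(res, real64))
  write(*,*) 'Spread', real(c1 - c0, real64)/real(rate, real64), checksum
end program
```

The two checksums should agree if both routines compute the same products, which also serves as a quick correctness check on the comparison.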

Why is the Fortran intrinsic function "spread" often slower than explicit iteration