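While investigating why reshape is so much slower than a naive implementation using loops, I ran into strange behavior with both gfortran and ifort:
I defined an interface (my_reshape) for two reshaping functions (reshape3Dto1D and reshape1Dto3D). Here is the (simplified) code:
module test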
  ! Generic interface; this simplified version contains only reshape3Dto1D
  interface my_reshape
    module procedure :: reshape3Dto1D
  end interface
contains
  function reshape3Dto1D( mat, dims )
    use, intrinsic :: iso_fortran_env
    implicit none
    integer, parameter  :: cp = REAL64
    real(cp),intent(in) :: mat(:,:,:)
    integer,intent(in)  :: dims(1)
    real(cp)            :: reshape3Dto1D(dims(1))
    integer             :: x,y,z, i
    ! Copy element by element in column-major order (x varies fastest)
    i=0
    do z=1,size(mat,3)
      do y=1,size(mat,2)
        do x=1,size(mat,1)
          i=i+1
          reshape3Dto1D(i) = mat(x,y,z)
        enddo ! x
      enddo ! y
    enddo ! z
  end function
end module
program changeDim
  use test
  use omp_lib
  use, intrinsic :: iso_fortran_env
  implicit none
  integer, parameter :: cp = REAL64
  real(REAL64)       :: t1, t2, t3, t4
  integer,parameter  :: dimX=100, dimY=100, dimZ=100
  integer,parameter  :: dimProduct = dimX*dimY*dimZ
  integer            :: stat
  real(cp),pointer,contiguous :: matrix3d(:,:,:), matrix1d(:)

  allocate( matrix3d(dimX,dimY,dimZ), matrix1d(dimProduct), stat=stat )
  if (stat/=0) stop 'Cannot allocate memory'
  call random_number(matrix3d)
  matrix1d = 0._cp

  ! (1) Naive copy using a function
  t1 = omp_get_wtime()
  matrix1d = reshape3Dto1D( matrix3d, [ dimProduct ] )
  t2 = omp_get_wtime()
  ! (2) Reshape
  matrix1d = reshape( matrix3d, [ dimProduct ] )
  t3 = omp_get_wtime()
  ! (3) Same as (1), but using the interface
  matrix1d = my_reshape( matrix3d, [ dimProduct ] )
  t4 = omp_get_wtime()

  write(*,*) 'Reshape: ',t3-t2
  write(*,*) 'Naive fct direct: ',t2-t1
  write(*,*) 'Naive fct interface: ',t4-t3
  deallocate( matrix3d, matrix1d )
end program
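When I call the function through the interface (my_reshape) rather than by its specific name (reshape3Dto1D), I see a speed-up of 10-40%! Changing the order of the calls, the optimization level, or even the compiler does not change this behavior.
Am I making a mistake, or does anyone have an explanation for this?
The (simplified) code above is reshapetest_simple.F90.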
I used gfortran 4.8.1 and ifort 13.1.3. The binaries were compiled using
ifort -o reshape-ifort -openmp reshapetest_simple.F90 -O3
gfortran -o reshape-gfortran -fopenmp reshapetest_simple.F90 -O3
and gave the following results:
OMP_NUM_THREADS=1 ./reshape-gfortran
Reshape: 6.8527370000310839E-003
Naive fct direct: 5.0175579999631736E-003
Naive fct interface: 4.6131109999123510E-003
OMP_NUM_THREADS=1 ./reshape-ifort
Reshape: 3.495931625366211E-003
Naive fct direct: 5.089998245239258E-003
Naive fct interface: 3.136873245239258E-003
BTW: I know that for this kind of reshaping it is best to use pointers to avoid copying the array...
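For readers unfamiliar with the pointer approach mentioned in the last remark, here is a minimal sketch of a copy-free 3D-to-1D view using a rank-remapping pointer assignment. It assumes a compiler that supports the Fortran 2008 form of this feature (a rank-1 pointer remapped onto a contiguous target of higher rank); the program and variable names (remap_demo, a3d, flat) are made up for illustration and are not part of the code in the question.

```fortran
program remap_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer, parameter :: nx = 4, ny = 3, nz = 2   ! small illustrative sizes
  real(real64), allocatable, target :: a3d(:,:,:)
  real(real64), pointer :: flat(:)
  integer :: x, y, z, i

  allocate(a3d(nx, ny, nz))
  call random_number(a3d)

  ! Rank-remapping pointer assignment: flat becomes a 1D view of the
  ! storage of a3d. No elements are copied.
  flat(1:size(a3d)) => a3d

  ! The view follows Fortran's column-major element order.
  do z = 1, nz
    do y = 1, ny
      do x = 1, nx
        i = x + (y-1)*nx + (z-1)*nx*ny
        if (flat(i) /= a3d(x,y,z)) error stop 'view does not match array'
      end do
    end do
  end do

  ! Writing through the view modifies the original 3D array.
  flat(1) = -1.0_real64
  if (a3d(1,1,1) /= -1.0_real64) error stop 'view is not an alias'

  print *, 'ok'
  nullify(flat)
  deallocate(a3d)
end program remap_demo
```

Because the pointer aliases the array, this avoids the copy entirely, so it is not directly comparable to the three timed variants, all of which produce a new array.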
Answer 0 (score: 1)
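This behavior is related to the problem size (100x100x100) and to the use of real,pointer,contiguous.
Replace
real(cp),pointer,contiguous :: matrix3d(:,:,:), matrix1d(:)
with
real(cp), allocatable :: matrix3d(:,:,:), matrix1d(:)
and the behavior more or less disappears; in my tests it vanished completely once dimX, dimY and dimZ were increased to 200.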
Pointer, dim = 100
Reshape: 5.5749416351318359E-003
Naive fct direct: 6.2539577484130859E-003
Naive fct interface: 2.8791427612304688E-003
Allocatable, dim = 100
Reshape: 4.2719841003417969E-003
Naive fct direct: 1.4619827270507813E-003
Naive fct interface: 1.3799667358398438E-003
Pointer, dim = 200
Reshape: 4.2979001998901367E-002
Naive fct direct: 5.7554006576538086E-002
Naive fct interface: 3.6303043365478516E-002
Allocatable, dim = 200
Reshape: 4.3957948684692383E-002
Naive fct direct: 1.1255979537963867E-002
Naive fct interface: 1.1703014373779297E-002
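A comparison like the one above could be set up along the following lines. This is only a sketch under several assumptions: it uses allocatable arrays as suggested, replaces omp_get_wtime with the standard system_clock intrinsic so no OpenMP flag is needed, and averages over a hypothetical repetition count (nrep) to reduce noise. The names reshape_mod, alloc_timing, nrep and check are invented for illustration. The function returns a large automatic array, so, as with the original code, a sufficiently large stack limit may be needed depending on the compiler.

```fortran
module reshape_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: my_reshape, reshape3Dto1D
  ! Generic name that resolves to the single specific function below
  interface my_reshape
    module procedure reshape3Dto1D
  end interface
contains
  function reshape3Dto1D(mat, dims) result(flat)
    real(real64), intent(in) :: mat(:,:,:)
    integer, intent(in)      :: dims(1)
    real(real64)             :: flat(dims(1))
    integer :: x, y, z, i
    i = 0
    do z = 1, size(mat,3)
      do y = 1, size(mat,2)
        do x = 1, size(mat,1)
          i = i + 1
          flat(i) = mat(x,y,z)
        end do
      end do
    end do
  end function reshape3Dto1D
end module reshape_mod

program alloc_timing
  use reshape_mod
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  integer, parameter :: n = 100      ! edge length, as in the question
  integer, parameter :: nrep = 20    ! hypothetical repetition count
  real(real64), allocatable :: matrix3d(:,:,:), matrix1d(:)
  integer(int64) :: c0, c1, rate
  real(real64)   :: t_direct, t_generic, check
  integer :: k

  allocate(matrix3d(n,n,n), matrix1d(n*n*n))
  call random_number(matrix3d)
  matrix1d = 0.0_real64
  check = 0.0_real64
  call system_clock(count_rate=rate)

  ! Call by specific name
  call system_clock(c0)
  do k = 1, nrep
    matrix1d = reshape3Dto1D(matrix3d, [n*n*n])
    check = check + matrix1d(k)   ! use the result so the call is not dropped
  end do
  call system_clock(c1)
  t_direct = real(c1 - c0, real64) / real(rate, real64) / nrep

  ! Call through the generic interface
  call system_clock(c0)
  do k = 1, nrep
    matrix1d = my_reshape(matrix3d, [n*n*n])
    check = check + matrix1d(k)
  end do
  call system_clock(c1)
  t_generic = real(c1 - c0, real64) / real(rate, real64) / nrep

  ! Sanity check against the intrinsic
  if (any(matrix1d /= reshape(matrix3d, [n*n*n]))) error stop 'mismatch'

  write(*,*) 'Naive fct direct:    ', t_direct
  write(*,*) 'Naive fct interface: ', t_generic
  write(*,*) 'checksum:            ', check
  deallocate(matrix3d, matrix1d)
end program alloc_timing
```

Swapping allocatable for pointer,contiguous in the declaration, or changing n to 200, gives the other configurations listed above. Absolute numbers will differ by machine and compiler, so no expected timings are given here.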