Question

While investigating why the intrinsic reshape can be much slower than a naive implementation using loops (tested with gfortran and ifort), I came across strange behavior involving reshape and my_reshape:

I defined an interface, my_reshape, for two reshape functions (reshape3Dto1D and reshape1Dto3D). When calling the function through the interface instead of calling the module procedure directly, I noticed a speed-up of 10-40%! Changing the order of the calls, the optimization level, or even the compiler does not change this behavior.

Am I making a mistake, or does anyone have an explanation for this?

Here is the (simplified) code:

reshapetest_simple.F90:

module test
  interface my_reshape
    module procedure :: reshape3Dto1D
  end interface
contains
  function reshape3Dto1D( mat, dims )
    use, intrinsic :: iso_fortran_env
    implicit none
    integer, parameter  :: cp = REAL64
    real(cp),intent(in) :: mat(:,:,:)
    integer,intent(in)  :: dims(1)
    real(cp)            :: reshape3Dto1D(dims(1))
    integer             :: x,y,z, i

    ! Copy in column-major order: x runs fastest, z slowest
    i=0
    do z=1,size(mat,3)
      do y=1,size(mat,2)
        do x=1,size(mat,1)
          i=i+1
          reshape3Dto1D(i) = mat(x,y,z)
        enddo ! x
      enddo ! y
    enddo ! z
  end function
end module

program changeDim
  use test
  use omp_lib
  use, intrinsic :: iso_fortran_env
  implicit none
  integer, parameter :: cp = REAL64
  real(REAL64)       :: t1, t2, t3, t4
  integer,parameter  :: dimX=100, dimY=100, dimZ=100
  integer,parameter  :: dimProduct = dimX*dimY*dimZ
  integer            :: stat
  real(cp),pointer,contiguous :: matrix3d(:,:,:), matrix1d(:)

  allocate( matrix3d(dimX,dimY,dimZ), matrix1d(dimProduct), stat=stat )
  if (stat/=0) stop 'Cannot allocate memory'

  call random_number(matrix3d)
  matrix1d = 0._cp

  ! (1) Naive copy using a function
  t1 = omp_get_wtime()
  matrix1d = reshape3Dto1D( matrix3d, [ dimProduct ] )
  t2 = omp_get_wtime()
  ! (2) Reshape
  matrix1d = reshape( matrix3d, [ dimProduct ] )
  t3 = omp_get_wtime()
  ! (3) Same as (1), but using the interface
  matrix1d = my_reshape( matrix3d, [ dimProduct ] )
  t4 = omp_get_wtime()

  write(*,*) 'Reshape: ',t3-t2
  write(*,*) 'Naive fct direct: ',t2-t1
  write(*,*) 'Naive fct interface: ',t4-t3

  deallocate( matrix3d, matrix1d )
end program

I used gfortran 4.8.1 and ifort 13.1.3. The binaries were compiled using

ifort -o reshape-ifort -openmp reshapetest_simple.F90 -O3
gfortran -o reshape-gfortran -fopenmp reshapetest_simple.F90 -O3

and gave the following results:

OMP_NUM_THREADS=1 ./reshape-gfortran 
 Reshape:                6.8527370000310839E-003
 Naive fct direct:       5.0175579999631736E-003
 Naive fct interface:    4.6131109999123510E-003
OMP_NUM_THREADS=1 ./reshape-ifort 
 Reshape:               3.495931625366211E-003
 Naive fct direct:      5.089998245239258E-003
 Naive fct interface:   3.136873245239258E-003

BTW: I know that for this kind of reshaping it is better to use pointers to avoid copying the array...
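For readers unfamiliar with the pointer approach mentioned in the BTW, here is a minimal sketch, assuming a compiler with Fortran 2008 pointer rank remapping. The program name remap_demo and the small array dimensions are illustrative assumptions, not part of the original test. The rank-1 pointer is associated with the storage of the 3D array, so no copy takes place:

```fortran
program remap_demo
  use, intrinsic :: iso_fortran_env, only: REAL64
  implicit none
  integer, parameter :: cp = REAL64
  ! Small, illustrative dimensions (assumption; the question uses 100x100x100)
  integer, parameter :: dimX = 4, dimY = 3, dimZ = 2
  real(cp), pointer, contiguous :: matrix3d(:,:,:)
  real(cp), pointer             :: matrix1d(:)
  integer :: stat

  allocate( matrix3d(dimX,dimY,dimZ), stat=stat )
  if (stat /= 0) stop 'Cannot allocate memory'
  call random_number(matrix3d)

  ! Rank-remapping pointer assignment: matrix1d becomes a rank-1 view of
  ! the same storage. This relies on matrix3d being simply contiguous,
  ! which the CONTIGUOUS attribute provides.
  matrix1d(1:size(matrix3d)) => matrix3d

  ! Element (x,y,z) sits at column-major position x + (y-1)*dimX + (z-1)*dimX*dimY
  if (matrix1d(2 + (3-1)*dimX + (2-1)*dimX*dimY) /= matrix3d(2,3,2)) &
    error stop 'index mismatch'

  ! A write through the 1D view is visible in the 3D array: nothing was copied
  matrix1d(1) = -1._cp
  if (matrix3d(1,1,1) /= -1._cp) error stop 'view is not aliased'

  write(*,*) 'ok'

  nullify(matrix1d)
  deallocate(matrix3d)
end program
```

Note that matrix1d must not be deallocated on its own here; it is only an alias, and the storage is released through matrix3d.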

Answer 1

Influencing factors

This behavior is related to the problem size (100x100x100) and to the use of real,pointer,contiguous.

Experiment

Replace

real(cp),pointer,contiguous :: matrix3d(:,:,:), matrix1d(:)

with

real(cp),  allocatable      :: matrix3d(:,:,:), matrix1d(:)

and the behavior more or less disappears; in my tests it vanishes completely once dimX, dimY and dimZ are increased to 200.

Timings with GNU Fortran 4.7.3

Pointer, dim = 100

Reshape:                5.5749416351318359E-003
Naive fct direct:       6.2539577484130859E-003
Naive fct interface:    2.8791427612304688E-003

Allocatable, dim = 100

Reshape:                4.2719841003417969E-003
Naive fct direct:       1.4619827270507813E-003
Naive fct interface:    1.3799667358398438E-003

Pointer, dim = 200

Reshape:                4.2979001998901367E-002
Naive fct direct:       5.7554006576538086E-002
Naive fct interface:    3.6303043365478516E-002

Allocatable, dim = 200

Reshape:                4.3957948684692383E-002
Naive fct direct:       1.1255979537963867E-002
Naive fct interface:    1.1703014373779297E-002
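The experiment above can be sketched as a standalone program. This is a hedged variant rather than the exact test that produced the timings: it assumes the intrinsic system_clock instead of omp_get_wtime (so no OpenMP flag is needed), and it repeats each call nrep times (an assumed parameter) so that one-off costs of the first call are averaged out. The names naive_reshape and time_allocatable are illustrative.

```fortran
module naive_reshape
  use, intrinsic :: iso_fortran_env, only: REAL64
  implicit none
  integer, parameter :: cp = REAL64
  interface my_reshape
    module procedure :: reshape3Dto1D
  end interface
contains
  function reshape3Dto1D( mat, dims )
    real(cp),intent(in) :: mat(:,:,:)
    integer,intent(in)  :: dims(1)
    real(cp)            :: reshape3Dto1D(dims(1))
    integer             :: x,y,z, i
    ! Column-major copy, as in the question
    i=0
    do z=1,size(mat,3)
      do y=1,size(mat,2)
        do x=1,size(mat,1)
          i=i+1
          reshape3Dto1D(i) = mat(x,y,z)
        enddo ! x
      enddo ! y
    enddo ! z
  end function
end module

program time_allocatable
  use naive_reshape
  use, intrinsic :: iso_fortran_env, only: INT64
  implicit none
  integer, parameter :: n = 100     ! dimX = dimY = dimZ
  integer, parameter :: nrep = 20   ! number of repetitions (assumption)
  ! Allocatable instead of pointer,contiguous, as suggested in the answer
  real(cp), allocatable :: matrix3d(:,:,:), matrix1d(:)
  integer(INT64) :: c0, c1, c2, rate
  real(cp)       :: tDirect, tInterface
  integer        :: rep, stat

  allocate( matrix3d(n,n,n), matrix1d(n*n*n), stat=stat )
  if (stat /= 0) stop 'Cannot allocate memory'
  call random_number(matrix3d)
  matrix1d = 0._cp

  tDirect    = 0._cp
  tInterface = 0._cp
  call system_clock(count_rate=rate)
  do rep = 1, nrep
    call system_clock(c0)
    matrix1d = reshape3Dto1D( matrix3d, [ n*n*n ] )   ! direct call
    call system_clock(c1)
    matrix1d = my_reshape( matrix3d, [ n*n*n ] )      ! call via the interface
    call system_clock(c2)
    tDirect    = tDirect    + real(c1-c0, cp) / real(rate, cp)
    tInterface = tInterface + real(c2-c1, cp) / real(rate, cp)
  end do

  write(*,*) 'Naive fct direct (mean):    ', tDirect / nrep
  write(*,*) 'Naive fct interface (mean): ', tInterface / nrep

  deallocate( matrix3d, matrix1d )
end program
```

Switching the declaration back to real(cp),pointer,contiguous should allow the two variants to be compared under otherwise identical conditions. For larger n the array-valued function result may exceed the default stack size with some compilers, so n is kept at 100 here.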

Calling the interface function is faster than calling the member function directly?
