I realize this question has been asked before, but not in the context of I/O. Is there any reason to believe that:
!compiler can tell that it should write the whole array at once?
!but perhaps compiler allocates/frees temporary array?
write(UNIT) (/( arr(i), i=1,N )/)
would be more efficient than:
!compiler does lots of IO here?
do i=1,N
   write(UNIT) arr(i)
enddo
for a file opened as:
open(unit=UNIT,access='STREAM',file=fname,status='UNKNOWN')
This may well be used together with compiler options that turn off buffered writes...
Answer 0 (score: 3)
As @HighPerformanceMark suggested, here is a simple benchmark I set up.
Using gfortran:
program main
  implicit none
  include 'mpif.h'
  integer, parameter :: N = 1000000
  integer :: unit = 22
  integer i            ! also receives the MPI error code
  real*8 arr(N)
  real*8 t1
  integer repeat
  external test1
  external test2
  external test3
  repeat=15
  call MPI_INIT(i)
  arr = 0
  ! Print the average time per call (in seconds) for each write variant
  call timeit(test1,repeat,arr,N,t1)
  print*,t1/repeat
  call timeit(test2,repeat,arr,N,t1)
  print*,t1/repeat
  call timeit(test3,repeat,arr,N,t1)
  print*,t1/repeat
  call MPI_Finalize(i)
end

! Accumulate in time the wall-clock seconds spent in repeat calls to sub.
! The open and close statements are kept outside the timed region.
subroutine timeit(sub,repeat,arr,size,time)
  include 'mpif.h'
  external sub
  integer repeat
  integer size
  real*8 time,t1
  real*8 arr(size)
  integer i
  time = 0
  do i=1,repeat
    open(unit=10,access='STREAM',file='test1',status='UNKNOWN')
    t1 = mpi_wtime()
    call sub(10,arr,size)
    time = time + (mpi_wtime()-t1)
    close(10)
  enddo
  return
end
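
! test1: explicit loop, one write statement per element
subroutine test1(ou,a,N)
  integer N
  real*8 a(N)
  integer ou
  integer i
  do i=1,N
    write(ou) a(i)
  enddo
  return
end

! test2: a single write statement with an implied-do loop
subroutine test2(ou,a,N)
  integer N
  real*8 a(N)
  integer ou
  integer i
  write(ou) (a(i),i=1,N)
  return
end

! test3: a single write statement with an array section
subroutine test3(ou,a,N)
  integer N
  real*8 a(N)
  integer ou
  write(ou) a(1:N)
  return
end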
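My results (average seconds per call, in the order test1, test2, test3).
Unbuffered (GFORTRAN_UNBUFFERED_ALL=1):
temp $ GFORTRAN_UNBUFFERED_ALL=1 mpirun -np 1 ./test
6.2392100652058922
3.3046503861745200
9.76902325948079409E-002
Buffered (GFORTRAN_UNBUFFERED_ALL=0):
temp $ GFORTRAN_UNBUFFERED_ALL=0 mpirun -np 1 ./test
2.7789104779561362
0.15584923426310221
9.82964992523193415E-002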
Answer 1 (score: 1)
I compiled and ran the benchmark code above with gfortran (4.7.2 20120921) and ifort (13.0.0.079 Build 20120731). My results are as follows:
gfortran:
         UNBUFFERED                BUFFERED
test1:   1.2614487171173097        0.20308602650960286
test2:   1.0525423844655355        3.4633986155192059E-002
test3:   5.9630711873372398E-003   6.0543696085611975E-003

ifort:
         UNBUFFERED                BUFFERED
test1:   1.33864809672038          0.171342913309733
test2:   6.001885732014974E-003    6.095488866170247E-003
test3:   5.962880452473959E-003    6.007925669352213E-003
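The explicit loop in test1 appears to be the least favorable option in both cases (no optimization flags were set). In addition, with the Intel compiler there is no significant difference in execution time between write(ou) (a(i), i=1, N) (case 2) and write(ou) a(1:N) (case 3, which here is simply the same as write(ou) a).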
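As a sanity check that the three forms compared above really produce the same file, here is a minimal sketch that writes one array with each form and compares the resulting file sizes. The file names form1.bin, form2.bin and form3.bin and the small array size are made up for illustration, and the sketch assumes a compiler that supports the Fortran 2008 newunit= specifier. Because stream access writes no record markers, the three sizes should match, and a file written element by element should read back as a whole array.

```fortran
program stream_forms
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000          ! illustrative size, not the benchmark size
  real(dp) :: arr(n), back(n)
  integer :: i, u, sz1, sz2, sz3

  arr = [(real(i, dp), i = 1, n)]

  ! Form 1: explicit loop, one write statement per element
  open(newunit=u, file='form1.bin', access='stream', &
       form='unformatted', status='replace')
  do i = 1, n
     write(u) arr(i)
  end do
  close(u)

  ! Form 2: one write statement with an implied-do loop
  open(newunit=u, file='form2.bin', access='stream', &
       form='unformatted', status='replace')
  write(u) (arr(i), i = 1, n)
  close(u)

  ! Form 3: one write statement with the whole array
  open(newunit=u, file='form3.bin', access='stream', &
       form='unformatted', status='replace')
  write(u) arr
  close(u)

  ! Compare the file sizes (reported in file storage units)
  inquire(file='form1.bin', size=sz1)
  inquire(file='form2.bin', size=sz2)
  inquire(file='form3.bin', size=sz3)
  print *, 'file sizes: ', sz1, sz2, sz3
  print *, 'sizes equal: ', (sz1 == sz2) .and. (sz2 == sz3)

  ! Read the element-by-element file back as a whole array
  open(newunit=u, file='form1.bin', access='stream', &
       form='unformatted', status='old')
  read(u) back
  close(u)
  print *, 'round trip ok: ', all(back == arr)
end program stream_forms
```

Since the output is the same either way, the choice between the three forms only affects how much work the runtime does per write statement, which is what the timings above measure.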
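By the way, for this single-threaded process you could also use the Fortran 95 intrinsic subroutine cpu_time, which sums over all threads and returns the time in seconds. Alternatively there is system_clock, which returns the elapsed clock count and the clock rate as integers, possibly with higher precision.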
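The two intrinsics mentioned above can replace mpi_wtime so that the benchmark no longer needs MPI. The following is only a sketch under a few assumptions: the file name timing.bin is hypothetical, the 64-bit integer kind is chosen here to get a fine-grained clock count, and only the whole-array write is timed. Note that cpu_time measures processor time, so time spent waiting on the disk may not show up in it; the system_clock difference is the wall-clock figure comparable to mpi_wtime.

```fortran
program time_write
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)  ! wide count for system_clock
  integer, parameter :: n = 1000000
  real(dp), allocatable :: arr(:)
  integer(i8) :: c0, c1, rate
  real(dp) :: cpu0, cpu1
  integer :: u

  allocate(arr(n))
  arr = 0.0_dp

  open(newunit=u, file='timing.bin', access='stream', &
       form='unformatted', status='replace')

  ! rate is the number of clock counts per second
  call system_clock(count_rate=rate)
  call system_clock(count=c0)
  call cpu_time(cpu0)

  write(u) arr          ! the statement being timed

  call cpu_time(cpu1)
  call system_clock(count=c1)

  ! Remove the scratch file once the measurement is done
  close(u, status='delete')

  print *, 'wall-clock time (s): ', real(c1 - c0, dp) / real(rate, dp)
  print *, 'cpu time (s):        ', cpu1 - cpu0
end program time_write
```

As in the original benchmark, repeating the measurement several times and averaging would give more stable numbers than a single run.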