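I have been testing the OpenMP speedup of a simple Fortran code in two cases: 1) an internal subroutine called from inside a parallel region, and 2) a parallel region started inside the internal subroutine. In both cases the OpenMP do loop sits in the internal subroutine. Here is the code: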
module module_with_subroutine
  include "omp_lib.h"
  integer ,parameter :: rkind = selected_real_kind(15,307)
  real(rkind) ,dimension(100,100,100) :: A
  real(rkind) :: elapsed_time
  integer :: clock_start ,clock_end ,clock_rate
contains
  subroutine module_subprogram
    A = 3.14_rkind
    call SYSTEM_CLOCK(count_rate = clock_rate)
    call SYSTEM_CLOCK(count = clock_start)
    !!$omp parallel num_threads(12) !!! For case 1
    call intrinsic_subprogram
    !!$omp end parallel
    call SYSTEM_CLOCK(count = clock_end)
    elapsed_time = (clock_end-clock_start)/real(clock_rate,rkind)
    print *, 'Elapsed time in seconds: ',elapsed_time
    print *, 'Clock start: ',clock_start
    print *, 'Clock end: ',clock_end
    print *, 'Clock rate: ',clock_rate
  contains
    subroutine intrinsic_subprogram
      integer :: i, j, k, steps ,nthread
      do steps = 1,10000
        !$omp parallel num_threads(12) !!! For case 2
        !$omp do collapse(3) private (i,j,k,nthread)
        do k = 1,100
          do j = 1,100
            do i = 1,100
              nthread = omp_get_thread_num()
              A(i,j,k) = (exp(A(i,j,k)**3.14 + sqrt(A(i,j,k)**3.14) + log(A(i,j,k)**3.14)))**1.414
              if(A(i,j,k) <= 3.14) then
                A(i,j,k) = 1.0
              end if
              !print *, 'thread number',nthread
              !print *, 'i,j,k',i,j,k
            end do
          end do
        end do
        !$omp end do
        !$omp end parallel
      end do
    end subroutine
  end subroutine
end module
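To make the two cases easier to compare side by side, here is a smaller self-contained sketch of the same idea. It is not the code I timed above: the names `compare_cases`, `work`, `n` and `nsteps`, the use of `omp_get_wtime` instead of `SYSTEM_CLOCK`, and the simplified loop body (chosen so the values stay finite) are all assumptions made for illustration. The thread count is left to the runtime rather than fixed at 12.

```fortran
! Illustrative sketch only; names and the loop body are assumptions,
! not the code from the post above.
program compare_cases
  use omp_lib
  implicit none
  integer, parameter :: rkind = selected_real_kind(15, 307)
  integer, parameter :: n = 100, nsteps = 1000
  real(rkind), allocatable :: A(:,:,:)
  real(rkind) :: t0, t1
  integer :: steps

  allocate(A(n,n,n))

  ! Case 1: one parallel region. Every thread runs the steps loop and
  ! shares each sweep through the orphaned "omp do" inside work().
  A = 3.14_rkind
  t0 = omp_get_wtime()
  !$omp parallel private(steps)
  do steps = 1, nsteps
    call work()
  end do
  !$omp end parallel
  t1 = omp_get_wtime()
  print *, 'case 1, seconds: ', t1 - t0

  ! Case 2: a parallel region is opened and closed on every step.
  A = 3.14_rkind
  t0 = omp_get_wtime()
  do steps = 1, nsteps
    !$omp parallel
    call work()
    !$omp end parallel
  end do
  t1 = omp_get_wtime()
  print *, 'case 2, seconds: ', t1 - t0

contains

  subroutine work()
    ! Locals of a procedure called inside a parallel region are private
    ! to each thread; A is shared through host association.
    integer :: i, j, k
    !$omp do collapse(3)
    do k = 1, n
      do j = 1, n
        do i = 1, n
          A(i,j,k) = sqrt(A(i,j,k)) + 1.0_rkind
        end do
      end do
    end do
    !$omp end do
  end subroutine work

end program compare_cases
```

With gfortran this would presumably be built with something like `gfortran -fopenmp compare_cases.f90` (the file name is hypothetical). In both cases the threads synchronise at the implicit barrier at `!$omp end do`, so the only structural difference is whether the parallel region is entered once or once per step.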
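Strangely, case 2 is always slightly faster than case 1, even though it starts a new parallel region on every iteration of the steps loop. Can anyone explain this behaviour? I am new to OpenMP programming, so perhaps I have misunderstood how OpenMP forks threads. Thanks in advance!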