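I am retrofitting OpenMP into an existing MPI Fortran computational fluid dynamics code.
I am using the thread-funneled approach (MPI_THREAD_FUNNELED).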
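For readers unfamiliar with the funneled model, here is a minimal sketch of what it means, not the actual initialization from my code. The program name `funneled_init` and the placement of the barrier are illustrative assumptions; only `MPI_Init_thread`, `MPI_THREAD_FUNNELED` and the other MPI calls are standard.

```fortran
program funneled_init
  use mpi
  implicit none
  integer :: provided, ierr, rank

  ! Request FUNNELED support: threads may exist, but only the thread
  ! that initialized MPI is allowed to make MPI calls.
  call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! The library may grant a lower level than requested, so check it.
  if (provided < MPI_THREAD_FUNNELED) then
    if (rank == 0) print *, 'MPI_THREAD_FUNNELED not available, provided =', provided
    call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  end if

  !$omp parallel
  ! ... threaded compute would go here (hypothetical placeholder) ...
  !$omp master
  ! Under FUNNELED, MPI calls are made from the master thread only.
  call MPI_Barrier(MPI_COMM_WORLD, ierr)
  !$omp end master
  !$omp end parallel

  call MPI_Finalize(ierr)
end program funneled_init
```

This would be built with an MPI compiler wrapper and the compiler's OpenMP flag; the exact command depends on the installation.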
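So far, every test I run shows the MPI + OpenMP code running slower, even though the hybrid version uses more processors: I compared 2 CPUs running 2 MPI processes against 6 CPUs running 2 MPI processes with 3 OpenMP threads each.
I have been using gprof, and I noticed that when I enable OpenMP, functions that live in the serial part of the code take thousands of times longer. Does anyone know why this would happen?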
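--- Edit 15 Sep 2016 ---
Thank you all for the comments.
In the meantime, I profiled my code with TAU and understood that gprof attributes the threads' idle time to random functions. In TAU, this same idle time is attributed to ".TAU application". My current problem is that when I add threads, I see only a marginal speedup (less than 5%).
@VladimirF I understand your point, and sorry for not making it clear, but I am comparing 2 MPI processes with 1 thread each against 2 MPI processes with 2 threads each, so I am actually doubling the resources and still not seeing a speedup.
@tim18 I have not checked with a debugger, but I checked the output and the results agree to machine precision.
Here is one specific loop that the code executes.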
!$OMP PARALLEL DO & ! Has masque, so should be dynamic
!$OMP& private (k,fa3, j, jp, fa2) &
!$OMP& shared (z3r,z4r,z2r,cur,cvr,cwr) &
!$OMP& schedule (runtime)
DO i=1, nL1 ! loop 1
   DO k=1,nG3 ! loop 2
      fa3=fac3(k)
      DO j=1,nG2,2 ! loop 3a
         IF(masque(j,k))THEN
            jp=j+1
            fa2=fac2(j)
            !
            uu(j,k,i) = fa3*cvr(jp,k,i)-fa2*cwr(jp,k,i)
            uu(jp,k,i)=-fa3*cvr(j,k,i) +fa2*cwr(j,k,i)
            vv(j,k,i) =-fa3*cur(jp,k,i)-vv(j,k,i)
            vv(jp,k,i)= fa3*cur(j,k,i) -vv(jp,k,i)
            ww(j,k,i) =ww(j,k,i) +fa2*cur(jp,k,i)
            ww(jp,k,i)=ww(jp,k,i) -fa2*cur(j,k,i)
         ENDIF
      ENDDO ! loop 3a
      DO j=1,nG2 ! loop 3b
         IF(masque(j,k))THEN
            z3r(j,k,i) = cur(j,k,i)
            z4r(j,k,i) = cvr(j,k,i)
            z2r(j,k,i) = cwr(j,k,i)
         ENDIF
      ENDDO ! loop 3b
   ENDDO ! loop 2
ENDDO ! loop 1
!$OMP END PARALLEL DO
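Because the loop uses schedule (runtime), the schedule actually applied comes from the OMP_SCHEDULE environment variable or from a call to omp_set_schedule, not from the source. The following is a small standalone sketch, assuming nothing about my solver, that prints the schedule a schedule (runtime) loop would pick up; the program name `show_runtime_schedule` and the variable names are hypothetical.

```fortran
program show_runtime_schedule
  use omp_lib
  implicit none
  integer(kind=omp_sched_kind) :: sched_kind
  integer :: chunk

  ! Query the schedule that loops marked schedule(runtime) will use.
  call omp_get_schedule(sched_kind, chunk)

  print *, 'max threads :', omp_get_max_threads()
  print *, 'chunk size  :', chunk

  ! Compare against the standard schedule kinds; anything else
  ! (for example a kind carrying a modifier) is reported as "other".
  if (sched_kind == omp_sched_static) then
    print *, 'schedule    : static'
  else if (sched_kind == omp_sched_dynamic) then
    print *, 'schedule    : dynamic'
  else if (sched_kind == omp_sched_guided) then
    print *, 'schedule    : guided'
  else if (sched_kind == omp_sched_auto) then
    print *, 'schedule    : auto'
  else
    print *, 'schedule    : other'
  end if
end program show_runtime_schedule
```

Running it with, for example, OMP_SCHEDULE="dynamic,1" set in the environment is one way to confirm that the intended dynamic schedule is really the one in effect for the timed runs.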
The TAU results are as follows:
%Time Exclusive Inclusive #Call #Subrs Inclusive Name
msec total msec usec/call
2 MPI + 2 threads ---
MEAN
0.9 0.265 8,943 39 39 229330 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 0.106 17,887 78 78 229324 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 17,805 17,887 78 78 229323 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
0.0 82 82 78 0 1052 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
0.0 328 328 312 0 1052 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
TOTAL
0.9 1 35,775 156 156 229330 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 0.423 1:11.549 312 312 229324 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 1:11.220 1:11.548 312 312 229323 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
Node 0 Thread 0
1.8 0.547 17,919 78 78 229737 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 0.128 17,918 78 78 229730 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 17,863 17,918 78 78 229729 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
0.0 55 55 78 0 714 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
Node 0 Thread 1
1.8 0.116 17,919 78 78 229732 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 17,788 17,918 78 78 229730 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
0.0 130 130 78 0 1667 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
Node 1 Thread 0
1.8 0.511 17,856 78 78 228923 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 0.087 17,855 78 78 228917 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 17,788 17,918 78 78 229730 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
0.0 40 40 78 0 523 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
Node 1 Thread 1
1.8 0.092 17,855 78 78 228917 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 17,753 17,855 78 78 228916 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
0.0 101 101 78 0 1302 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
2 MPI + 1 thread ---
Node 0 Thread 0
2.0 0.273 20,345 78 78 260834 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
2.0 0.101 20,344 78 78 260831 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
2.0 20,344 20,344 78 78 260829 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
0.0 0.184 0.184 78 0 2 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
Node 0 Thread 1
1.9 0.261 20,113 78 78 257860 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
1.9 0.072 20,112 78 78 257856 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
1.9 20,112 20,112 78 78 257855 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
0.0 0.176 0.176 78 0 2 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
MEAN
1.9 0.267 20,229 78 78 259347 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
1.9 0.0865 20,228 78 78 259344 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
1.9 20,228 20,228 78 78 259342 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
0.0 0.18 0.18 78 0 2 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
TOTAL
1.9 0.534 40,458 156 156 259347 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
1.9 0.173 40,457 156 156 259344 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
1.9 40,457 40,457 156 156 259342 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
0.0 0.36 0.36 156 0 2 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
There is a marginal speedup, but it is nowhere near 50%, and the problem is not overhead.
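To put a number on "marginal", here is the arithmetic using the MEAN inclusive times of the paralleldo (parallel begin/end) rows above (20,228 ms with 1 thread per process, 17,887 ms with 2 threads per process). The symbols $T_1$, $T_2$, $S$ and $E$ are my own notation, not TAU's.

```latex
\[
S = \frac{T_1}{T_2} = \frac{20228}{17887} \approx 1.13,
\qquad
E = \frac{S}{2} \approx 0.57,
\qquad
1 - \frac{T_2}{T_1} = 1 - \frac{17887}{20228} \approx 0.116
\]
```

So for this loop, doubling the threads cuts the time by roughly 12%, where ideal scaling would cut it by 50% (a speedup of 2).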