Hybrid MPI + OpenMP Fortran

Asked: 2016-09-12 11:24:56

Tags: fortran mpi openmp hybrid

I am retrofitting OpenMP into an MPI Fortran computational fluid dynamics code.

I am using the funneled threading approach (MPI_THREAD_FUNNELED).
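For readers unfamiliar with the funneled model, here is a minimal sketch of what such an initialization usually looks like. It is not taken from the code in question: the program name and the printed messages are illustrative assumptions, and only the MPI and OpenMP calls themselves are standard.

```fortran
program funneled_init
   ! Illustrative sketch only, not the poster's code.
   use mpi
   use omp_lib
   implicit none
   integer :: provided, ierr, rank

   ! Request FUNNELED: threads may exist, but only the main thread calls MPI.
   call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   ! The library may grant a lower level than requested, so check it.
   if (provided < MPI_THREAD_FUNNELED) then
      if (rank == 0) print *, 'MPI_THREAD_FUNNELED not available, provided =', provided
      call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
   end if

   !$omp parallel
   ! Threaded compute work goes here. Any MPI call inside the region
   ! must be made by the master thread only.
   !$omp master
   print '(a,i0,a,i0,a)', 'rank ', rank, ' runs ', omp_get_num_threads(), ' threads'
   !$omp end master
   !$omp end parallel

   call MPI_Finalize(ierr)
end program funneled_init
```

Built with an MPI compiler wrapper and OpenMP enabled, each rank should report the thread count set through OMP_NUM_THREADS, which is a quick way to confirm that the hybrid launch is configured as intended.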

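So far, the MPI + OpenMP code has been slower in every test I run, even though I give the MPI + OpenMP version more processors. I compared 2 CPUs running 2 MPI processes against 6 CPUs running 2 MPI processes with 3 OpenMP threads each.

I have been using gprof, and I noticed that when I enable OpenMP, functions that live in the serial part of the code take thousands of times longer. Does anyone know why this might happen?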

--- Edit, 15 September 2016 ---

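Thanks, everyone, for your comments.

In the meantime I profiled my code with TAU and learned that gprof attributes the threads' idle time to random functions. In TAU, this same idle time is attributed to ".TAU application". My current problem is that when I add threads I see only a marginal speedup (less than 5%).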

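@Zulan: In the meantime I probed my code by requesting more threads and processes (2 MPI x 7 threads) than I have CPUs available (a quad core with two hardware threads per core). When I do that, I can see the OpenMP barriers taking a very large share of the execution time, so I do not believe that is what is happening in the normal runs.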

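@VladimirF: I understand your point, and sorry for not making it clear. I am comparing 2 MPI processes with 1 thread each against 2 MPI processes with 2 threads each, so I am actually doubling the resources and seeing no speedup.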

@tim18: I have not checked with a debugger, but I checked the output and the results agree to machine precision.

This is one specific loop that the code executes.

!$OMP PARALLEL DO  & ! Has masque, so should be dynamic
!$OMP& private (k,fa3, j, jp, fa2) &
!$OMP& shared (z3r,z4r,z2r,cur,cvr,cwr) &
!$OMP& schedule (runtime)
DO i=1, nL1     ! loop 1
   DO k=1,nG3     ! loop 2
      fa3=fac3(k)
      DO j=1,nG2,2   ! loop 3a
         IF(masque(j,k))THEN
            jp=j+1   
            fa2=fac2(j)   
            !
            uu(j,k,i) = fa3*cvr(jp,k,i)-fa2*cwr(jp,k,i)
            uu(jp,k,i)=-fa3*cvr(j,k,i) +fa2*cwr(j,k,i)

            vv(j,k,i) =-fa3*cur(jp,k,i)-vv(j,k,i)
            vv(jp,k,i)= fa3*cur(j,k,i) -vv(jp,k,i)

            ww(j,k,i) =ww(j,k,i)  +fa2*cur(jp,k,i)
            ww(jp,k,i)=ww(jp,k,i) -fa2*cur(j,k,i)
         ENDIF
      ENDDO     ! loop 3a

      DO j=1,nG2  ! loop 3b
        IF(masque(j,k))THEN
          z3r(j,k,i) = cur(j,k,i)
          z4r(j,k,i) = cvr(j,k,i)
          z2r(j,k,i) = cwr(j,k,i)
        ENDIF
      ENDDO     ! loop 3b
   ENDDO   ! loop 2
ENDDO   ! loop 1
!$OMP END PARALLEL DO
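
Because the loop uses schedule (runtime), the schedule actually applied depends on the OMP_SCHEDULE environment variable, and it is implementation defined if that variable is unset. The stand-alone sketch below is one way to check which schedule is in effect and to time a reduced loop of the same shape outside the full code. It is not the original routine: the array sizes, the mask pattern, and the program name are assumptions made for illustration, and only the copy from loop 3b is reproduced.

```fortran
program time_masked_loop
   ! Illustrative stand-in for the loop above, not the original routine.
   use omp_lib
   use iso_fortran_env, only: real64
   implicit none
   ! Hypothetical sizes: the real nG2, nG3 and nL1 are not given in the post.
   integer, parameter :: nG2 = 128, nG3 = 128, nL1 = 64, nrep = 20
   real(real64), allocatable :: cur(:,:,:), z3r(:,:,:)
   logical, allocatable :: masque(:,:)
   integer(omp_sched_kind) :: sched_kind
   integer :: chunk, i, j, k, rep
   real(real64) :: t0, t1

   allocate(cur(nG2,nG3,nL1), z3r(nG2,nG3,nL1), masque(nG2,nG3))
   cur = 1.0_real64
   z3r = 0.0_real64
   ! Hypothetical mask: roughly two thirds of the (j,k) points are active.
   do k = 1, nG3
      do j = 1, nG2
         masque(j,k) = mod(j + k, 3) /= 0
      end do
   end do

   ! Report the schedule that schedule(runtime) will use.
   call omp_get_schedule(sched_kind, chunk)
   print '(a,i0)', 'max threads   : ', omp_get_max_threads()
   print '(a,i0,a,i0)', 'schedule kind : ', sched_kind, '  chunk: ', chunk
   print '(a,i0,a,i0)', '  (static = ', omp_sched_static, ', dynamic = ', omp_sched_dynamic
   print '(a)', '   in this implementation)'

   t0 = omp_get_wtime()
   do rep = 1, nrep
      !$omp parallel do private(j, k) schedule(runtime)
      do i = 1, nL1
         do k = 1, nG3
            do j = 1, nG2
               if (masque(j,k)) z3r(j,k,i) = cur(j,k,i)
            end do
         end do
      end do
      !$omp end parallel do
   end do
   t1 = omp_get_wtime()

   ! Self-check: every active point was copied exactly, inactive ones untouched.
   if (nint(sum(z3r)) /= count(masque) * nL1) error stop 'copy mismatch'
   print '(a,f10.4,a)', 'elapsed       : ', t1 - t0, ' s'
end program time_masked_loop
```

Compiled with, for example, gfortran -fopenmp and run with OMP_NUM_THREADS set to 1 and then 2 (and OMP_SCHEDULE set to static or dynamic), it gives a scaling figure for this loop shape on the same machine that can be set against the TAU numbers below.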

The TAU results are as follows:

%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call 


2 MPI + 2 threads --- 

MEAN
  0.9        0.265        8,943          39          39     229330 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  1.8        0.106       17,887          78          78     229324 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  1.8       17,805       17,887          78          78     229323 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  0.0           82           82          78           0       1052 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  0.0          328          328         312           0       1052 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]


TOTAL 
  0.9            1       35,775         156         156     229330 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  1.8        0.423     1:11.549         312         312     229324 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  1.8     1:11.220     1:11.548         312         312     229323 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]

Node 0 Thread 0 
  1.8        0.547       17,919          78          78     229737 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  1.8        0.128       17,918          78          78     229730 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  1.8       17,863       17,918          78          78     229729 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  0.0           55           55          78           0        714 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
Node 0 Thread 1 
  1.8        0.116       17,919          78          78     229732 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  1.8       17,788       17,918          78          78     229730 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  0.0          130          130          78           0       1667 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]

Node 1 Thread 0 
  1.8        0.511       17,856          78          78     228923 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  1.8        0.087       17,855          78          78     228917 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  1.8       17,788       17,918          78          78     229730 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  0.0           40           40          78           0        523 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
Node 1 Thread 1 
  1.8        0.092       17,855          78          78     228917 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  1.8       17,753       17,855          78          78     228916 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
  0.0          101          101          78           0       1302 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]


2 MPI + 1 thread ---

Node 0 Thread 0 
  2.0        0.273       20,345          78          78     260834 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
  2.0        0.101       20,344          78          78     260831 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
  2.0       20,344       20,344          78          78     260829 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
  0.0        0.184        0.184          78           0          2 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]

Node 0 Thread 1 
  1.9        0.261       20,113          78          78     257860 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
  1.9        0.072       20,112          78          78     257856 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
  1.9       20,112       20,112          78          78     257855 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
  0.0        0.176        0.176          78           0          2 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]

MEAN
  1.9        0.267       20,229          78          78     259347 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
  1.9       0.0865       20,228          78          78     259344 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
  1.9       20,228       20,228          78          78     259342 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
  0.0         0.18         0.18          78           0          2 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]


TOTAL
  1.9        0.534       40,458         156         156     259347 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
  1.9        0.173       40,457         156         156     259344 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
  1.9       40,457       40,457         156         156     259342 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
  0.0         0.36         0.36         156           0          2 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]

There is a marginal speedup, but it is nowhere near 50%, and the problem is not overhead.
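
As a rough check of that statement, the MEAN exclusive times of the loop body in the two profiles above (20,228 ms with 1 thread per process, 17,805 ms with 2 threads per process) give the following speedup S and parallel efficiency E. Taking the loop body as the quantity to compare is an assumption; the fork/join inclusive times of thread 0 lead to almost the same ratio.

```latex
S = \frac{T_{1\ \text{thread}}}{T_{2\ \text{threads}}}
  = \frac{20{,}228\ \text{ms}}{17{,}805\ \text{ms}} \approx 1.14,
\qquad
E = \frac{S}{2} \approx 0.57
```

The loop therefore takes about 12% less time with twice the threads, where perfect scaling would take 50% less. The mean barrier time in the 2-thread run is 82 ms out of 17,887 ms, roughly 0.5%, which is consistent with the observation that synchronization overhead is not the limiting factor.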

0 answers:

No answers