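I have a hybrid MPI-OpenMP code that crashes with a segmentation fault and a bad-termination error. I compile with mpif90/ifort and use MPICH2. This is the compile line I use that produces the error: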
mpif90.mpich2 -f90=ifort -DAMD64_LNX -openmp -o jack_openmp.exe laplace.f
With this build, if I launch from one node and point to a machine file that lists other nodes, I get a segmentation fault:
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
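However, if I run it from a specific node (say node1) and the machine file contains only "node1", it runs as expected, using the appropriate number of threads per node (for example, with "node1" listed twice in the machine file, the command is "mpiexec -np 2 ...").
The second thing I tried was linking with "-liomp5" instead of "-openmp". When I do this, the code compiles and runs, even across nodes. However, it does not run multithreaded at all. "omp_get_num_threads" returns 8 threads per node (which is correct), but only one thread runs on each node listed in the machine file, so no actual threading takes place.
I am using the latest ifort compiler (12.1.2) and MPICH2. The stack size is unlimited: "ulimit -a" reports it as unlimited.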
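A likely explanation for the -liomp5 behaviour, offered as an assumption rather than a diagnosis: ifort only honours `!$OMP` directives and `!$` conditional-compilation lines when the OpenMP flag (`-openmp` in ifort 12) is given at compile time, while linking the runtime with `-liomp5` only lets the `omp_*` library calls resolve. The free-form sketch below (both function names are hypothetical) makes that visible: compiled without the flag, `openmp_enabled` returns `.false.` and `threads_in_region` returns 1.

```fortran
! Hypothetical helpers. Lines starting with the "!$" sentinel and all "!$omp"
! directives are plain comments unless the compiler's OpenMP flag is given
! (-openmp for ifort 12, -fopenmp for gfortran); linking a runtime library
! alone does not enable them.
logical function openmp_enabled()
  implicit none
  openmp_enabled = .false.
!$ openmp_enabled = .true.      ! compiled only when OpenMP is enabled
end function openmp_enabled

integer function threads_in_region()
!$ use omp_lib
  implicit none
  threads_in_region = 1          ! what a serial (non-OpenMP) build reports
!$omp parallel
!$omp single
!$ threads_in_region = omp_get_num_threads()   ! size of the active team
!$omp end single
!$omp end parallel
end function threads_in_region
```

Building a small driver that calls both functions once with `-openmp` and once with only `-liomp5` should show the difference.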
The source code of laplace.f is as follows:
      program lpmlp
      include 'mpif.h'
      include "omp_lib.h"
      integer imax,jmax,im1,im2,jm1,jm2,it,itmax
      parameter (imax=10001,jmax=10001)
      parameter (im1=imax-1,im2=imax-2,jm1=jmax-1,jm2=jmax-2)
      parameter (itmax=100)
      real*8 u(imax,jmax),du(imax,jmax),umax,dumax,tol,pi
      parameter (umax=10.0,tol=1.0e-6,pi=3.14159)
! Additional MPI parameters
      integer istart,iend,jstart,jend
      integer size,rank,ierr,istat(MPI_STATUS_SIZE),mpigrid,length
      integer grdrnk,dims(1),gloc(1),up,down,isize,jsize
      integer ureq,dreq
      integer ustat(MPI_STATUS_SIZE),dstat(MPI_STATUS_SIZE)
      real*8 tstart,tend,gdumax
      logical cyclic(1)
      real*8 uibuf(imax),uobuf(imax),dibuf(imax),dobuf(imax)
! OpenMP parameters
      integer nthrds,nthreads
! Initialize
      call MPI_INIT_THREAD(MPI_THREAD_FUNNELED,IMPI_prov,ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
! 1D linear topology
      dims(1)=size
      cyclic(1)=.FALSE.
      call MPI_CART_CREATE(MPI_COMM_WORLD,1,dims,cyclic,.true.,mpigrid
     +     ,ierr)
      call MPI_COMM_RANK(mpigrid,grdrnk,ierr)
      call MPI_CART_COORDS(mpigrid,grdrnk,1,gloc,ierr)
      call MPI_CART_SHIFT(mpigrid,0,1,down,up,ierr)
      istart=2
      iend=imax-1
      jsize=jmax/size
      jstart=gloc(1)*jsize+1
      if (jstart.LE.1) jstart=2
      jend=(gloc(1)+1)*jsize
      if (jend.GE.jmax) jend=jmax-1
      nthrds=OMP_GET_NUM_PROCS()
      print*,"Rank=",rank,"Threads=",nthrds
      call omp_set_num_threads(nthrds)
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j)
! Initialize -- done in parallel to force "first-touch" distribution
! on ccNUMA machines (i.e. O2k)
!$OMP DO
      do j=jstart-1,jend+1
        do i=istart-1,iend+1
          u(i,j)=0.0
          du(i,j)=0.0
        enddo
        u(imax,j)=umax*sin(pi*float(j-1)/float(jmax-1))
      enddo
!$OMP END DO
!$OMP END PARALLEL
! Main computation loop
      call MPI_BARRIER(MPI_COMM_WORLD,ierr)
      tstart=MPI_WTIME()
      do it=1,itmax
! We have to keep the OpenMP and MPI calls segregated...
        call omp_set_num_threads(nthrds)
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j)
!$OMP MASTER
        dumax=0.0
!$OMP END MASTER
!$OMP DO REDUCTION(max:dumax)
        do j=jstart,jend
          do i=istart,iend
!nthreads = OMP_GET_NUM_THREADS()
!print*,"Jack",rank,nthreads,nthrds
            du(i,j)=0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1))-u(i,j)
            dumax=max(dumax,abs(du(i,j)))
          enddo
        enddo
!$OMP END DO
!$OMP DO
        do j=jstart,jend
          do i=istart,iend
            u(i,j)=u(i,j)+du(i,j)
          enddo
        enddo
!$OMP END DO
!$OMP END PARALLEL
! Compute the overall residual
        call MPI_REDUCE(dumax,gdumax,1,MPI_REAL8,MPI_MAX,0
     +       ,MPI_COMM_WORLD,ierr)
! Send phase
        if (down.NE.MPI_PROC_NULL) then
          j=1
          do i=istart,iend
            dobuf(j)=u(i,jstart)
            j=j+1
          enddo
          length=j-1
          call MPI_ISEND(dobuf,length,MPI_REAL8,down,it,mpigrid,
     +         dreq,ierr)
        endif
        if (up.NE.MPI_PROC_NULL) then
          j=1
          do i=istart,iend
            uobuf(j)=u(i,jend)
            j=j+1
          enddo
          length=j-1
          call MPI_ISEND(uobuf,length,MPI_REAL8,up,it,mpigrid,
     +         ureq,ierr)
        endif
! Receive phase
        if (down.NE.MPI_PROC_NULL) then
          length=iend-istart+1
          call MPI_RECV(dibuf,length,MPI_REAL8,down,it,
     +         mpigrid,istat,ierr)
          call MPI_WAIT(dreq,dstat,ierr)
          j=1
          do i=istart,iend
            u(i,jstart-1)=dibuf(j)
            j=j+1
          enddo
        endif
        if (up.NE.MPI_PROC_NULL) then
          length=iend-istart+1
          call MPI_RECV(uibuf,length,MPI_REAL8,up,it,
     +         mpigrid,istat,ierr)
          call MPI_WAIT(ureq,ustat,ierr)
          j=1
          do i=istart,iend
            u(i,jend+1)=uibuf(j)
            j=j+1
          enddo
        endif
        write (rank+10,*) rank,it,dumax,gdumax
        if (rank.eq.0) write (1,*) it,gdumax
      enddo
      call MPI_BARRIER(MPI_COMM_WORLD,ierr)
      tend=MPI_WTIME()
      if (rank.EQ.0) then
        write(*,*) 'Calculation took ',tend-tstart,'s. on ',size,
     +       ' MPI processes'
     +       ,' with ',nthrds,' OpenMP threads per process'
      endif
      call MPI_FINALIZE(ierr)
      stop
      end
When running the build linked with -liomp5, I can see that
call omp_set_num_threads(nthrds)
is executed with nthrds = 8 (verified with a print statement), but when I check immediately afterwards with
nthreads = OMP_GET_NUM_THREADS()
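the result is nthreads = 1. However, if the build is linked with -openmp (same nodes in the machine file, launched from the same node), nthreads = 8.
If I list the head node name first in the machine file, I get the following: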
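One detail worth keeping in mind when reading that check: `omp_get_num_threads` returns the size of the team executing the call, so outside a parallel region it returns 1 even right after `omp_set_num_threads(8)`, whichever flag was used; `omp_get_max_threads` reports the team size requested for the next region. A minimal sketch (the subroutine name is hypothetical):

```fortran
! Hypothetical helper: what the two standard queries report when called
! outside any parallel region, right after omp_set_num_threads(nreq).
subroutine query_after_set(nreq, current, upcoming)
!$ use omp_lib
  implicit none
  integer, intent(in)  :: nreq
  integer, intent(out) :: current, upcoming
  current  = 1                        ! values reported by a non-OpenMP build
  upcoming = 1
!$ call omp_set_num_threads(nreq)    ! affects the *next* parallel region
!$ current  = omp_get_num_threads()  ! team running this call: 1 thread here
!$ upcoming = omp_get_max_threads()  ! team size requested for the next region
end subroutine query_after_set
```

Placed inside the parallel region instead, as the commented-out lines in the main loop are, the same `omp_get_num_threads` call reports the size of that region's team.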
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
[proxy:0:1@c403] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
[proxy:0:1@c403] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1@c403] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
[mpiexec@c403] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec@c403] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@c403] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:189): launcher returned error waiting for completion
[mpiexec@c403] main (./ui/mpich/mpiexec.c:397): process manager error waiting for completion
That's a lot of information, but hopefully not too much. Thanks for your help.
Answer 0 (score: 1)
The OpenMP thread stack size may be too small. Have you tried setting OMP_STACKSIZE to a large value?
% export OMP_STACKSIZE=512m # may be another value: 32m, 64m, 128m, 256m ...
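Each OpenMP thread uses private stack memory; the default stack size is 2 MB on IA-32 and 4 MB on Intel 64.
Answer 1 (score: 0)
Try running your program under Valgrind (make sure to recompile with -g first to get debug symbols). Something like this may help: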
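To illustrate what "private stack memory" means here, a minimal sketch, assuming the common ifort/gfortran behaviour of placing each thread's PRIVATE copy of a local array on that thread's own stack (the function name and sizes are hypothetical):

```fortran
! Hypothetical example: every thread gets its own copy of scratch(n),
! typically allocated on that thread's stack (sized by OMP_STACKSIZE).
real(8) function private_scratch_total(n, nthr)
!$ use omp_lib
  implicit none
  integer, intent(in) :: n, nthr
  real(8) :: scratch(n)          ! automatic array: 8*n bytes per copy
  real(8) :: total
  integer :: i
  total = 0.0d0
!$ call omp_set_num_threads(nthr)
!$omp parallel private(scratch, i) reduction(+:total)
  do i = 1, n                    ! each thread fills its own private copy
     scratch(i) = 1.0d0
  end do
  total = total + sum(scratch)   ! each thread contributes exactly n
!$omp end parallel
  private_scratch_total = total  ! n times the number of threads in the team
end function private_scratch_total
```

Each private copy of `scratch` takes 8*n bytes. If the 4 MB default means 4 MiB, it holds 4 * 1024 * 1024 / 8 = 524,288 real(8) elements, so a larger private array (plus whatever else the thread keeps on its stack) would need a bigger OMP_STACKSIZE.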
% mpiexec -n 2 valgrind -q ./jack_openmp.exe
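If it reports any warnings or errors in your program (rather than only in some random system library), there is almost certainly a bug in your code that needs fixing. Look at the reported stack traces to find out what Valgrind is complaining about.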