I am running a program that has to process a model mesh. When I launch it with, for example, 10 processes as workers (mpirun -np 11 -machinefile host civil_mpi.exe), only 3 of the processes actually run the program; the rest stop at startup without any error message!
If I reduce the size of the model mesh, everything works fine. The machine has more than 30 GB of total RAM, and the memory each process needs (based on the mesh size) is under 1 GB, so in theory RAM should not be the problem. Can anyone help me with this?
The OS is Linux openSUSE, and I am running MPI on a machine with 16 dual-core CPUs. The code is:
      call MPI_INIT(ierror)
      call mpi_comm_rank(MPI_COMM_WORLD, procid, ierror)
      call mpi_comm_size(MPI_COMM_WORLD, nproc, ierror)
      nworker = nproc - 1
      call mpi_get_processor_name(procname, len, ierror)
c --- slice tables: slice i covers rows ny0(i)..ny(i) (2 rows each)
      n_slice = 280
      ny0(1) = 1
      ny(1)  = 2
      do i = 2, n_slice
         ny0(i) = ny0(i-1) + 2
         ny(i)  = ny(i-1) + 2
      end do
      nx  = 461
      nx0 = 1
      nz  = 421
      nz0 = 1
c --- nwork: smallest chunk size with nworker*nwork >= n_slice
      nwork = 1
      do i = 1, n_slice
         if (nworker*nwork .lt. n_slice) then
            nwork = nwork + 1
         end if
      end do
c --- master: send each worker its first slice index and the grids
      if (procid .eq. masterid) then
         worker_job = 1
         do q = 1, nworker
            iwork = q
            call mpi_send(worker_job, 1, MPI_INTEGER, iwork, tag,
     $           MPI_COMM_WORLD, ierror)
            call mpi_send(nx0, 1, MPI_INTEGER, iwork, tag,
     $           MPI_COMM_WORLD, ierror)
            call mpi_send(ny0, 280, MPI_INTEGER, iwork, tag,
     $           MPI_COMM_WORLD, ierror)
            call mpi_send(nz0, 1, MPI_INTEGER, iwork, tag,
     $           MPI_COMM_WORLD, ierror)
            call mpi_send(nx, 1, MPI_INTEGER, iwork, tag,
     $           MPI_COMM_WORLD, ierror)
            call mpi_send(ny, 280, MPI_INTEGER, iwork, tag,
     $           MPI_COMM_WORLD, ierror)
            call mpi_send(nz, 1, MPI_INTEGER, iwork, tag,
     $           MPI_COMM_WORLD, ierror)
            worker_job = worker_job + nwork
         end do
      end if
c ------------------ worker task -----------
      if (procid .gt. masterid) then
c        write(*,*) 'processor', procid, 'is working....'
         call mpi_recv(worker_job, 1, MPI_INTEGER, masterid, tag,
     $        MPI_COMM_WORLD, status, ierror)
         call mpi_recv(nx0, 1, MPI_INTEGER, masterid, tag,
     $        MPI_COMM_WORLD, status, ierror)
         call mpi_recv(ny0, 280, MPI_INTEGER, masterid, tag,
     $        MPI_COMM_WORLD, status, ierror)
         call mpi_recv(nz0, 1, MPI_INTEGER, masterid, tag,
     $        MPI_COMM_WORLD, status, ierror)
         call mpi_recv(nx, 1, MPI_INTEGER, masterid, tag,
     $        MPI_COMM_WORLD, status, ierror)
         call mpi_recv(ny, 280, MPI_INTEGER, masterid, tag,
     $        MPI_COMM_WORLD, status, ierror)
         call mpi_recv(nz, 1, MPI_INTEGER, masterid, tag,
     $        MPI_COMM_WORLD, status, ierror)
c --- each worker migrates its chunk of slices, clipped to n_slice
         do j = worker_job, worker_job + nwork - 1
            if (j .le. 280) then
               write(*,*) '****************processor', procid, 'is working'
               call rawmig(j, nx0, ny0(j), nz0, nx, ny(j), nz)
            end if
         end do
      end if
      call mpi_finalize(ierror)
      end
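For reference, the chunking logic above (computing nwork and the slice range each worker receives) can be sketched in Python rather than Fortran for brevity; `plan_slices` is a hypothetical helper name, not part of the original program:

```python
def plan_slices(n_slice, nworker):
    """Mimic the Fortran loop: nwork is the smallest chunk size
    such that nworker * nwork covers all n_slice slices."""
    nwork = 1
    while nworker * nwork < n_slice:
        nwork += 1
    # Worker q handles slices worker_job .. worker_job + nwork - 1,
    # clipped to n_slice (the "if (j .le. 280)" guard in the code).
    assignments = {}
    worker_job = 1
    for q in range(1, nworker + 1):
        assignments[q] = [j for j in range(worker_job, worker_job + nwork)
                          if j <= n_slice]
        worker_job += nwork
    return nwork, assignments
```

With 10 workers this gives nwork = 28 and the 280 slices divide evenly; with 3 workers nwork = 94 and the last worker's range is clipped to 92 slices.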
Answer (score: 0)
Problem solved! Thanks everyone for the comments. In the end I realized that a matrix in the main program had to be updated with the new values coming back from the processors! Gilles Gouaillardet, following your suggestion I tried to prepare a short, readable version of the program to post, and while doing so I saw that this matrix was being filled with iy = ny0, ny (the varying per-slice dimension), whereas the output must be indexed with iy = 1, 2. But first the matrix dimension in its declaration had to be corrected: because it was declared using the variables sent directly to each processor, some processors were stopped without any error message!
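The indexing mistake described in the answer can be illustrated with a minimal sketch (Python used for brevity; the names are hypothetical, not from the original program): each slice's output buffer holds only two rows, so it must be written at local indices, not at the global row numbers ny0(j)..ny(j):

```python
def fill_slice(ny0_j, ny_j, use_local_index=True):
    """Each slice's output holds exactly ny_j - ny0_j + 1 == 2 rows."""
    out = [0.0, 0.0]                    # declared with the fixed local size
    for iy in range(ny0_j, ny_j + 1):   # global row numbers, e.g. 189, 190
        # correct: map the global row number to the local index 0..1
        # buggy:   use the global row number directly and overrun the buffer
        idx = iy - ny0_j if use_local_index else iy
        out[idx] = 1.0
    return out
```

Here `fill_slice(189, 190)` fills both rows, while `fill_slice(189, 190, use_local_index=False)` raises an IndexError. In Fortran, without bounds checking, the same overrun silently corrupts memory and can kill some ranks with no error message, which matches the reported symptom.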