Whenever my MPI program tries to finish, I get errors similar to the following.
[mpiexec] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:70): assert (!(pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP)) failed
[mpiexec] main (./pm/pmiserv/pmip.c:221): demux engine error waiting for event
[mpiexec] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:99): one of the processes terminated badly; aborting
[mpiexec] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error waiting for completion
[mpiexec] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:352): bootstrap server returned error waiting for completion
[mpiexec] main (./ui/mpich/mpiexec.c:294): process manager error waiting for completion
Sometimes it instead fails with a glibc "double free or corruption" error. Each process is single-threaded, and every process is definitely calling MPI_Finalize(). Any idea what could be going wrong here?
Answer 0 (score: 2)
I wrote a small test program that should exit without any errors. Please try running it (compile with mpicc and launch with mpiexec). If it exits cleanly, the problem is in your own code.
#include <mpi.h>
#include <cstdio>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    int finalize_retcode = MPI_Finalize();
    if (0 == my_rank) fprintf(stderr, "Process, return_code\n");
    fprintf(stderr, "%i, %i\n", my_rank, finalize_retcode);
    return 0;
}
Answer 1 (score: 1)
I ran into a similar problem.
MPI_Request* req = (MPI_Request*) malloc(sizeof(MPI_Request) * 2 * numThings * numItems);
int count;
for( item in items ) {
    count = 0;  /* bug: resets the request index on every outer iteration */
    for( thing in things ) {
        MPI_Irecv(<recvBufF>, 1, MPI_INT, <src>,  <tag>, MPI_COMM_WORLD, &req[count++]);
        MPI_Isend(<sendBufF>, 1, MPI_INT, <dest>, <tag>, MPI_COMM_WORLD, &req[count++]);
    }
}
MPI_Status* stat = (MPI_Status*) malloc(sizeof(MPI_Status) * 2 * numThings * numItems);
MPI_Waitall(count, req, stat);
The call to MPI_Waitall(...) was made with a value of count smaller than the number of Isend/Irecv operations actually issued, so the earlier requests were never waited on and some messages went unreceived. Moving count = 0 to before the outer for loop fixed the MPI_Finalize(...) error.