MPI program hangs

Time: 2015-03-16 07:42:17

Tags: c linux ubuntu mpi mpich

I installed mpich2 on my Ubuntu 14.04 laptop with the following command:

sudo apt-get install libcr-dev mpich2 mpich2-doc

This is the code I am trying to run:

#include <mpi.h>
#include <stdio.h>

int main()
{
    int myrank, size;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello world! I am %d of %d\n", myrank, size);

    MPI_Finalize();
    return 0;
}

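Compiling it with mpicc helloworld.c produces no errors. But when I run the program with mpirun -np 5 ./a.out, there is no output; the program just keeps running as if it were stuck in an infinite loop. When I press Ctrl+C, this is what I get: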

$ mpirun -np 5 ./a.out
^C[mpiexec@user] Sending Ctrl-C to processes as requested
[mpiexec@user] Press Ctrl-C again to force abort
[mpiexec@user] HYDU_sock_write (./utils/sock/sock.c:291): write error (Bad file descriptor)
[mpiexec@user] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:170): unable to write data to proxy
[mpiexec@user] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:79): unable to send signal downstream
[mpiexec@user] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@user] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec@user] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion

I could not find a solution by searching Google. What causes this error?

1 answer:

Answer 0 (score: 1)

I ran into the same problem on two compute nodes:

$ mpirun -np 10 -ppn 5 --hosts c1,c2 ./a.out
[mpiexec@c1] Press Ctrl-C again to force abort
[mpiexec@c1] HYDU_sock_write (utils/sock/sock.c:286): write error (Bad file descriptor)
[mpiexec@c1] HYD_pmcd_pmiserv_send_signal (pm/pmiserv/pmiserv_cb.c:169): unable to write data to proxy
[mpiexec@c1] ui_cmd_cb (pm/pmiserv/pmiserv_pmci.c:79): unable to send signal downstream
[mpiexec@c1] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@c1] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec@c1] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion

It turned out that node c1 could not SSH to c2, so mpiexec could not reach its proxy on c2.
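One way to check for this kind of SSH problem before launching a job is sketched below. It is only an illustration, not part of MPICH: it assumes an OpenSSH client, the host names c1 and c2 from the command above, and it should be run from the node where mpirun is started. The HOSTS variable and the check_ssh function are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Hosts to test; c1 and c2 are the node names used in the mpirun command above.
# HOSTS is a hypothetical variable name, override it to test other nodes.
HOSTS="${HOSTS:-c1 c2}"

# check_ssh HOST (hypothetical helper): succeeds only if a non-interactive
# SSH login to HOST works.
# BatchMode=yes makes ssh fail instead of prompting for a password, and
# ConnectTimeout=5 gives up after 5 seconds instead of hanging.
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}

for host in $HOSTS; do
    if check_ssh "$host"; then
        echo "$host: ssh OK"
    else
        echo "$host: ssh FAILED"
    fi
done
```

If a host reports FAILED, setting up key-based login to it (for example with ssh-keygen and ssh-copy-id) is probably the first thing to try before running mpirun across nodes again.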

If you are using only a single machine, you can try fork as the launcher:

mpirun -launcher fork -np 5 ./a.out