MPICH: child processes spawned with MPI_Comm_spawn cannot reach the parent node

Time: 2019-04-15 12:09:29

Tags: c mpi mpich

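I am trying to set up a master-slave configuration of processes across two nodes.

Node1 spawns N processes on Node2. My problem occurs when the spawned processes try to communicate with their parent: they try to connect to 127.0.1.1, which is the IP assigned to node1 in Node1's /etc/hosts file.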

My /etc/hosts files look like this:

Node1 /etc/hosts file

127.0.0.1  localhost
127.0.1.1  node1
ip.node.2  node2
...

Node2 /etc/hosts file

127.0.0.1  localhost
127.0.1.1  node2
ip.node.1  node1
...
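
Given hosts files like the ones above, one way to check what a node's own hostname resolves to is a small helper along these lines. This is only a diagnostic sketch, not part of the original program: the function names `is_loopback_ipv4` and `print_own_addresses` are hypothetical, and it assumes a POSIX system with IPv4 name resolution going through /etc/hosts.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the dotted-quad IPv4 string lies in 127.0.0.0/8, and 0
   otherwise (including when the string is not a valid IPv4 address). */
int is_loopback_ipv4(const char *ip)
{
    struct in_addr addr;
    assert(ip != NULL);
    if (inet_pton(AF_INET, ip, &addr) != 1)
        return 0;
    return (ntohl(addr.s_addr) >> 24) == 127;
}

/* Resolves this machine's hostname and prints every IPv4 address found,
   flagging loopback addresses. Returns 0 on success, -1 on failure. */
int print_own_addresses(void)
{
    char hostname[256];
    struct addrinfo hints, *res, *p;
    int rc;

    if (gethostname(hostname, sizeof hostname) != 0) {
        perror("gethostname");
        return -1;
    }
    hostname[sizeof hostname - 1] = '\0';

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only, to match the hosts files */
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(hostname, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", hostname, gai_strerror(rc));
        return -1;
    }
    for (p = res; p != NULL; p = p->ai_next) {
        char ip[INET_ADDRSTRLEN];
        struct sockaddr_in *sin = (struct sockaddr_in *)p->ai_addr;
        if (inet_ntop(AF_INET, &sin->sin_addr, ip, sizeof ip) == NULL)
            continue;
        printf("%s -> %s%s\n", hostname, ip,
               is_loopback_ipv4(ip) ? " (loopback)" : "");
    }
    freeaddrinfo(res);
    return 0;
}
```

Calling `print_own_addresses()` from a `main` on each node shows whether the node's name maps to a loopback address; a loopback address is only reachable from the node itself, so another node cannot connect back to it.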

This is the error I get:

MPIR_Init_thread(506)............................: 
MPID_Init(325)...................................: spawned process group was unable to connect back to the parent on port <tag#0$description#madx$port#60313$ifname#127.0.1.1$>
MPID_Comm_connect(191)...........................: 
MPIDI_Comm_connect(834)..........................: Named port tag#0$description#madx$port#60313$ifname#127.0.1.1$ does not exist
MPIDI_Comm_connect(651)..........................: 
MPIDI_Create_inter_root_communicator_connect(324): Connection timed out in 180 seconds
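
The port name in the error message appears to be a list of `key#value` pairs separated by `$`, and its `ifname` field carries the address the children are told to connect to. The sketch below pulls a single field out of such a string. The format is inferred only from the message above, not from MPICH documentation, and `port_field` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the value stored under `key` in a string of the form
   "key#value$key#value$..." into `out`.
   Returns 0 on success, -1 if the key is missing or the value does not fit. */
int port_field(const char *port, const char *key, char *out, size_t outlen)
{
    size_t klen;
    const char *p;

    assert(port != NULL && key != NULL && out != NULL);
    klen = strlen(key);
    p = port;

    while (*p != '\0') {
        /* Each pair ends at the next '$' or at the end of the string. */
        const char *end = strchr(p, '$');
        if (end == NULL)
            end = p + strlen(p);

        if ((size_t)(end - p) > klen &&
            strncmp(p, key, klen) == 0 && p[klen] == '#') {
            size_t vlen = (size_t)(end - p) - klen - 1;
            if (vlen >= outlen)
                return -1;
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        if (*end == '\0')
            break;
        p = end + 1;
    }
    return -1;
}
```

Applied to the port name from the log, the `ifname` field yields 127.0.1.1 and the `port` field yields 60313, which matches the loopback entry for node1 in Node1's /etc/hosts.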

And here is my master.c code:

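#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* Both process counts are required on the command line. */
    if (argc < 3) {
        fprintf(stderr, "usage: %s <kernels> <servers>\n", argv[0]);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    int kernels, servers;
    char hostname[256];
    gethostname(hostname, 255);
    //char nombre[10]; int longitud;
    kernels = atoi(argv[1]);
    servers = atoi(argv[2]);

    MPI_Comm intercomm;
    MPI_Info info[2];

    /* Both commands are placed using the same hostfile. */
    MPI_Info_create(&info[0]);
    MPI_Info_set(info[0], "hostfile", "host2.txt");
    MPI_Info_create(&info[1]);
    MPI_Info_set(info[1], "hostfile", "host2.txt");

    char *cmds[2] = {"./kernel", "./server"};
    int np[2] = {kernels, servers};
    /* An error-code array would need kernels + servers entries (one per
       spawned process), not 2, so the codes are ignored here instead. */
    MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, info, 0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info[0]);
    MPI_Info_free(&info[1]);
    MPI_Finalize();
    return 0;
}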

host2.txt

host2:4

0 answers:

No answers yet