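I am trying to set up a master/slave configuration of processes across two nodes.
Node1 spawns N processes on Node2. My problem occurs when the spawned processes try to communicate with their parent: they try to connect to 127.0.1.1, which is the IP assigned to node1 in Node1's /etc/hosts file.
My /etc/hosts files look like this:
Node1 /etc/hosts file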
127.0.0.1 localhost
127.0.1.1 node1
ip.node.2 node2
...
Node2 /etc/hosts file
127.0.0.1 localhost
127.0.1.1 node2
ip.node.1 node1
...
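To make the effect of these entries concrete, here is a minimal sketch for checking what a hostname resolves to on the machine where it runs, and whether that address is a loopback address. It is not part of my original programs: the helper names is_loopback_ipv4 and resolve_first_ipv4 are hypothetical, and it only uses the standard getaddrinfo/inet_pton calls. I am assuming, without having confirmed it, that the MPI runtime picks its advertised address through a similar name lookup.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for getaddrinfo under strict C modes */
#endif
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Hypothetical helper: returns 1 if addr is an IPv4 literal in 127.0.0.0/8,
   0 otherwise (including when addr is not a valid IPv4 literal). */
int is_loopback_ipv4(const char *addr)
{
    struct in_addr parsed;
    assert(addr != NULL);
    if (inet_pton(AF_INET, addr, &parsed) != 1)
        return 0;
    return (ntohl(parsed.s_addr) >> 24) == 127;
}

/* Hypothetical helper: resolves host to its first IPv4 address and writes
   the dotted-quad text into out. Returns 0 on success, -1 on failure. */
int resolve_first_ipv4(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints, *res = NULL;
    assert(host != NULL && out != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, to match /etc/hosts above */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;
    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    const char *ok = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen);
    freeaddrinfo(res);
    return ok != NULL ? 0 : -1;
}
```

Calling resolve_first_ipv4("node1", buf, sizeof buf) on Node1 and passing buf to is_loopback_ipv4 would show whether node1 maps to a loopback address there. An address in 127.0.0.0/8 is only reachable from the machine itself, so processes on Node2 could not use it to reach Node1.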
This is the error I get:
MPIR_Init_thread(506)............................:
MPID_Init(325)...................................: spawned process group was unable to connect back to the parent on port <tag#0$description#madx$port#60313$ifname#127.0.1.1$>
MPID_Comm_connect(191)...........................:
MPIDI_Comm_connect(834)..........................: Named port tag#0$description#madx$port#60313$ifname#127.0.1.1$ does not exist
MPIDI_Comm_connect(651)..........................:
MPIDI_Create_inter_root_communicator_connect(324): Connection timed out in 180 seconds
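The port name in this error appears to be a sequence of key#value$ pairs, and its ifname field carries the address the children are told to connect to. The sketch below pulls one field out of such a string. The format is inferred only from the error text above, not from any MPICH documentation, and port_field is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the value stored under key in a
   "key#value$key#value$" string into out.
   Returns 0 on success, -1 if the key is missing, the string is
   malformed, or out is too small. */
int port_field(const char *port, const char *key, char *out, size_t outlen)
{
    assert(port != NULL && key != NULL && out != NULL);
    size_t klen = strlen(key);
    const char *p = port;

    while (*p != '\0') {
        const char *hash = strchr(p, '#');               /* end of the key   */
        const char *end = hash ? strchr(hash, '$') : NULL; /* end of the value */
        if (hash == NULL || end == NULL)
            return -1;
        if ((size_t)(hash - p) == klen && strncmp(p, key, klen) == 0) {
            size_t vlen = (size_t)(end - hash - 1);
            if (vlen + 1 > outlen)
                return -1;
            memcpy(out, hash + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        p = end + 1;                                     /* next key#value$ pair */
    }
    return -1;
}
```

Applied to the port name from the error, asking for "ifname" yields 127.0.1.1, the same loopback address that node1 is mapped to in Node1's /etc/hosts.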
And here is my master.c code:
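#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* Expect the number of kernel and server processes on the command line. */
    if (argc < 3) {
        fprintf(stderr, "usage: %s <kernels> <servers>\n", argv[0]);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    int kernels = atoi(argv[1]);
    int servers = atoi(argv[2]);

    /* Name of the node the master runs on (not used further here). */
    char hostname[256];
    gethostname(hostname, sizeof hostname - 1);
    hostname[sizeof hostname - 1] = '\0';

    /* Both commands are placed using the same hostfile. */
    MPI_Comm intercomm;
    MPI_Info info[2];
    MPI_Info_create(&info[0]);
    MPI_Info_set(info[0], "hostfile", "host2.txt");
    MPI_Info_create(&info[1]);
    MPI_Info_set(info[1], "hostfile", "host2.txt");

    char *cmds[2] = {"./kernel", "./server"};
    int np[2] = {kernels, servers};
    /* MPI_Comm_spawn_multiple writes one error code per spawned process. */
    int *errcodes = malloc((size_t)(kernels + servers) * sizeof *errcodes);

    MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, info, 0,
                            MPI_COMM_WORLD, &intercomm, errcodes);

    free(errcodes);
    MPI_Info_free(&info[0]);
    MPI_Info_free(&info[1]);
    MPI_Finalize();
    return 0;
}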
host2.txt
host2:4