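I have a situation where all processes/nodes need to communicate with each other many times. To minimize the overhead of establishing connections, I am trying to simulate an MPI server on each node; during startup, every node then connects to every other node as a client. Once the connections are established, the nodes use them to talk to each other.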
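To make the intended topology concrete, here is a minimal sketch in plain C++ with no MPI calls. `fullMeshConnections` is a hypothetical helper written only for illustration and is not part of my actual code; it just enumerates which (client, server) pairs the startup phase is supposed to create.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (illustration only, not from the real code base):
// lists every (client, server) pair that is created when each of
// `clusterSize` nodes connects to every other node as a client.
std::vector<std::pair<int, int>> fullMeshConnections(int clusterSize) {
    assert(clusterSize >= 0);
    std::vector<std::pair<int, int>> pairs;
    for (int client = 0; client < clusterSize; ++client) {
        for (int server = 0; server < clusterSize; ++server) {
            if (client != server) {  // a node never connects to itself
                pairs.emplace_back(client, server);
            }
        }
    }
    return pairs;
}
```

With N nodes this gives N * (N - 1) connect calls in total, so every node has to accept N - 1 incoming connections while it is issuing its own N - 1 outgoing ones.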
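Here is my workflow:
My problem is in step 4. I am able to get the port names of the other processes, but when I call MPI_Comm_connect(), MPI throws the following error:
two receives with the same [[WILDCARD],WILDCARD] and tag 300 - aborting
Am I approaching this the right way? If so, could someone help me fix this error? If not, could you point me in the right direction?
In case it helps, here are some of the initial code snippets I am using.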
void MPIServer::startServer() {
    // Open a port and publish it under the name "<clusterName>-<rank>"
    MPI_Open_port(MPI_INFO_NULL, port_name);
    char name[DOMP_MAX_CLIENT_NAME];
    snprintf(name, DOMP_MAX_CLIENT_NAME, "%s-%d", clusterName, rank);
    MPI_Publish_name(name, MPI_INFO_NULL, port_name);
    log("Server %s for node %d available at %s\n", name, rank, port_name);
    // First start your own server thread
    serverThread = std::thread(&MPIServer::accept, this);
    // Wait for all nodes to start their server first
    MPI_Barrier(MPI_COMM_WORLD);
    // Now create a connection to every other node
    char c_port_name[MPI_MAX_PORT_NAME];
    for (int i = 0; i < clusterSize; i++) {
        if (i != rank) {
            snprintf(name, DOMP_MAX_CLIENT_NAME, "%s-%d", clusterName, i);
            MPI_Lookup_name(name, MPI_INFO_NULL, c_port_name);
            log("Node %d connecting to client %s at port %s\n", rank, name, c_port_name);
            MPI_Comm_connect(c_port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &nodeConnections[i]);
        }
    }
}
// Here is the accept method
void MPIServer::accept() {
    while (true) {
        MPI_Comm *client = new MPI_Comm();
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, client);
        log("Node %d received a request\n", rank);
        // Handle in a new thread
        // TODO: Consider using a threadpool instead of spawning thread every time
        // Detach the worker: destroying a joinable std::thread calls std::terminate
        std::thread(&MPIServer::handleRequest, this, client).detach();
    }
}
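The naming scheme used for publishing and looking up ports above can be sketched in isolation as follows. This is only an illustration: `makeServiceName` is a hypothetical helper, and the buffer size of 64 is an assumption standing in for `DOMP_MAX_CLIENT_NAME`, whose real value is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (illustration only): builds the "<clusterName>-<rank>"
// service name that a node publishes and that its peers look up.
// Returns an empty string if the name would not fit into the buffer.
std::string makeServiceName(const std::string& clusterName, int rank) {
    char buf[64];  // assumed size, stands in for DOMP_MAX_CLIENT_NAME
    int n = std::snprintf(buf, sizeof(buf), "%s-%d", clusterName.c_str(), rank);
    assert(n >= 0);  // a negative value would indicate an encoding error
    if (static_cast<std::size_t>(n) >= sizeof(buf)) {
        return "";  // snprintf truncated the output
    }
    return std::string(buf);
}
```

Since both the publishing side and the lookup side format the name the same way, node `i` can find node `j` by asking for `makeServiceName(clusterName, j)`.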
This is the MPI version I am using:
-bash-4.2$ ompi_info
Package: Open MPI mockbuild@x86-041.build.eng.bos.redhat.com Distribution
Open MPI: 1.10.7
Open MPI repo revision: v1.10.6-48-g5e373bf
Open MPI release date: May 16, 2017
Open RTE: 1.10.7
Open RTE repo revision: v1.10.6-48-g5e373bf
Open RTE release date: May 16, 2017
OPAL: 1.10.7
OPAL repo revision: v1.10.6-48-g5e373bf
OPAL release date: May 16, 2017
MPI API: 3.0.0
Ident string: 1.10.7