当我尝试在程序中调用多个MPI_Send或MPI_Recv时,可执行文件在节点和根目录中被挂起。即,当它试图执行第二MPI_Send或MPI_Recv时,通信被阻止。同时,二进制文件在机器中以100%运行。
当我尝试在Windows 7 64位中使用OpenMPI 1.6.3 64位运行此代码时,它成功运行。但是相同的代码在Linux中不起作用,即具有OpenMPI 1.6.3 -64位的CentOS 6.3 x86_64。我做了什么问题。
发布以下代码
#include <mpi.h>
int main(int argc, char** argv) {
MPI::Init();
int rank = MPI::COMM_WORLD.Get_rank();
int size = MPI::COMM_WORLD.Get_size();
char name[256] = { };
int len = 0;
MPI::Get_processor_name(name, len);
printf("Hi I'm %s:%d\n", name, rank);
if (rank == 0)
{
while (size >= 1)
{
int val, stat = 1;
MPI::Status status;
MPI::COMM_WORLD.Recv(&val, 1, MPI::INT, 1, 0, status);
int source = status.Get_source();
printf("%s:%d received %d from %d\n", name, rank, val, source);
MPI::COMM_WORLD.Send(&stat, 1, MPI::INT, 1, 2);
printf("%s:%d sent status %d\n", name, rank, stat);
size--;
}
} else
{
int val = rank + 10;
int stat = 0;
printf("%s:%d sending %d...\n", name, rank, val);
MPI::COMM_WORLD.Send(&val, 1, MPI::INT, 0, 0);
printf("%s:%d sent %d\n", name, rank, val);
MPI::Status status;
MPI::COMM_WORLD.Recv(&stat, 1, MPI::INT, 0, 2, status);
int source = status.Get_source();
printf("%s:%d received status %d from %d\n", name, rank, stat, source);
}
size = MPI::COMM_WORLD.Get_size();
if (rank == 0)
{
while (size >= 1)
{
int val, stat = 1;
MPI::Status status;
MPI::COMM_WORLD.Recv(&val, 1, MPI::INT, 1, 1, status);
int source = status.Get_source();
printf("%s:0 received %d from %d\n", name, val, source);
size--;
}
printf("all workers checked in!\n");
}
else
{
int val = rank + 10 + 5;
printf("%s:%d sending %d...\n", name, rank, val);
MPI::COMM_WORLD.Send(&val, 1, MPI::INT, 0, 1);
printf("%s:%d sent %d\n", name, rank, val);
}
MPI::Finalize();
return 0;
}
嗨,Hristo,我已按照您的说法更改了源代码并且代码再次发布
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv)
{
int iNumProcess = 0, iRank = 0, iNameLen = 0, n;
char szNodeName[MPI_MAX_PROCESSOR_NAME] = {};
MPI_Status stMPIStatus;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &iNumProcess);
MPI_Comm_rank(MPI_COMM_WORLD, &iRank);
MPI_Get_processor_name(szNodeName, &iNameLen);
printf("Hi I'm %s:%d\n", szNodeName, iRank);
if (iRank == 0)
{
int iNode = 1;
while (iNumProcess > 1)
{
int iVal = 0, iStat = 1;
MPI_Recv(&iVal, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &stMPIStatus);
printf("%s:%d received %d\n", szNodeName, iRank, iVal);
MPI_Send(&iStat, 1, MPI_INT, iNode, 1, MPI_COMM_WORLD);
printf("%s:%d sent Status %d\n", szNodeName, iRank, iStat);
MPI_Recv(&iVal, 1, MPI_INT, MPI_ANY_SOURCE, 2, MPI_COMM_WORLD, &stMPIStatus);
printf("%s:%d received %d\n", szNodeName, iRank, iVal);
iNumProcess--;
iNode++;
}
printf("all workers checked in!\n");
}
else
{
int iVal = iRank + 10;
int iStat = 0;
printf("%s:%d sending %d...\n", szNodeName, iRank, iVal);
MPI_Send(&iVal, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
printf("%s:%d sent %d\n", szNodeName, iRank, iVal);
MPI_Recv(&iStat, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &stMPIStatus);
printf("%s:%d received status %d\n", szNodeName, iRank, iVal);
iVal = 20;
printf("%s:%d sending %d...\n", szNodeName, iRank, iVal);
MPI_Send(&iVal, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
printf("%s:%d sent %d\n", szNodeName, iRank, iVal);
}
MPI_Finalize();
return 0;
}
我得到的输出如下。即,在发送发送/接收之后,root无限等待,并且节点以100%的CPU利用率进行破坏。它的输出低于
Hi I'm N1433:1
N1433:1 sending 11...
Hi I'm N1425:0
N1425:0 received 11
N1425:0 sent Status 1
N1433:1 sent 11
N1433:1 received status 11
N1433:1 sending 20...
这里N1433和N1425是机器名称。请帮忙
答案 0 :(得分:2)
主人的代码是错误的。它总是发送和等待来自相同等级的消息 - 等级1
。因此,如果以mpiexec -np 2 ...
运行,程序将只能正常运行。您可能想要做的是使用MPI_ANY_SOURCE
作为源排名,然后使用该源排名作为发送操作中的目标。您也不应该使用while (size >= 1)
,因为排名0
没有与自己交谈,预计通信次数会少于size
。
if (rank == 0)
{
while (size > 1)
// ^^^^^^^^
{
int val, stat = 1;
MPI::Status status;
MPI::COMM_WORLD.Recv(&val, 1, MPI::INT, MPI_ANY_SOURCE, 0, status);
// Use wildcard source here ------------^^^^^^^^^^^^^^
int source = status.Get_source();
printf("%s:%d received %d from %d\n", name, rank, val, source);
MPI::COMM_WORLD.Send(&stat, 1, MPI::INT, source, 2);
// Send back to the same process --------^^^^^^
printf("%s:%d sent status %d\n", name, rank, stat);
size--;
}
} else
在工人身上做这样的事情毫无意义:
MPI::Status status;
MPI::COMM_WORLD.Recv(&stat, 1, MPI::INT, 0, 2, status);
// Source rank is fixed here ------------^
int source = status.Get_source();
printf("%s:%d received status %d from %d\n", name, rank, stat, source);
您已在接收操作中指定等级0
作为来源,因此它只能接收来自等级0
的消息。除status.Get_source()
之外,0
无法返回除MPI::COMM_WORLD.Recv()
之外的任何值,除非发生了一些通信错误,在这种情况下,Boost.MPI
会抛出异常。
代码中的第二个循环也是如此。
顺便说一句,您使用的是以前的官方标准C ++绑定。它们在MPI-2.2中被弃用,最新版本的标准(MPI-3.0)完全删除了它们,因为MPI论坛不再支持它们。您应该使用C绑定,或者依赖于{{1}}等第三方C ++接口。
答案 1 :(得分:1)
安装和MPICH2而不是OpenMPI后,它成功运行。我认为在我的集群机器中使用OpenMPI 1.6.3存在一些问题。