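I am trying to run a simple MPI program on 4 nodes, using OpenMPI 1.4.3 on CentOS 5.5. When I launch it with mpirun and a hostfile (machinefile), I get no output at all, just a blank screen, so I have to kill the job.
I use the following command:
mpirun --hostfile hostfile -np 4 new46
Output after killing the job: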
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process that caused
that situation.
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
myocyte46 - daemon did not report back when launched
myocyte47 - daemon did not report back when launched
myocyte49 - daemon did not report back when launched
This is the MPI program I am trying to run on the 4 nodes:
**************************
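#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int my_rank, p, source, dest, tag = 0;   /* tag value assumed */
    char message[100];
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    if (my_rank != 0) {   /* every non-root rank sends one greeting to rank 0 */
        sprintf(message, "Greetings from the process %d!", my_rank);
        dest = 0;
        MPI_Send(message, strlen(message) + 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    } else {              /* rank 0 receives and prints one greeting per other rank */
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }
    MPI_Finalize();
    return 0;
}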
****************************
My hostfile looks like this:
[amohan@myocyte48 ~]$ cat hostfile
myocyte46
myocyte47
myocyte48
myocyte49
*******************************
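I have run the MPI program above on each node independently, and it compiles and runs fine there. Only when I use the hostfile do I get the "daemon did not report back when launched" problem. I am trying to figure out what could be causing it.
Thanks!
Answer (score: 1)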
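I think these lines
myocyte46 - daemon did not report back when launched
are pretty clear: you are having trouble either launching the MPI daemons or communicating with them once they start. So you need to start looking at the network. Can you ssh to these nodes without a password? Can you ssh back from them? Leaving the MPI program aside, does
mpirun --hostfile hostfile -np 4 hostname
give you anything?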