OpenMPI error when using more than 3 hosts

Time: 2017-03-13 05:02:10

Tags: python ssh mpi mpi4py

I am running a simple MPI Python program with more than 3 processes. For example:

mpiexec -host master,w1,w2,w3 python code.py

This results in the following error:

ssh: Could not resolve hostname w3: Name or service not known
ORTE was unable to reliably start one or more daemons.

This usually is caused by:

* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.

* the inability to write startup files into /tmp (-tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
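The first line of the log says that ssh could not resolve the hostname w3, so name resolution is a reasonable first thing to check. Below is a minimal sketch, not a confirmed fix, of a checker that reports which of the host names from the mpiexec command cannot be resolved on the machine where it runs. The function name `unresolved_hosts` and its `resolve` parameter are hypothetical and introduced only for illustration. It may be worth running on every node, not only on master, since ssh may be invoked from a worker node when Open MPI starts its daemons.

```python
import socket


def unresolved_hosts(hosts, resolve=socket.gethostbyname):
    """Return the names in `hosts` that `resolve` cannot map to an address.

    `resolve` defaults to the system resolver; it is a parameter only so
    the function can be exercised without real DNS or /etc/hosts entries.
    """
    failed = []
    for name in hosts:
        try:
            resolve(name)
        except OSError:
            # socket.gaierror (failed lookup) is a subclass of OSError.
            failed.append(name)
    return failed
```

Calling `unresolved_hosts(["master", "w1", "w2", "w3"])` on each machine returns the names that machine cannot resolve. If w3 appears in the result on w1 or w2 but not on master, the workers would be missing an entry for w3 (for example in /etc/hosts).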

However, if I run the program with any two of w1, w2, and w3, it works. For example:

mpiexec -host master,w1,w3 python code.py

Also, here is the code:

import random
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.rank  # index of this process within COMM_WORLD
size = comm.size  # total number of processes

if rank == 0:
    print(rank, 'worker')
else:
    print(rank, 'worker')

How can I fix this?

0 answers:

No answers