OpenMPI: ORTE was unable to reliably start one or more daemons

Time: 2017-04-11 00:39:40

Tags: linux ubuntu openmpi

I have been at this for several days but cannot solve my problem.

I am running:

mpiexec -hostfile ~/machines -nolocal -pernode mkdir -p $dstpath

where $dstpath points to the current directory and "machines" is a file containing the following:

node01
node02
node03
node04
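For reference, an Open MPI hostfile lists one host per line, optionally followed by a slot count, and treats text after # as a comment. The following is a minimal sketch for printing the host names such a file contains, which can help when checking nodes one at a time. The function name list_hosts is a hypothetical helper, not part of Open MPI.

```shell
#!/bin/sh
# list_hosts FILE: print the host name from each entry of an Open MPI-style
# hostfile. Hypothetical helper for illustration, not an Open MPI tool.
# Assumes one host per line, an optional "slots=N" after the name,
# and '#' starting a comment.
list_hosts() {
    # Strip comments, then print the first field of every non-empty line.
    sed 's/#.*//' "$1" | awk 'NF { print $1 }'
}
```

Calling list_hosts ~/machines on a file with node01 to node04 on separate lines would print those four names, one per line.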

This is the error output:

Failed to parse XML input with the minimalistic parser. If it was not
generated by hwloc, try enabling full XML support with libxml2.
[node01:06177] [[6421,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 891
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------
[node01:06177] 1 more process has sent help message help-errmgr-base.txt / failed-daemon-launch
[node01:06177] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Failed to parse XML input with the minimalistic parser. If it was not
generated by hwloc, try enabling full XML support with libxml2.
[node01:06181] [[6417,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 891

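I have 4 machines, node01 to node04. To log in to these 4 nodes, I have to log in to node00 first. I am trying to run some distributed graph functions. The graph software is installed on node01 and is supposed to synchronize with the other nodes using mpiexec.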

What I have done:

  1. Made sure passwordless login is set up everywhere; every machine can ssh to any other machine without problems.

  2. There is a hostfile in my home directory.

  3. echo $PATH gives /home/myhome/bin:/home/myhome/.local/bin:/usr/include/openmpi:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

  4. echo $LD_LIBRARY_PATH gives /usr/lib/openmpi/lib

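  5. This used to work, but it suddenly started producing these errors. I had my administrator install new machines, but it still gives the same errors. I tried running on one node at a time, but got the same error. I am not at all familiar with the command line, so please give me some advice. I tried reinstalling OpenMPI both from source and with sudo apt-get install openmpi-bin. I am on Ubuntu 16.04 LTS.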
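Since both a source build and the apt package were tried, one thing that may be worth ruling out is that the nodes resolve different Open MPI installations. The sketch below prints, for one node, the orted path and Open MPI version seen over a non-interactive ssh session, which may differ from what a login shell shows. The names check_node and REMOTE_SH are illustrative assumptions; ompi_info is Open MPI's own tool.

```shell
#!/bin/sh
# Sketch: report which orted and which Open MPI version a node sees over a
# non-interactive remote shell. check_node and REMOTE_SH are illustrative
# names; set REMOTE_SH to substitute a different remote-shell command.
REMOTE_SH=${REMOTE_SH:-ssh}

check_node() {
    host=$1
    # command -v prints the orted path found on the remote PATH (if any);
    # ompi_info --version reports the Open MPI version installed there.
    out=$("$REMOTE_SH" "$host" 'command -v orted; ompi_info --version | head -n 1' 2>&1)
    # Join the remote output onto a single line, prefixed with the host name.
    printf '%s: %s\n' "$host" "$(printf '%s' "$out" | tr '\n' ' ')"
}

# Example (not run here):
#   for h in node01 node02 node03 node04; do check_node "$h"; done
```

If the nodes report different paths or versions, mismatched installations would be a plausible cause to investigate; the post itself does not confirm this.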

1 answer:

Answer 0 (score: -1)

You should focus on fixing:

  

Failed to parse XML input with the minimalistic parser. If it was not
generated by hwloc, try enabling full XML support with libxml2.
[node01:06177] [[6421,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 891