I am trying to build a simple cluster based on Windows XP. I compiled OpenMPI-1.4.2 successfully, and tools such as mpicc and ompi_info work, but I cannot get mpirun to work. The only output I see is:
    Z:\>orterun --hostfile z:\hosts.txt -np 2 hostname
    [host0:04728] Failed to initialize COM library. Error code = -2147417850
    [host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\openmpi-1.4.2\orte\mca\ess\hnp\ess_hnp_module.c at line 218
    --------------------------------------------------------------------------
    It looks like orte_init failed for some reason; your parallel process is
    likely to abort. There are many reasons that a parallel process can
    fail during orte_init; some of which are due to configuration or
    environment problems. This failure appears to be an internal failure;
    here's some additional information (which may only be relevant to an
    Open MPI developer):

      orte_plm_init failed
      --> Returned value Error (-1) instead of ORTE_SUCCESS
    --------------------------------------------------------------------------
    [host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\openmpi-1.4.2\orte\runtime\orte_init.c at line 132
    --------------------------------------------------------------------------
    It looks like orte_init failed for some reason; your parallel process is
    likely to abort. There are many reasons that a parallel process can
    fail during orte_init; some of which are due to configuration or
    environment problems. This failure appears to be an internal failure;
    here's some additional information (which may only be relevant to an
    Open MPI developer):

      orte_ess_set_name failed
      --> Returned value Error (-1) instead of ORTE_SUCCESS
    --------------------------------------------------------------------------
    [host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\..\..\openmpi-1.4.2\orte\tools\orterun\orterun.c at line 543
z:\hosts.txt contains:

    host0
    host1
Z: is a shared network drive accessible from both host0 and host1.
What is my problem, and how can I fix it?
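UPD: OK, this problem seems to be solved. It looks to me as though the WideCap driver and/or its software components cause this error. A "clean" machine runs local tasks successfully. However, I still cannot run a task across two or more machines; I get the following message: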
    Z:\>mpirun --hostfile z:\hosts.txt -np 2 hostname
    connecting to host1
    username:MAIN\cluster
    password:********
    Save Credential?(Y/N) y
    [host0:04728] This feature hasn't been implemented yet.
    [host0:04728] Could not connect to namespace cimv2 on node host1. Error code =-2147217400
    --------------------------------------------------------------------------
    mpirun was unable to start the specified application as it encountered an error.
    More information may be available above.
    --------------------------------------------------------------------------
I googled a bit and did everything described here: http://www.open-mpi.org/community/lists/users/2010/03/12355.php but I still get the same error. Can anyone help me?
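UPD2: Error code -2147217400 is probably the WMI error WBEM_E_INVALID_PARAMETER (0x80041008), which occurs when one of the parameters passed to a WMI call is incorrect. Does this mean the problem is in the OpenMPI source code itself? Or is it perhaps because I used a wrong or outdated wincred.h and credui.lib when building OpenMPI from source?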
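
For reference, here is a minimal Python sketch of how I mapped the signed decimal codes in the logs above to the usual hexadecimal HRESULT form. The helper names `to_hresult` and `describe` are my own, made up for illustration; they are not part of OpenMPI or of any Windows API. The sketch only reinterprets the number as an unsigned 32-bit value and splits out the facility and code fields, so it does not by itself prove which symbolic name applies.

```python
def to_hresult(signed_code: int) -> int:
    """Reinterpret a signed 32-bit error code as an unsigned 32-bit HRESULT."""
    return signed_code & 0xFFFFFFFF


def describe(signed_code: int) -> str:
    """Format an error code as hex plus its facility and code fields."""
    hr = to_hresult(signed_code)
    facility = (hr >> 16) & 0x1FFF  # facility field of the HRESULT
    code = hr & 0xFFFF              # low 16 bits: the error code itself
    return f"0x{hr:08X} (facility={facility}, code=0x{code:04X})"


if __name__ == "__main__":
    # The two codes that appear in the orterun/mpirun output above.
    for value in (-2147417850, -2147217400):
        print(value, "->", describe(value))
```

The second code comes out as 0x80041008, which matches the WBEM_E_INVALID_PARAMETER value mentioned above. The first one, from the "Failed to initialize COM library" message, comes out as 0x80010106, which as far as I know is RPC_E_CHANGED_MODE; I have not verified that this is what WideCap triggers.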