我已经安装了openmpi,而不是/usr/...
,而是/commun/data/packages/openmpi/
,它是使用--with-sge
编译的。
我在http://docs.oracle.com/cd/E19080-01/n1.grid.eng6/817-5677/6ml49n2c0/index.html
中添加了SGE中的新PE# /commun/data/packages/openmpi/bin/ompi_info | grep gridengine
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.3)
# qconf -sq all.q | grep pe_
pe_list make orte
没有SGE,使用多个处理器,程序运行没有任何问题。
/commun/data/packages/openmpi/bin/orterun -np 20 ./a.out args
现在我想将我的程序提交给SGE
在Open MPI FAQ中,我读到了:
# Allocate a SGE interactive job with 4 slots
# from a parallel environment (PE) named 'orte'
shell$ qsh -pe orte 4
但我的输出是:
qsh -pe orte 4
Your job 84550 ("INTERACTIVE") has been submitted
waiting for interactive job to be scheduled ...
Could not start interactive job.
我还尝试了脚本中嵌入的mpirun
命令:
$ cat ompi.sh
#!/bin/sh
/commun/data/packages/openmpi/bin/mpirun \
/path/to/a.out args
但它失败了
$ cat ompi.sh.e84552
error: executing task of job 84552 failed: execution daemon on host "node02" didn't accept task
--------------------------------------------------------------------------
A daemon (pid 18327) died unexpectedly with status 1 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
error: executing task of job 84552 failed: execution daemon on host "node01" didn't accept task
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
我该如何解决这个问题?
在openmpi邮件列表中回答:http://www.open-mpi.org/community/lists/users/2013/02/21360.php
答案 0 :(得分:0)
在我的情况下设置“job_is_first_task FALSE”和“control_slaves TRUE”解决了这个问题。
# qconf -mp mpi1
pe_name mpi1
slots 9
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE