To deploy Singularity software containers on an HPC system, is it better to (1) copy the relevant HPC libraries into the container, (2) bind them in from the host at runtime, or (3) install them inside the container? If strategy 1 or 2 is generally recommended, how do I find out which libraries need to be copied/bound, and from where?
Is the better solution the one that is easier to use, more stable and more efficient, or the one that is more self-contained and reproducible?
So far I have mainly tried strategy 3 and relied on error or warning messages to tell me which libraries to install. However, that has not been successful.
The eventual goal of the container is to run R in parallel via OpenMPI on the HPC system. A minimal bootstrap definition file for a parallel run looks like this for me:
Bootstrap: debootstrap
OSVersion: xenial
MirrorURL: http://archive.ubuntu.com/ubuntu/
%post
# add universe repository
sed -i 's/main/main universe/g' /etc/apt/sources.list
apt-get update
apt-get install -y --no-install-recommends r-base-dev libopenmpi-dev openmpi-bin
apt-get clean
# directory will be bound to host
mkdir /etc/libibverbs.d
# Interface R and MPI
R --slave -e 'install.packages("doMPI", repos="http://cloud.r-project.org/")'
%runscript
R -e "library(doMPI); cl <- startMPIcluster(count = 5); registerDoMPI(cl); foreach(i=1:5) %dopar% Sys.sleep(10); closeCluster(cl); mpi.quit()"
With this, I can execute
singularity run -B /etc/libibverbs.d/:/etc/libibverbs.d/ test.img
and get some warning messages, but (so far) it works. The warnings:
libibverbs: Warning: couldn't load driver 'ipathverbs': libipathverbs-rdmav2.so: cannot open shared object file: No such file or directory
libibverbs: Warning: couldn't load driver 'mthca': libmthca-rdmav2.so: cannot open shared object file: No such file or directory
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
--------------------------------------------------------------------------
[[12293,2],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: ****
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
.
.
.
[****:01978] 4 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[****:01978] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
I have tried installing the packages libipathverbs1 and libmthca1, which makes the warning messages disappear, but then the parallel run fails:
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
The process that invoked fork was:
Local host: ****
MPI_COMM_WORLD rank: 1
If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
-------------------------------------------------------
Child job 2 terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
Here it is suggested to bind the relevant libraries, but I am not sure which of them (or which other libraries) I actually need, nor how to find that out, other than by very tedious trial and error.
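The best I could come up with so far is to enumerate candidate libraries on the host and bind whatever turns up. A rough sketch of what I mean (the paths and file names are guesses based on a typical Ubuntu host, not verified):

# on the host, outside the container: which verbs provider drivers exist?
ls /etc/libibverbs.d/                       # one *.driver file per provider
cat /etc/libibverbs.d/*.driver              # each file names a provider library
ldconfig -p | grep -i -e rdmav2 -e ibverbs  # where those shared objects live
# which shared libraries does the host MPI itself depend on?
ldd "$(which mpirun)"

But even if that lists the right files, I would still not know whether binding them in is preferable to installing distribution packages inside the container.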
Answer 0 (score 0):
According to the OMPI FAQ, you cannot call fork when using IB, unless the fork is immediately followed by an exec call. My bet is that some other program or library is forking inside your code, and that breaks OpenMPI.
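If you only need to get past the error, two things you could try (untested sketches; the parameter names come from the messages above and the Open MPI documentation, and whether exported variables propagate into the container depends on your Singularity setup):

# option 1: disable the openib transport and fall back to TCP
# (slower, but avoids the fork-with-InfiniBand problem entirely)
export OMPI_MCA_btl="^openib"
# option 2: if you are certain the fork() is harmless, silence the warning
# (this only hides the message; it will not fix a real crash)
export OMPI_MCA_mpi_warn_on_fork=0
singularity run -B /etc/libibverbs.d/:/etc/libibverbs.d/ test.img

The real fix, though, is to find out what is forking and avoid it while the IB transport is active.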