Required libraries for OpenMPI and Singularity software containers?

Date: 2017-06-26 16:36:08

Tags: r openmpi hpc singularity-container

Question

To deploy Singularity software containers on an HPC system, is it better to

  1. copy the relevant HPC libraries from the host,
  2. bind them from the host, or
  3. install them into the container during bootstrap?

If strategy 1 or 2 is generally recommendable, how do I find out which libraries need to be copied/bound and where to get them from?

"Better" here means a trade-off between ease of use, stability, and efficiency of the solution on the one hand, and independence and reproducibility of the solution on the other.

So far I have mainly tried strategy 3 and relied on the error and warning messages to tell me which libraries to install. That, however, has not been successful.
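
For comparison, a minimal sketch of what strategy 2 (binding from the host) could look like at run time; the host-side paths here are assumptions and will vary between systems:

    # Hypothetical strategy-2 invocation: bind the host's Open MPI libraries
    # and InfiniBand driver configuration into the container at run time
    # instead of installing them during bootstrap.
    # /usr/lib/openmpi is an assumed host path; adjust to your installation.
    singularity run \
      -B /usr/lib/openmpi:/usr/lib/openmpi \
      -B /etc/libibverbs.d:/etc/libibverbs.d \
      test.img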

Background

The ultimate goal of the container is to run R in parallel via OpenMPI on the HPC system. A minimal bootstrap definition file for such a parallel run looks like this for me:

    Bootstrap: debootstrap
    OSVersion: xenial
    MirrorURL: http://archive.ubuntu.com/ubuntu/
    
    %post
      # add universe repository
      sed -i 's/main/main universe/g' /etc/apt/sources.list
    
      apt-get update    
      apt-get install -y --no-install-recommends r-base-dev libopenmpi-dev openmpi-bin
      apt-get clean
    
      # directory will be bound to host
      mkdir /etc/libibverbs.d
    
      # Interface R and MPI
      R --slave -e 'install.packages("doMPI", repos="http://cloud.r-project.org/")'
    
    
    %runscript
      R -e "library(doMPI); cl <- startMPIcluster(count = 5); registerDoMPI(cl); foreach(i=1:5) %dopar% Sys.sleep(10); closeCluster(cl); mpi.quit()"
    

With this, I can execute

    singularity run -B /etc/libibverbs.d/:/etc/libibverbs.d/ test.img
    

and get some warning messages, but (so far) it works. The warnings:

    libibverbs: Warning: couldn't load driver 'ipathverbs': libipathverbs-rdmav2.so: cannot open shared object file: No such file or directory
    libibverbs: Warning: couldn't load driver 'mthca': libmthca-rdmav2.so: cannot open shared object file: No such file or directory
    libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
    --------------------------------------------------------------------------
    [[12293,2],0]: A high-performance Open MPI point-to-point messaging module
    was unable to find any relevant network interfaces:
    
    Module: OpenFabrics (openib)
      Host: ****
    
    Another transport will be used instead, although this may result in
    lower performance.
    --------------------------------------------------------------------------
    .
    .
    .
    [****:01978] 4 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
    [****:01978] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
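
The drivers named in these warnings are userspace InfiniBand provider libraries. A sketch of how to check what the host actually provides (the paths are typical Debian/Ubuntu locations and may differ):

    # List the ibverbs providers the host is configured for
    # (each .driver file names one provider library)
    ls /etc/libibverbs.d/

    # Check which provider libraries the host's dynamic linker can find
    ldconfig -p | grep rdmav2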
    

I have tried installing the packages libipathverbs1 and libmthca1, which makes the warning messages disappear, but the parallel run then fails:

    An MPI process has executed an operation involving a call to the
    "fork()" system call to create a child process.  Open MPI is currently
    operating in a condition that could result in memory corruption or
    other system errors; your MPI job may hang, crash, or produce silent
    data corruption.  The use of fork() (or system() or other calls that
    create child processes) is strongly discouraged.
    
    The process that invoked fork was:
    
      Local host:          ****
      MPI_COMM_WORLD rank: 1
    
    If you are *absolutely sure* that your application will successfully
    and correctly survive a call to fork(), you may disable this warning
    by setting the mpi_warn_on_fork MCA parameter to 0.
    --------------------------------------------------------------------------
    -------------------------------------------------------
    Child job 2 terminated normally, but 1 process returned
    a non-zero exit code.. Per user-direction, the job has been aborted.
    

Here it is suggested to bind the relevant libraries, but I am not sure which of them (or which other libraries) I need, or even how to find that out, short of very tedious trial and error.
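
One way to make this less tedious is to inspect dependencies on the host and to trace what the container actually tries to open; a sketch, assuming strace is available on the host:

    # Libraries the host's MPI launcher is linked against
    ldd "$(which mpirun)"

    # Files the containerized run attempts (and fails) to open
    strace -f -e trace=open,openat \
      singularity run -B /etc/libibverbs.d/:/etc/libibverbs.d/ test.img 2>&1 \
      | grep ENOENT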

1 answer:

Answer 0 (score: 0)

According to the OMPI FAQ, you cannot call fork when using InfiniBand unless the fork is directly followed by an exec call. I would bet that another program or library is forking somewhere in your code, and that is what breaks Open MPI.
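
Since the unsafe combination is fork plus the InfiniBand (openib) transport, one workaround to sketch here, at the cost of network performance, is to exclude that transport so Open MPI falls back to TCP; the environment-variable form of the MCA parameter is shown, assuming Singularity passes the host environment through:

    # Exclude the OpenFabrics transport; Open MPI will use TCP instead,
    # which is slower but not affected by the fork() restriction
    export OMPI_MCA_btl=^openib
    singularity run -B /etc/libibverbs.d/:/etc/libibverbs.d/ test.img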