MPI error when running code on multiple nodes of a cluster

Time: 2014-05-27 13:51:43

Tags: c centos mpi cluster-computing hpc

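I am using a 4-node cluster with 12 cores on each node, plus a head node. I compiled the Gerris code on the head node and copied the libraries to all the nodes. When I run a parallel job on a single node it works fine, but when I run the job across multiple nodes it fails with this error: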

PE 9 (compute-0-1.local): error when cleaning up /tmp/2547.1.all.q/gfsB80OGX

mpirun noticed that process rank 4 with PID 25128 on node compute-0-1.local exited on signal 11 (Segmentation fault).

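Previously I used the code built from the pre-built packages available on the website, and it worked perfectly even on multiple nodes. I have now made some changes to the code, so I am using the developer version and compiling it from scratch.

Can anyone suggest how to debug this error? I could not find anything about it on Google.

Update 1 (June 1): This is the output I get when I try GDB on the core file.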

warning: .dynamic section for "/lib64/libm.so.6" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib64/libgthread-2.0.so.0" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib64/libgmodule-2.0.so.0" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib64/libdl.so.2" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib64/libglib-2.0.so.0" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib64/libnsl.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib64/libutil.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib64/libgcc_s.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib64/libpthread.so.0" is not at the expected address (wrong library or version mismatch?)

warning: .dynamic section for "/lib64/libc.so.6" is not at the expected address (wrong library or version mismatch?)

warning: .dynamic section for "/lib64/librt.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib64/ld-linux-x86-64.so.2" is not at the expected address (wrong library or version mismatch?)

warning: .dynamic section for "/usr/lib64/librdmacm.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/usr/lib64/libibverbs.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations
Error while mapping shared library sections:
/tmp/2563.1.all.q/gfsC0YVGX/module.so: No such file or directory.
Error while mapping shared library sections:
/tmp/2563.1.all.q/gfsLB2CGX/module.so: No such file or directory.
Error while mapping shared library sections:
/tmp/2563.1.all.q/gfsSNFHGX/module.so: No such file or directory.
Error while mapping shared library sections:
/tmp/2563.1.all.q/gfsW7MLGX/module.so: No such file or directory.
Reading symbols from /opt/gerris/lib/libgfs3D-1.3.so.2...(no debugging symbols found)...done.
Loaded symbols for /opt/gerris/lib/libgfs3D-1.3.so.2
Reading symbols from /opt/gerris/lib/libgts-0.7.so.5...done.
Loaded symbols for /opt/gerris/lib/libgts-0.7.so.5
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgthread-2.0.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgthread-2.0.so.0
Reading symbols from /lib64/libgmodule-2.0.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgmodule-2.0.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libglib-2.0.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libglib-2.0.so.0
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/libmpi.so.0...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/libmpi.so.0
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/libopen-rte.so.0...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/libopen-rte.so.0
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/libopen-pal.so.0...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/libopen-pal.so.0
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /opt/intel/lib/intel64/libimf.so...(no debugging symbols found)...done.
Loaded symbols for /opt/intel/lib/intel64/libimf.so
Reading symbols from /opt/intel/lib/intel64/libsvml.so...(no debugging symbols found)...done.
Loaded symbols for /opt/intel/lib/intel64/libsvml.so
Reading symbols from /opt/intel/lib/intel64/libintlc.so.5...(no debugging symbols found)...done.
Loaded symbols for /opt/intel/lib/intel64/libintlc.so.5
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_paffinity_linux.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_paffinity_linux.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_carto_auto_detect.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_carto_auto_detect.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_ess_env.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_ess_env.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_rml_oob.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_rml_oob.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_oob_tcp.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_oob_tcp.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_routed_binomial.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_routed_binomial.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_grpcomm_bad.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_grpcomm_bad.so
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_allocator_basic.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_allocator_basic.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_allocator_bucket.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_allocator_bucket.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_rcache_vma.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_rcache_vma.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_mpool_fake.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_mpool_fake.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_mpool_rdma.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_mpool_rdma.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_mpool_sm.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_mpool_sm.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/libmca_common_sm.so.1...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/libmca_common_sm.so.1
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_pml_ob1.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_pml_ob1.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_bml_r2.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_bml_r2.so
Reading symbols from /usr/lib64/librdmacm.so.1...Reading symbols from /usr/lib/debug/usr/lib64/librdmacm.so.1.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/librdmacm.so.1
Reading symbols from /usr/lib64/libibverbs.so.1...Reading symbols from /usr/lib/debug/usr/lib64/libibverbs.so.1.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/libibverbs.so.1
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_btl_openib.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_btl_openib.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_btl_self.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_btl_self.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_btl_sm.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_btl_sm.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_btl_tcp.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_btl_tcp.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_basic.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_basic.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_hierarch.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_hierarch.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_inter.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_inter.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_self.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_self.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_sm.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_sm.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_sync.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_sync.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_tuned.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_coll_tuned.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_osc_pt2pt.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_osc_pt2pt.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_osc_rdma.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_osc_rdma.so
Reading symbols from /usr/lib64/libmlx4-rdmav2.so...Reading symbols from /usr/lib/debug/usr/lib64/libmlx4-rdmav2.so.debug...done.
done.
Loaded symbols for /usr/lib64/libmlx4-rdmav2.so
Reading symbols from /usr/lib64/libmthca-rdmav2.so...Reading symbols from /usr/lib/debug/usr/lib64/libmthca-rdmav2.so.debug...done.
done.
Loaded symbols for /usr/lib64/libmthca-rdmav2.so
Reading symbols from /usr/lib64/libipathverbs-rdmav2.so...Reading symbols from /usr/lib/debug/usr/lib64/libipathverbs-rdmav2.so.debug...done.
done.
Loaded symbols for /usr/lib64/libipathverbs-rdmav2.so
Reading symbols from /usr/lib64/libnes-rdmav2.so...Reading symbols from /usr/lib/debug/usr/lib64/libnes-rdmav2.so.debug...done.
done.
Loaded symbols for /usr/lib64/libnes-rdmav2.so
Reading symbols from /usr/lib64/libcxgb3-rdmav2.so...Reading symbols from /usr/lib/debug/usr/lib64/libcxgb3-rdmav2.so.debug...done.
done.
Loaded symbols for /usr/lib64/libcxgb3-rdmav2.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_pubsub_orte.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_pubsub_orte.so
Reading symbols from /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_dpm_orte.so...(no debugging symbols found)...done.
Loaded symbols for /opt/mpi/openmpi/1.4.3/intel/lib/openmpi/mca_dpm_orte.so
Symbol file not found for /tmp/2563.1.all.q/gfsC0YVGX/module.so
Symbol file not found for /tmp/2563.1.all.q/gfsLB2CGX/module.so
Symbol file not found for /tmp/2563.1.all.q/gfsSNFHGX/module.so
Symbol file not found for /tmp/2563.1.all.q/gfsW7MLGX/module.so
Core was generated by `/opt/gerris/bin/gerris3D ./pswirl.gfs'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002b320e9d00f0 in ?? ()
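
Since frame #0 above is unresolved (`?? ()`) and the `module.so` files under `/tmp` no longer exist when the core is opened, one possible way to get more information is to have each process print its own backtrace at the moment it receives SIGSEGV. The following is only a sketch under some assumptions: it relies on glibc's `backtrace()` and on Open MPI exporting `OMPI_COMM_WORLD_RANK` to each launched process, and the names `format_crash_header` and `install_segv_backtrace` are made up for illustration; they are not part of Gerris or Open MPI.

```c
/* Sketch: report host, rank and a raw backtrace when a process gets SIGSEGV.
 * Assumptions: glibc (execinfo.h) and Open MPI's OMPI_COMM_WORLD_RANK variable. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* expose sigaction/gethostname under strict -std= modes */
#endif
#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Header text is prepared at install time so the handler only calls write(). */
static char crash_header[256];
static size_t crash_header_len;

/* Build "SIGSEGV on host <host>, rank <rank>\n" in buf; returns its length. */
size_t format_crash_header(char *buf, size_t n, const char *host, const char *rank)
{
    int len;
    if (n == 0)
        return 0;
    len = snprintf(buf, n, "SIGSEGV on host %s, rank %s\n",
                   host ? host : "unknown", rank ? rank : "unknown");
    if (len < 0)
        return 0;
    return (size_t)len < n ? (size_t)len : n - 1;  /* account for truncation */
}

static void segv_handler(int sig)
{
    void *frames[64];
    int depth = backtrace(frames, 64);
    ssize_t ignored = write(STDERR_FILENO, crash_header, crash_header_len);
    (void)ignored;
    /* backtrace_symbols_fd writes straight to the descriptor without malloc. */
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    /* SA_RESETHAND already restored the default action; re-raise so the
     * process still terminates with SIGSEGV and can leave a core file. */
    raise(sig);
}

/* Install the handler; returns 0 on success, -1 on failure (as sigaction). */
int install_segv_backtrace(void)
{
    char host[128] = "unknown";
    void *warmup[1];
    struct sigaction sa;

    if (gethostname(host, sizeof host - 1) != 0)
        strcpy(host, "unknown");
    host[sizeof host - 1] = '\0';
    crash_header_len = format_crash_header(crash_header, sizeof crash_header,
                                           host, getenv("OMPI_COMM_WORLD_RANK"));

    /* First call may load libgcc; do it now rather than inside the handler. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

In this sketch, `install_segv_backtrace()` would be called once near the start of `main`, for example right after `MPI_Init`. Function names in the output are only resolved for exported symbols, so linking with `-rdynamic` (and building with `-g` for later use in GDB) should make the trace more readable; addresses inside a dynamically loaded `module.so` would still show up with the module path while the file exists.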

0 answers:

No answers