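I am working on a cluster that has Vampir installed for visualizing MPI communication. Because the cluster lacks an MPI-3 implementation, I installed Open MPI 2.0.0 in my home directory (configured with no flags other than --prefix); it works fine without Vampir. Now I don't know how to combine my local MPI-3 install correctly with Vampir to build my program (fetchAndOpTest.f90). I tried the following: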
vtf90 -vt:fc ~/OpenMPI2/bin/mpif90 -o fetchAndOpTestF90.x fetchAndOpTest.f90
(I don't know whether it matters, but this emits the following warning: /usr/bin/ld: warning: libmpi.so.1, needed by /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../lib/libmpi_f77.so, may conflict with libmpi.so.20)
Executing my program with
~/OpenMPI2/bin/mpirun -np 2 fetchAndOpTestF90.x
results in:
fetchAndOpTestF90.x: error while loading shared libraries: libvt-mpi.so.0: cannot open shared object file: No such file or directory [...]
So I also tried
vtf90 -vt:fc ~/OpenMPI2/bin/mpif90 -L/opt/vampirtrace/5.14.4/lib -o fetchAndOpTestF90.x fetchAndOpTest.f90
but that did not change the ldd output.
Edit: After adjusting LD_LIBRARY_PATH as suggested by @Harald:
> ldd fetchAndOpTestF90.x
linux-vdso.so.1 => (0x00007ffc6ada9000)
libmpi_f77.so.1 => /usr/lib/libmpi_f77.so.1 (0x00007ff8fdf2e000)
libvt-mpi.so.0 => /opt/vampirtrace/5.14.4/lib/libvt-mpi.so.0 (0x00007ff8fdca3000)
libvt-mpi-unify.so.0 => /opt/vampirtrace/5.14.4/lib/libvt-mpi-unify.so.0 (0x00007ff8fda18000)
libotfaux.so.0 => /opt/vampirtrace/5.14.4/lib/libotfaux.so.0 (0x00007ff8fd810000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff8fd50c000)
libopen-trace-format.so.1 => /opt/vampirtrace/5.14.4/lib/libopen-trace-format.so.1 (0x00007ff8fd2c4000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff8fd0ab000)
libpapi.so.5.3 => /usr/lib/x86_64-linux-gnu/libpapi.so.5.3 (0x00007ff8fce57000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff8fcc53000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007ff8fc939000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff8fc633000)
libmpi_usempi.so.20 => /home/USER/OpenMPI2/lib/libmpi_usempi.so.20 (0x00007ff8fc430000)
libmpi_mpifh.so.20 => /home/USER/OpenMPI2/lib/libmpi_mpifh.so.20 (0x00007ff8fc1df000)
libmpi.so.20 => /home/USER/OpenMPI2/lib/libmpi.so.20 (0x00007ff8fbefb000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff8fbce5000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007ff8fbaa9000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff8fb88b000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff8fb4c6000)
libmpi.so.1 => /usr/lib/libmpi.so.1 (0x00007ff8fb145000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff8fe162000)
libpfm.so.4 => /usr/lib/x86_64-linux-gnu/libpfm.so.4 (0x00007ff8fadff000)
libopen-pal.so.20 => /home/USER/OpenMPI2/lib/libopen-pal.so.20 (0x00007ff8fab09000)
libopen-rte.so.20 => /home/USER/OpenMPI2/lib/libopen-rte.so.20 (0x00007ff8fa887000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007ff8fa684000)
libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007ff8fa43b000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007ff8fa231000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007ff8fa026000)
libpciaccess.so.0 => /usr/lib/x86_64-linux-gnu/libpciaccess.so.0 (0x00007ff8f9e1d000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff8f9c15000)
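Execution now throws the error: mpirun noticed that process rank 0 with PID 0 on node cluster exited on signal 11 (Segmentation fault)
(The program is correct: built and run with the local MPI-3 installation and without Vampir, it works fine.)
Answer 0 (score: 3)
Your VampirTrace library was compiled against a different, system-wide MPI implementation and depends on its DSOs: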
--> libmpi_f77.so.1 => /usr/lib/libmpi_f77.so.1 (0x00007ff8fdf2e000)
libmpi_usempi.so.20 => /home/USER/OpenMPI2/lib/libmpi_usempi.so.20 (0x00007ff8fc430000)
libmpi_mpifh.so.20 => /home/USER/OpenMPI2/lib/libmpi_mpifh.so.20 (0x00007ff8fc1df000)
libmpi.so.20 => /home/USER/OpenMPI2/lib/libmpi.so.20 (0x00007ff8fbefb000)
--> libmpi.so.1 => /usr/lib/libmpi.so.1 (0x00007ff8fb145000)
libopen-pal.so.20 => /home/USER/OpenMPI2/lib/libopen-pal.so.20 (0x00007ff8fab09000)
libopen-rte.so.20 => /home/USER/OpenMPI2/lib/libopen-rte.so.20 (0x00007ff8fa887000)
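The PMPI_* symbols used by VampirTrace are probably being resolved by the system-wide MPI library, so argument passing through the PMPI mechanism fails. Since VampirTrace is an open-source project (unlike Vampir, which is a closed-source commercial tool), you can download it from the official site and compile it against your own Open MPI build. But that will not help in your case, because VampirTrace knows nothing about the new MPI-3 RMA calls and will not trace them (they will most likely show up in the trace as user functions).
As already suggested, use Score-P instead. Release 2.0.2 supports the entire set of MPI-3.1 calls.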
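To make the mixed-library situation described in this answer easier to spot, here is a minimal sketch of a filter over ldd output. The function name mpi_lib_dirs is hypothetical (it is not part of VampirTrace, Open MPI, or Score-P), and the library-name pattern is an assumption based on the names that appear in the listing in the question. It prints each directory that supplies an Open MPI library; more than one directory suggests that two MPI installations are loaded into the same process.

```shell
# Hypothetical helper: reads `ldd` output on stdin and prints every distinct
# directory from which an MPI-related library (libmpi*, libopen-pal*,
# libopen-rte*) is resolved. The name pattern is an assumption taken from
# the ldd listing in the question.
mpi_lib_dirs() {
    awk '$2 == "=>" && $1 ~ /^lib(mpi|open-(pal|rte))/ {
             dir = $3
             sub("/[^/]*$", "", dir)   # strip the file name, keep the directory
             print dir
         }' | sort -u
}

# Example call (binary name taken from the question):
#   ldd ./fetchAndOpTestF90.x | mpi_lib_dirs
```

For the listing in the question this should report both /home/USER/OpenMPI2/lib and /usr/lib, which matches the two lines marked with --> in this answer.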
Answer 1 (score: 2)
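There seem to be several alternatives for making the libraries findable:
- Link your binary statically (-static flag).
- Add ${HOME}/OpenMPI2/lib (or /opt/vampirtrace/5.14.4/lib?), since your MPI installation is there, to the LD_LIBRARY_PATH environment variable.
- Have the linker embed the shared-library search path in the binary when you build it, using -Wl,-rpath -Wl,${HOME}/OpenMPI2/lib (or /opt/vampirtrace/5.14.4/lib?).
Edit: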
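Note that you indicate that you have VampirTrace installed (/opt/vampirtrace/5.14.4), but compared with Open MPI 2.0 (see this) it is too old (see that): the two are roughly three years apart. Open MPI has changed a lot over those years, particularly in the 2.0 release. This may also be related to the warning you observe, i.e. a version mismatch. Also, and this is bad news for this issue, you will notice from the previous link that the VampirTrace package embedded in Open MPI has been removed.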
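Your best bet, IMHO, is to try VampirTrace's successor, Score-P, which also generates trace files that Vampir can read. Since Open MPI 2.0 is very recent, you may need to try a release candidate (RC) of Score-P.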
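For the LD_LIBRARY_PATH alternative listed earlier in this answer, a minimal sketch follows. The function name prepend_ld_path is hypothetical, and the two directories in the comments are the ones mentioned in the question. Note that this only changes where the dynamic loader looks for libraries at run time; it does not change which MPI the VampirTrace libraries were built against.

```shell
# Hypothetical helper: put a directory at the front of LD_LIBRARY_PATH,
# skipping it if it is already listed.
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;   # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Example calls (paths taken from the question):
#   prepend_ld_path /opt/vampirtrace/5.14.4/lib
#   prepend_ld_path "${HOME}/OpenMPI2/lib"
```

Prepending rather than appending means the listed directory is searched before any directories already in the variable, which matters when the same library name exists in more than one place.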