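This is an MPI setup with two computers running 64-bit CPUs and a third running a 32-bit CPU. All three computers use exactly the same locations for lib and bin, have identical bashrc files, and store the executable in the same folder. The SSH connection is configured the same way on the 64-bit and 32-bit machines. The server is one of the 64-bit computers. I compiled the executable locally on the 32-bit machine (it appears as [K7ASA:1555] in the output below) and it runs there, but when I try to run it remotely in parallel I get this message: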
mpirun -host 10.42.0.163,10.42.0.72,10.42.0.68 ./mpi_quad-1
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "(null)" (-43) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[K7ASA:1555] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[40577,1],2]
Exit code: 1
Here is the output of the following commands, run across all three hosts:
mpirun -host 10.42.0.163,10.42.0.72,10.42.0.68 --tag-output uname -a
[1,0]<stdout>:Linux verthex-Lenovo-V570 4.15.0-38-generic #41~16.04.1-Ubuntu SMP Wed Oct 10 20:16:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[1,1]<stdout>:Linux verthex-HP-Pavilion-zv5000-DP299AV 4.15.0-38-generic #41-Ubuntu SMP Wed Oct 10 10:59:38 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[1,2]<stdout>:Linux verthex-K7ASA 4.15.0-38-generic #41-Ubuntu SMP Wed Oct 10 10:58:23 UTC 2018 i686 athlon i686 GNU/Linux
mpirun -host 10.42.0.163,10.42.0.72,10.42.0.68 --tag-output file mpi_quad-1
[1,0]<stdout>:mpi_quad-1: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=a7aa397b9a339ae464201270a065fa7037721016, not stripped
[1,1]<stdout>:mpi_quad-1: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=a7aa397b9a339ae464201270a065fa7037721016, not stripped
[1,2]<stdout>:mpi_quad-1: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=a7aa397b9a339ae464201270a065fa7037721016, not stripped
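As a cross-check of what file reports above, the ELF class can also be read directly from the binary's header. The sketch below is only illustrative: it assumes Python 3 is available on each host, and the helper names elf_class, host_word_size and describe are hypothetical, not part of Open MPI. It relies on the ELF convention that byte 4 of the header (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.

```python
import struct

# EI_CLASS values from the ELF specification (byte 4 of e_ident).
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(header: bytes) -> str:
    """Classify the first bytes of a file as a 32-bit or 64-bit ELF object."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not ELF"
    return ELF_CLASSES.get(header[4], "unknown ELF class")


def host_word_size() -> str:
    """Pointer width of the running interpreter, a rough proxy for the host userland."""
    return f"{struct.calcsize('P') * 8}-bit"


def describe(path: str) -> str:
    """Summarise the ELF class of `path` next to the word size of this host."""
    with open(path, "rb") as f:
        header = f.read(5)  # the magic number plus the EI_CLASS byte
    return f"{path}: {elf_class(header)} binary, {host_word_size()} interpreter"
```

Calling print(describe("./mpi_quad-1")) on each host should agree with the file output. A 64-bit result on the i686 machine would suggest that host is not picking up the binary that was compiled locally on it.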
mpirun -host 10.42.0.163,10.42.0.72,10.42.0.68 --tag-output ldd mpi_quad-1
[1,0]<stdout>: linux-vdso.so.1 => (0x00007ffc091eb000)
[1,0]<stdout>: libmpi.so.40 => /usr/local/lib/libmpi.so.40 (0x00007fbda7934000)
[1,0]<stdout>: libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fbda7717000)
[1,0]<stdout>: libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbda734d000)
[1,0]<stdout>: libopen-rte.so.40 => /usr/local/lib/libopen-rte.so.40 (0x00007fbda7096000)
[1,0]<stdout>: libopen-pal.so.40 => /usr/local/lib/libopen-pal.so.40 (0x00007fbda6d8b000)
[1,0]<stdout>: librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fbda6b83000)
[1,0]<stdout>: libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fbda687a000)
[1,0]<stdout>: /lib64/ld-linux-x86-64.so.2 (0x00007fbda7c2e000)
[1,0]<stdout>: libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fbda6660000)
[1,0]<stdout>: libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fbda645c000)
[1,0]<stdout>: libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007fbda6251000)
[1,0]<stdout>: libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fbda604e000)
[1,1]<stdout>: [1,1]<stdout>:linux-vdso.so.1 (0x00007ffcfcdd0000)
[1,1]<stdout>: [1,1]<stdout>:libmpi.so.40 => /usr/local/lib/libmpi.so.40 (0x00007f59231b5000)
[1,1]<stdout>: [1,1]<stdout>:libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5922f96000)
[1,1]<stdout>: [1,1]<stdout>:libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5922ba5000)
[1,1]<stdout>: [1,1]<stdout>:libopen-rte.so.40 => /usr/local/lib/libopen-rte.so.40 (0x00007f59228f0000)
[1,1]<stdout>: libopen-pal.so.40 => /usr/local/lib/libopen-pal.so.40 (0x00007f59225e1000)
[1,1]<stdout>: librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f59223d9000)
[1,1]<stdout>: [1,1]<stdout>:libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f592203b000)
[1,1]<stdout>: /lib64/ld-linux-x86-64.so.2 (0x00007f59234ca000)
[1,1]<stdout>: libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f5921e1e000)
[1,1]<stdout>: libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5921c1a000)
[1,1]<stdout>: libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f5921a0f000)
[1,1]<stdout>: [1,1]<stdout>:libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f592180c000)
[1,2]<stdout>: [1,2]<stdout>:not a dynamic executable
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[45618,1],2]
Exit code: 1
--------------------------------------------------------------------------