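I noticed that in my MPI program MPI_Finalize() takes a long time to complete, about 10 to 20 seconds, while the program itself finishes in a few milliseconds (it produces the correct results almost immediately).
The Open MPI manual (http://www.open-mpi.org/doc/v1.6/man3/MPI_Finalize.3.php) states that MPI_Finalize() should only check for pending communications. From that I infer that, if some communication were left unmatched, it should either fail or complete quickly.
What are the possible explanations for MPI_Finalize() taking so long to complete?
Update: the problem seems to appear when the same program is executed several times in a row: the first run's MPI_Finalize() is usually fast, and later runs get slower. It is evident even with a very simple program like this one: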
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);               /* starts MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* get current process id */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* get number of processes */
        printf("Hello world from process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }
Moreover, the problem does not seem to depend on the number of processes. I am seeing it on a dual-socket Intel Xeon E5520 @ 2.27GHz.
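One way to check whether the delay is really spent inside MPI_Finalize() itself, rather than somewhere after the ranks have returned from it, is to time the call from inside the program. The following is only a minimal sketch, assuming a POSIX system that provides clock_gettime(); `elapsed_ms` and `time_call_ms` are hypothetical helper names introduced here for illustration and are not part of MPI or Open MPI.

```c
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime() under strict ISO C modes */
#include <time.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC readings. */
static double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1e3
         + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/*
 * Run fn once and return its wall-clock duration in milliseconds.
 * The return value of fn is stored in *status when status is non-null.
 * The clock is read with clock_gettime() rather than MPI_Wtime(), so the
 * second reading does not rely on MPI still being initialized.
 */
static double time_call_ms(int (*fn)(void), int *status)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    int rc = fn();
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (status)
        *status = rc;
    return elapsed_ms(start, end);
}
```

Because MPI_Finalize has the signature `int MPI_Finalize(void)`, it can be passed directly: in the test program above, the bare `MPI_Finalize();` call would become `double ms = time_call_ms(MPI_Finalize, NULL);`, followed by printing `ms` together with `rank`. On older glibc versions, linking with `-lrt` may be required for clock_gettime().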
Update 2
[andromeda.di.unipi.it:03918] procdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/0/0
[andromeda.di.unipi.it:03918] jobdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/0
[andromeda.di.unipi.it:03918] top: openmpi-sessions-lottarin@andromeda.di.unipi.it_0
[andromeda.di.unipi.it:03918] tmp: /tmp
[andromeda.di.unipi.it:03918] mpirun: reset PATH: /tmp/OPENMPI/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/lottarin/bin
[andromeda.di.unipi.it:03918] mpirun: reset LD_LIBRARY_PATH: /tmp/OPENMPI/lib
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received add_local_procs
MPIR_being_debugged = 0
MPIR_debug_state = 1
MPIR_partial_attach_ok = 1
MPIR_i_am_starter = 0
MPIR_forward_output = 0
MPIR_proctable_size = 4
MPIR_proctable:
(i, host, exe, pid) = (0, andromeda.di.unipi.it, /home/lottarin/PROVE_MPI/./a.out, 3919)
(i, host, exe, pid) = (1, andromeda.di.unipi.it, /home/lottarin/PROVE_MPI/./a.out, 3920)
(i, host, exe, pid) = (2, andromeda.di.unipi.it, /home/lottarin/PROVE_MPI/./a.out, 3921)
(i, host, exe, pid) = (3, andromeda.di.unipi.it, /home/lottarin/PROVE_MPI/./a.out, 3922)
MPIR_executable_path: NULL
MPIR_server_arguments: NULL
[andromeda.di.unipi.it:03920] procdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1/1
[andromeda.di.unipi.it:03920] jobdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1
[andromeda.di.unipi.it:03920] top: openmpi-sessions-lottarin@andromeda.di.unipi.it_0
[andromeda.di.unipi.it:03920] tmp: /tmp
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync+nidmap from local proc [[18136,1],1]
[andromeda.di.unipi.it:03919] procdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1/0
[andromeda.di.unipi.it:03919] jobdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1
[andromeda.di.unipi.it:03919] top: openmpi-sessions-lottarin@andromeda.di.unipi.it_0
[andromeda.di.unipi.it:03919] tmp: /tmp
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync+nidmap from local proc [[18136,1],0]
[andromeda.di.unipi.it:03920] [[18136,1],1] node[0].name andromeda daemon 0
[andromeda.di.unipi.it:03919] [[18136,1],0] node[0].name andromeda daemon 0
[andromeda.di.unipi.it:03922] procdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1/3
[andromeda.di.unipi.it:03922] jobdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1
[andromeda.di.unipi.it:03922] top: openmpi-sessions-lottarin@andromeda.di.unipi.it_0
[andromeda.di.unipi.it:03922] tmp: /tmp
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync+nidmap from local proc [[18136,1],3]
[andromeda.di.unipi.it:03921] procdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1/2
[andromeda.di.unipi.it:03921] jobdir: /tmp/openmpi-sessions-lottarin@andromeda.di.unipi.it_0/18136/1
[andromeda.di.unipi.it:03921] top: openmpi-sessions-lottarin@andromeda.di.unipi.it_0
[andromeda.di.unipi.it:03921] tmp: /tmp
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync+nidmap from local proc [[18136,1],2]
[andromeda.di.unipi.it:03922] [[18136,1],3] node[0].name andromeda daemon 0
[andromeda.di.unipi.it:03921] [[18136,1],2] node[0].name andromeda daemon 0
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received message_local_procs
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received message_local_procs
Hello world from process 1 of 4
Hello world from process 3 of 4
Hello world from process 0 of 4
Hello world from process 2 of 4
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received message_local_procs
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync from local proc [[18136,1],1]
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync from local proc [[18136,1],3]
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync from local proc [[18136,1],0]
[andromeda.di.unipi.it:03920] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03922] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_recv: received sync from local proc [[18136,1],2]
[andromeda.di.unipi.it:03919] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03921] sess_dir_finalize: proc session dir not empty - leaving
**LAGS HERE after having received sync from all processes**
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received waitpid_fired cmd
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received iof_complete cmd
[andromeda.di.unipi.it:03918] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received iof_complete cmd
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received waitpid_fired cmd
[andromeda.di.unipi.it:03918] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received waitpid_fired cmd
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received iof_complete cmd
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received iof_complete cmd
[andromeda.di.unipi.it:03918] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03918] [[18136,0],0] orted_cmd: received waitpid_fired cmd
[andromeda.di.unipi.it:03918] sess_dir_finalize: proc session dir not empty - leaving
[andromeda.di.unipi.it:03918] sess_dir_finalize: job session dir not empty - leaving
[andromeda.di.unipi.it:03918] [[18136,0],0] Releasing job data for [18136,1]
[andromeda.di.unipi.it:03918] sess_dir_finalize: job session dir not empty - leaving
[andromeda.di.unipi.it:03918] [[18136,0],0] Releasing job data for [18136,0]
[andromeda.di.unipi.it:03918] sess_dir_finalize: proc session dir not empty - leaving
orterun: exiting with status 0