I am trying to run a program compiled with the Intel compiler. The program contains both OpenMP and MPI code; the MPI code is a new addition to the program.
OpenMPI (compiled with Intel) errors:
[ida3c03:22018] 63 more processes have sent help message help-mpi-btl-openib.txt / reg mem limit low
[ida3c03:22018] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
done!
[ida3c04:03329] *** Process received signal ***
[ida3c04:03329] Signal: Segmentation fault (11)
[ida3c04:03329] Signal code: Address not mapped (1)
[ida3c04:03329] Failing at address: 0x10
Error 1
Could not find **ibv_destroy_qp in list of messages
Could not find **ibv_destroy_qp in list of messages
Could not find **ibv_destroy_qp in list of messages
Could not find **ibv_destroy_qp in list of messages
Could not find **ibv_destroy_qp in list of messages
Could not find **vc_gen2_qp_finalize in list of messages
Could not find **ibv_destroy_qp in list of messages
Could not find **vc_gen2_qp_finalize in list of messages
Could not find **vc_gen2_qp_finalize in list of messages
Could not find **vc_gen2_qp_finalize in list of messages
Could not find **vc_gen2_qp_finalize in list of messages
Could not find **vc_gen2_qp_finalize in list of messages
Fatal error in MPI_Finalize: Internal MPI error!, error stack:
MPI_Finalize(311).................: MPI_Finalize failed
MPI_Finalize(229).................:
MPID_Finalize(140)................:
MPIDI_CH3_Finalize(24)............:
MPID_nem_finalize(63).............:
MPID_nem_gen2_module_finalize(520):(unknown)(): Internal MPI error!
Error 2
recv desc error, 128, 0x9d0240
[mpiexec@ida3c03] control_cb (./pm/pmiserv/pmiserv_cb.c:717): assert (!closed) failed
[mpiexec@ida3c03] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@ida3c03] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@ida3c03] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion
The project is large, so it is hard to show you all of the MPI code, but I grepped the MPI calls out of one of the main files where I believe the error occurs, in case that is useful. I replaced the actual variable names with placeholders such as "vector_#". A minimal self-contained sketch of the same call pattern follows the list.
1172: MPI_Init(&argc,&argv);
1173: MPI_Comm_size(MPI_COMM_WORLD, &numranks);
1174: MPI_Comm_rank(MPI_COMM_WORLD,&rank);
1315: MPI_Barrier(MPI_COMM_WORLD);
1593: MPI_Allgather(&long_1,1,MPI_LONG,vector_1,1,MPI_LONG,MPI_COMM_WORLD);
1594: MPI_Allgather(&long_2, 1,MPI_LONG,vector_2, 1,MPI_LONG,MPI_COMM_WORLD);
1624: MPI_Barrier(MPI_COMM_WORLD);
1655: MPI_Bcast(buffer,long*[count],MPI_CHAR,count,MPI_COMM_WORLD);
1661: MPI_Allgatherv(dnas_chars_send, long_2, MPI_CHAR, char*,int*, long*, MPI_CHAR,MPI_COMM_WORLD);
1740: MPI_Barrier(MPI_COMM_WORLD);
2013: MPI_Barrier(MPI_COMM_WORLD);
2064: MPI_Allgather(&int,1,MPI_INT,int*,1,MPI_INT,MPI_COMM_WORLD);
2066: MPI_Allgather(&int,1,MPI_INT,int*,1,MPI_INT,MPI_COMM_WORLD);
2068: MPI_Allgather(&int,1,MPI_INT,int*,1,MPI_INT,MPI_COMM_WORLD);
2106: MPI_Allgatherv(int*, int,MPI_INT,int* ,int*, int*, MPI_INT,MPI_COMM_WORLD);
2107: MPI_Allgatherv(int*, int,MPI_INT,int* ,int*, int*, MPI_INT,MPI_COMM_WORLD);
2108: MPI_Allgatherv(int*,int,MPI_INT,int*,int*,int*, MPI_INT,MPI_COMM_WORLD);
2136: MPI_Allgatherv(int*,int,MPI_INT,int*,int*,int*, MPI_INT,MPI_COMM_WORLD);
2137: MPI_Allgatherv(int*,int,MPI_INT,int*,int*,int*, MPI_INT,MPI_COMM_WORLD);
2164: MPI_Barrier(MPI_COMM_WORLD);
2519: MPI_Finalize();
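To make the pattern above easier to follow, here is a small sketch that reconstructs the gather/allgatherv sequence with made-up buffer names and sizes. This is NOT the actual project code, just my assumption of what those redacted calls roughly look like:

// Minimal sketch of the call pattern above; buffer names and sizes are invented.
#include <mpi.h>
#include <vector>

int main(int argc, char **argv)
{
    int numranks, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numranks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Each rank contributes one long; every rank gathers all of them
    // (roughly lines 1593/1594 of the real file).
    long long_1 = rank, long_2 = 5;                    // hypothetical values
    std::vector<long> vector_1(numranks), vector_2(numranks);
    MPI_Allgather(&long_1, 1, MPI_LONG, vector_1.data(), 1, MPI_LONG, MPI_COMM_WORLD);
    MPI_Allgather(&long_2, 1, MPI_LONG, vector_2.data(), 1, MPI_LONG, MPI_COMM_WORLD);

    // Variable-length character exchange, roughly what line 1661 does.
    std::vector<char> dnas_chars_send(long_2, 'A');
    std::vector<int> recvcounts(numranks), displs(numranks);
    int total = 0;
    for (int i = 0; i < numranks; ++i) {
        recvcounts[i] = static_cast<int>(vector_2[i]); // counts/displs must be int
        displs[i] = total;
        total += recvcounts[i];
    }
    std::vector<char> dnas_chars_recv(total);
    MPI_Allgatherv(dnas_chars_send.data(), static_cast<int>(long_2), MPI_CHAR,
                   dnas_chars_recv.data(), recvcounts.data(), displs.data(), MPI_CHAR,
                   MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}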
But I don't understand why the code runs fine when built with gcc and then hits these errors with icpc. The code runs on a cluster, and I don't have access to the back-end nodes where the MPI processes are running.