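I am working on an MPI application that uses a (distributed) graph communicator for communication between nodes. In some cases, the simplest approach is for a node to communicate with itself. While testing the code, I ran into a segmentation fault that I cannot explain. Here is a minimal example that reproduces the problem (with OpenMPI 1.8.1):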
#include <iostream>
#include <vector>
#include <mpi.h>

int main(void) {
    MPI_Init(NULL, NULL);

    // Two in-edges and two out-edges, all from/to rank 0.
    // With a single process this is a graph with two self-loops on rank 0.
    MPI_Comm mpi_neighbor_comm;
    int v[2] = {0, 0}, w[2] = {0, 0};
    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                   2, v, (int *)MPI_UNWEIGHTED,
                                   2, w, (int *)MPI_UNWEIGHTED,
                                   MPI_INFO_NULL, 0, &mpi_neighbor_comm);

    // Report the MPI standard version supported by the library.
    int major, minor;
    MPI_Get_version(&major, &minor);
    std::cerr << "Version " << major << '.' << minor << std::endl;

    // Query the number of incoming and outgoing edges of this rank.
    int indegree, outdegree, weighted;
    MPI_Dist_graph_neighbors_count(mpi_neighbor_comm, &indegree, &outdegree, &weighted);
    std::cerr << "Degrees: " << indegree << ", " << outdegree << std::endl;

    // Send one int (value 1) along each out-edge, receive one int per in-edge.
    std::vector<int> in(indegree, 0), out(outdegree, 1);
    MPI_Neighbor_alltoall(&out[0], 1, MPI_INT,
                          &in[0], 1, MPI_INT, mpi_neighbor_comm);
    std::cerr << in[0] << std::endl;

    MPI_Comm_free(&mpi_neighbor_comm);
    MPI_Finalize();
    return 0;
}
When run with 1 process, this gives the following output:
$ mpiexec -n 1 ./mpi_test
Version 3.0
Degrees: 2, 2
1
[LMC-038262:51444] *** Process received signal ***
[LMC-038262:51444] Signal: Segmentation fault: 11 (11)
[LMC-038262:51444] Signal code: Address not mapped (1)
[LMC-038262:51444] Failing at address: 0x0
[LMC-038262:51444] *** End of error message ***
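Since the value 1 is transferred into the in buffer, the communication itself seems to work, but the program then segfaults. What is wrong? If I comment out the MPI_Neighbor_alltoall call, the segfault goes away.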