我使用MPI和C进行编程,我使用根等级从文件中读取数据,然后将其分配给剩余的等级。我的MPI_Scatter工作正常,我打印出值以确保它们正确(并且它们是正确的)。我的问题是,在分配结构之后,当我尝试从根级别以外的其他级别访问它时,我会出错。
pr_graph * graph = malloc(sizeof(*graph));
....
MPI_Scatter(verticesCountArray, 1, MPI_INT, &(graph->nvtxs), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
MPI_Scatter(edgesCountArray, 1, MPI_INT, &(graph->nedges), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
for(int rank = 0; rank<numProcesses; rank++){
if (rank == myrank){
fprintf(stderr, "%d %d \n",graph->nvtxs, graph->nedges );
graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
// graph->xadj[graph->nvtxs] = graph->nedges;
}
MPI_Barrier(MPI_COMM_WORLD);
}
我的输出是:
2 4
2 4
2 4
哪个是对的。但是当我取消注释评论行时,我得到:
2 4
2 4
[phi01:07170] *** Process received signal ***
[phi01:07170] Signal: Segmentation fault (11)
[phi01:07170] Signal code: (128)
[phi01:07170] Failing at address: (nil)
[phi01:07170] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f5740503390]
[phi01:07170] [ 1] ./pagerank[0x401188]
[phi01:07170] [ 2] ./pagerank[0x400c73]
[phi01:07170] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f5740149830]
[phi01:07170] [ 4] ./pagerank[0x400ce9]
[phi01:07170] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 7170 on node phi01 exited on signal 11 (Segmentation fault).
这意味着只有0级才能访问它分配的结构。任何人都可以指出我为什么?谢谢!
修改
插入两个recvbuffers的任何值都不会出现段错误并打印出正确的值。似乎错误源于使用MPI_Scatter()。
graph->nvtxs = 2;
graph->nedges = 4;
for(int rank = 0; rank<numProcesses; rank++){
if (rank == myrank){
fprintf(stderr, "%d %d \n",graph->nvtxs, graph->nedges );
graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
graph->xadj[graph->nvtxs] = graph->nedges;
}
MPI_Barrier(MPI_COMM_WORLD);
}
答案 0 :(得分:0)
我找到了问题的解决方案。我先发布它,然后尝试理解它的工作原理。
pr_int * nvtxs = malloc(sizeof(pr_int));
pr_int * nedges = malloc(sizeof(pr_int));
MPI_Scatter(verticesCountArray, 1, MPI_INT, &(nvtxs), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
MPI_Scatter(edgesCountArray, 1, MPI_INT, &(nedges), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
graph->nvtxs = nvtxs;
graph->nedges = nedges;
for(int rank = 0; rank<numProcesses; rank++){
if (rank == myrank){
fprintf(stderr, "%d %d \n",graph->nvtxs, graph->nedges );
graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
graph->xadj[graph->nvtxs] = graph->nedges;
}
MPI_Barrier(MPI_COMM_WORLD);
}
我认为我没有使用实际缓冲区(指针)来接收,只是常规变量。在调用malloc期间,它们可能已被转换为指针(地址值),这就是为什么结构的大小可能已经疯狂。我仍然不确定为什么我能够打印这些值,或者甚至不知道排名0是如何工作的。任何想法将不胜感激!谢谢!