Segmentation fault after MPI_Scatter() followed by malloc

Asked: 2017-04-04 17:48:36

Tags: c malloc mpi

I am programming with MPI in C. I use the root rank to read data from a file and then distribute it to the remaining ranks. My MPI_Scatter works fine, and I print out the values to make sure they are correct (and they are). My problem is that after allocating the struct's arrays, I get an error when I try to access them from any rank other than the root.

    pr_graph * graph = malloc(sizeof(*graph));
    ....

    MPI_Scatter(verticesCountArray, 1, MPI_INT, &(graph->nvtxs), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
    MPI_Scatter(edgesCountArray, 1, MPI_INT, &(graph->nedges), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);

    for(int rank = 0; rank<numProcesses; rank++){
      if (rank == myrank){
        fprintf(stderr, "%d %d \n",graph->nvtxs, graph->nedges );
        graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
        graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
        // graph->xadj[graph->nvtxs] = graph->nedges;

      }
      MPI_Barrier(MPI_COMM_WORLD);
    }

My output is:

    2 4 
    2 4 
    2 4 

which is correct. But when I uncomment the commented-out line, I get:

    2 4 
    2 4 
    [phi01:07170] *** Process received signal ***
    [phi01:07170] Signal: Segmentation fault (11)
    [phi01:07170] Signal code:  (128)
    [phi01:07170] Failing at address: (nil)
    [phi01:07170] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f5740503390]
    [phi01:07170] [ 1] ./pagerank[0x401188]
    [phi01:07170] [ 2] ./pagerank[0x400c73]
    [phi01:07170] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f5740149830]
    [phi01:07170] [ 4] ./pagerank[0x400ce9]
    [phi01:07170] *** End of error message ***
    --------------------------------------------------------------------------
    mpirun noticed that process rank 1 with PID 7170 on node phi01 exited on signal 11 (Segmentation fault).

This means that only rank 0 can access the struct it allocated. Can anyone point out why? Thanks!

Edit:

Hard-coding any values into the two receive buffers produces no segfault and prints the correct values. The error seems to stem from the use of MPI_Scatter().

    graph->nvtxs = 2;
    graph->nedges = 4;
    for(int rank = 0; rank<numProcesses; rank++){
      if (rank == myrank){
        fprintf(stderr, "%d %d \n",graph->nvtxs, graph->nedges );
        graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
        graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
        graph->xadj[graph->nvtxs] = graph->nedges;

      }
      MPI_Barrier(MPI_COMM_WORLD);
    }

1 Answer:

Answer 0 (score: 0)

I found a solution to my problem. I will post it first, then try to understand how it works.

    pr_int * nvtxs = malloc(sizeof(pr_int));
    pr_int * nedges = malloc(sizeof(pr_int));

    MPI_Scatter(verticesCountArray, 1, MPI_INT, &(nvtxs), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
    MPI_Scatter(edgesCountArray, 1, MPI_INT, &(nedges), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);

    graph->nvtxs = nvtxs;
    graph->nedges = nedges;
    for(int rank = 0; rank<numProcesses; rank++){
      if (rank == myrank){
        fprintf(stderr, "%d %d \n",graph->nvtxs, graph->nedges );
        graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
        graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
        graph->xadj[graph->nvtxs] = graph->nedges;

      }
      MPI_Barrier(MPI_COMM_WORLD);
    }

I think I was not receiving into actual buffers (pointers) before, just regular variables. They may have been turned into pointers (address values) somewhere around the call to malloc, which could be why the struct's sizes went crazy. I am still not sure why I was able to print the values, or how rank 0 worked at all. Any ideas would be appreciated! Thanks!