MPI segmentation fault in MPI_Comm_rank()

Time: 2014-03-04 06:08:29

Tags: segmentation-fault mpi openmpi

I am a beginner with MPI, and this code seems to produce a segmentation fault.

int luDecomposeP(double *LU, int n)
{
    int i, j, k;
    int sendcount, recvcount, remaining, rank, numProcs, status;
    double *row, *rowFinal, *start, factor;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    row = (double *)malloc(n*sizeof(double));
    rowFinal = (double *)malloc(n*n*sizeof(double));

    for(i=0; i<n-1; i++)
    {
        if(rank == 0)
        {
            status = pivot(LU,i,n);

            for(j=0; j<n; j++)
                row[j] = LU[n*i+j];
        }

        MPI_Bcast(&status, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if(status == -1)
            return -1;

        MPI_Bcast(row, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        sendcount = (n-i-1)/numProcs;
        recvcount = (n-i-1)/numProcs;
        remaining = (n-i-1)%numProcs;

        if(rank == 0)
            start = LU + n*(i+1);
        else
            start = NULL;

        MPI_Scatter(start, sendcount*n, MPI_DOUBLE, rowFinal, recvcount*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        for(j=0; j<recvcount; j++)
        {
            factor = rowFinal[n*j+i]/row[i];

            for(k=i+1; k<n; k++)
                rowFinal[n*j+k] -= row[k]*factor;

            rowFinal[n*j+i] = factor;
        }

        MPI_Gather(rowFinal, recvcount*n, MPI_DOUBLE, start, sendcount*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        if(rank == 0)
        {
            int ctr = 0;

            while(ctr < remaining)
            {
                int index = sendcount*numProcs + ctr + i + 1;

                factor = LU[n*index+i]/row[i];

                for(k=i+1; k<n; k++)
                    LU[n*index+k] -= row[k]*factor;

                LU[n*index+i] = factor;

                ctr++;
            }
        }
    }

    free(row);
    free(rowFinal);

    return 0;
}

This code causes a segmentation fault. I have read many answers and tried to fix it, but with no success. I read about the problem of dereferencing a NULL pointer and addressed it using the pointer named start, but the segmentation fault still appears.

Error:

[sheshnag:32334] *** Process received signal ***
[sheshnag:32334] Signal: Segmentation fault (11)
[sheshnag:32334] Signal code: Address not mapped (1)
[sheshnag:32334] Failing at address: 0x44000098
[sheshnag:32334] [0] /lib/libpthread.so.0(+0xf8f0)[0x2b082eafe8f0]
[sheshnag:32334] [1] /usr/lib/openmpi/lib/libmpi.so.0(MPI_Comm_rank+0x5e)[0x2b082d5ff6ee]
[sheshnag:32334] [2] ./libluDecompose.so(luDecomposeP+0x2f)[0x2b082d17ea2f]
[sheshnag:32334] [3] _tmp/bench.mpi.exe(main+0x2e7)[0x40b61d]
[sheshnag:32334] [4] /lib/libc.so.6(__libc_start_main+0xfd)[0x2b082ed2ac4d]
[sheshnag:32334] [5] _tmp/bench.mpi.exe()[0x40ac49]

1 Answer:

Answer 0 (score: 1):

From the stack trace you report, the segmentation fault seems to occur inside the call to MPI_Comm_rank().

I see two possible problems:

  • MPI_Init() is missing. Usually MPI reports explicitly that it has not been called, but perhaps your MPI implementation just crashes instead? MPI_Init() must be called before any other MPI call (and MPI_Finalize() must be called before exiting); see the main() sketch at the end of this answer.

  • A broken MPI installation. Does a simple MPI "hello world" program work correctly? (One is sketched just below.)
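
A minimal "hello world" along these lines can serve as that test; it is only a sanity check of the installation, and the file and program names here are placeholders:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* must precede all other MPI calls */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                        /* must precede program exit */
    return 0;
}

Build and run it with, e.g., mpicc hello.c -o hello and mpirun -np 4 ./hello; each rank should print one line.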

Oh yes... and a third option:

  • The call is made on a corrupted stack (damaged by instructions executed before luDecomposeP() is called): MPI_Comm_rank() is the first operation that writes to a stack variable.
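
To make the first point concrete, here is a minimal sketch of what the calling side needs to look like, assuming a hypothetical driver program (the matrix size n and the way LU is filled are placeholders, not taken from your actual bench.mpi.exe):

#include <mpi.h>
#include <stdlib.h>

int luDecomposeP(double *LU, int n);   /* from libluDecompose.so */

int main(int argc, char **argv)
{
    int n = 8;                         /* placeholder matrix size */
    double *LU;

    MPI_Init(&argc, &argv);            /* before any other MPI call */

    LU = (double *)malloc(n*n*sizeof(double));
    /* ... fill LU with the matrix (on rank 0) ... */

    luDecomposeP(LU, n);               /* safe: MPI is initialized */

    free(LU);
    MPI_Finalize();                    /* before exiting */
    return 0;
}

If your main() already looks like this and the hello-world test passes, then the corrupted-stack option becomes the most likely suspect.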