Sporadic MPI errors

Time: 2014-11-07 03:40:14

Tags: c mpi

I am running MPI code that looks like this (see the explanation below):

#include <mpi.h>
#include <sys/time.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

...

int main(int argc, char * argv[])
{
    int i,j,local_N,num_procs = 0;
    int N = 16; // width and height of matrix

    int rank;
    float ** A;
    float ** local_A;

    // MPI stuff
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs); // "size" is number of processes
    MPI_Status status;

    // allocate and initialize A
    if (rank == 0) {
        A = allocate_matrix(N, N);
        initialize_matrix(A, N, N);
    }

    // allocate local matrices
    local_N = N / num_procs;
    local_A = allocate_matrix(local_N, N);

    //send/rcv pieces of matrix
    for (i = 1; i < num_procs; i++) {
        if (rank == 0) {
            MPI_Send(A[i * local_N], N, MPI_FLOAT, i, 123, MPI_COMM_WORLD);
        }

        if (rank == i) {
            MPI_Recv(local_A[0], N, MPI_FLOAT, 0, 123, MPI_COMM_WORLD, &status);
        }
    }

    if (rank == 0) free_matrix(A, N, N);
    free_matrix(local_A, local_N, N);
    MPI_Finalize();
    return 0;
}

Here are the helper functions:

float ** allocate_matrix(int rows, int cols) {
    int i = 0;
    float ** matrix = (float **) malloc(rows * sizeof(float *));

    for (i = 0; i < cols; i++) {
        matrix[i] = (float *) malloc(cols * sizeof(float));
    }

    return matrix;
}

void initialize_matrix(float ** matrix, int rows, int cols) {
    int i, j = 0;

    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            matrix[i][j] = ((float)(rand()%10000))/1000.0;
        }
    }
}

void free_matrix(float ** matrix, int rows, int cols) {
    int i, j = 0;
    for (j = 0; j < rows; j++) free(matrix[j]);
}

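Basically, I split a 16-by-16 matrix into (number of processes) blocks and send one row from each block to the corresponding process. Printing the rows before and after the send and receive shows that they are being transferred correctly.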

I run this code over and over with mpirun -n 4 ./<name of executable>, and something different happens each time. It is one of:

  • Normal exit

  • mpirun noticed that process rank ... exited on signal 6

  • mpirun noticed that process rank ... exited on signal 10

  • mpirun noticed that process rank ... exited on signal 11

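I saw signal 6 and thought "double free()!", but I tried removing the free calls and signal 6 still shows up, along with the other errors. Any idea why these errors occur, and why a different one appears each time? The debugger trace makes the problem look like it is in MPI_Finalize().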

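1 answer:

Answer 0 (score: 0)

This is not an MPI problem. The error comes from the allocation function: you are iterating over the number of columns instead of the number of rows.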

for (i = 0; i < cols; i++) {

should be

for (i = 0; i < rows; i++) {

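to match the malloc above it. With 4 processes, local_N is 16 / 4 = 4, so allocate_matrix(local_N, N) mallocs space for 4 row pointers, but the loop then runs 16 times and writes 12 pointers past the end of that array. Writing past a heap allocation is undefined behavior and can corrupt the allocator's bookkeeping, which is why the failure differs from run to run and may only surface later, for example inside MPI_Finalize().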
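
For reference, here is a minimal sketch of the helpers with the loop bound corrected. It is not the asker's exact code: the allocation-failure handling, the assert, and the two-argument free_matrix signature are assumptions added for illustration. It also frees the array of row pointers, which the free_matrix in the question never does.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a rows x cols matrix as an array of row pointers.
   Returns NULL if any allocation fails. */
float **allocate_matrix(int rows, int cols) {
    assert(rows > 0 && cols > 0);

    float **matrix = malloc((size_t)rows * sizeof *matrix);
    if (matrix == NULL) return NULL;

    /* Loop over rows: exactly `rows` pointer slots were allocated above. */
    for (int i = 0; i < rows; i++) {
        matrix[i] = malloc((size_t)cols * sizeof *matrix[i]);
        if (matrix[i] == NULL) {
            /* Roll back the rows allocated so far. */
            while (i-- > 0) free(matrix[i]);
            free(matrix);
            return NULL;
        }
    }
    return matrix;
}

/* Hypothetical two-argument variant: free every row,
   then the array of row pointers itself. */
void free_matrix(float **matrix, int rows) {
    if (matrix == NULL) return;
    for (int i = 0; i < rows; i++) free(matrix[i]);
    free(matrix);
}
```

With these helpers, the call from the question becomes local_A = allocate_matrix(local_N, N); followed by free_matrix(local_A, local_N); and every index from local_A[0][0] to local_A[local_N - 1][N - 1] is valid.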