MPI program runs correctly, but sometimes throws Segmentation fault 11 and sometimes Abort trap 6

Time: 2015-03-08 11:36:03

Tags: c matrix segmentation-fault mpi matrix-multiplication

Hi,

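I wrote an MPI program that partitions matrices into a grid and then scatters the grid blocks across the CPUs. It performs a matrix-matrix multiplication. The program runs and outputs the correct result, at least some of the time.

Sometimes I get an Abort trap 6 error almost right at the start (marked in the code), and sometimes I get a Segmentation fault 11 in the loop where I rearrange the matrices from row-major order into an order that lets me scatter the grid (also marked in the code). I have also seen a few Bus error 10 crashes. The faults mostly occur in the code I marked, but sometimes they occur elsewhere.

I am really desperate: the program sometimes works, and when it does throw an error, it is not even the same error or at the same point in the code, which I really cannot make sense of.

I also have the impression that the errors become more likely when I run the program many times in a row.

Do you see my mistake?

Here is the code (it is a lot, but I marked the faulty parts with long lines):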

int main(int argc, char **argv) {

    //Initializing communication....
    MPI_Init(&argc, &argv);

    int size = atoi(argv[1]);
    int delta = 10;
    int world_rank;
    int world_size;
    int root = 0;
    // MPI_Status mystatus;

    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Calculate sqrt of world size
    int root_of_worldsize = sqrt((double)world_size);
    if (world_rank == root) {
        printf("The square-root of the worldsize is %d\n", root_of_worldsize);
    }

    // Setup for initializing groups
    int row_rank_a, column_rank_b;
    int **rowranks = malloc(root_of_worldsize*sizeof(int*));
    int **columnranks = malloc(root_of_worldsize*sizeof(int*));

    for (int i = 0; i < root_of_worldsize; i++) {
        rowranks[i] = malloc(root_of_worldsize*sizeof(int));
        columnranks[i] = malloc(root_of_worldsize*sizeof(int));
        for (int j = 0; j < root_of_worldsize ; j++) {
            rowranks[i][j] = (i*root_of_worldsize + j);
            columnranks[i][j] = (j*root_of_worldsize + i);
        }
    }

    //printing rank array
    if (world_rank == root) {
        printf("Colum ranks: ");
        printf("[");
        for (int i = 0; i < root_of_worldsize; i++) {
            printf("[");
            for (int j = 0; j < root_of_worldsize; j++) {
                printf("%d, ", columnranks[i][j]);
            }
            printf("]");
        }
        printf("]\n");
    }
    if (world_rank == root) {
        printf("Row ranks: ");
        printf("[");
        for (int i = 0; i < root_of_worldsize; i++) {
            printf("[");
            for (int j = 0; j < root_of_worldsize; j++) {
                printf("%d, ", rowranks[i][j]);
            }
            printf("]");
        }
        printf("]\n");
    }

    MPI_Group world_group, rows_groupa, columns_groupb;
    MPI_Comm rowa_comm, columb_comm;


    //Get world group handle...
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    //check compatibility of size and number of processors
    assert(size % world_size == 0);

    // Create groups
    for (int i = 0; i < root_of_worldsize; i++) {
        if (i*root_of_worldsize <= world_rank && world_rank < (i+1)*root_of_worldsize) {
            //printf("Rank %d; I am getting assigned to the %d row group.\n", world_rank, i+1);
            MPI_Group_incl(world_group, root_of_worldsize, rowranks[i], &rows_groupa);
        }
        if (world_rank % root_of_worldsize == i) {
            //printf("Rank %d; I am getting assigned to the %d column group.\n", world_rank, i+1);
            MPI_Group_incl(world_group, root_of_worldsize, columnranks[i], &columns_groupb);
        }
        if (world_rank == root) {
            printf("\n");
        }
    }
    // Create new communicators
    MPI_Comm_create(MPI_COMM_WORLD, rows_groupa, &rowa_comm);
    MPI_Comm_create(MPI_COMM_WORLD, columns_groupb, &columb_comm);

    // Get respective group ranks
    MPI_Group_rank(rows_groupa, &row_rank_a);
    MPI_Group_rank(columns_groupb, &column_rank_b);

    printf("worldrank = %d; rowrank = %d; columnrank = %d\n", world_rank, row_rank_a, column_rank_b);

    double *matrixA;
    double *matrixB;

    int chunk_size = size / root_of_worldsize;
    if (world_rank == root) {
    printf("Chunk size: %d\n",chunk_size);
    printf("Root of worldsize: %d\n", root_of_worldsize);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    if (world_rank == root) {
        // Create two matrices
        printf("Creating matrices...\n");
        double *matrixA_i = malloc(size*size*sizeof(double));
        double *matrixB_i = malloc(size*size*sizeof(double));
        double **matrixA_2d = malloc(root_of_worldsize*sizeof(double*));
        for (int i = 0; i < size; i++) {
            matrixA_2d[i] = malloc(chunk_size*chunk_size*sizeof(double));
        }
        double **matrixB_2d = malloc(root_of_worldsize*sizeof(double*));
        for (int i = 0; i < size; i++) {
            matrixB_2d[i] = malloc(chunk_size*chunk_size*sizeof(double));
        }

        srand(1234);
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                matrixA_i[i*size + j] = rand() % delta + 1;
            }
        }

        srand(2345);
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                matrixB_i[i*size + j] = rand() % delta + 1;
            }
        }



-----------------Abort trap 6 happens around here, or also at the very end of the program------------------------------------------------------



        printf("Created matrices.\n");
        printf("Matrix B:\n");
        print_contiguous_matrix_array(matrixB_i, size);
        printf("Matrix A:\n");
        print_contiguous_matrix_array(matrixA_i, size);



-----------------The Segmentation fault 11 happens in this for loop------------------------------------------------------

        // Rearrange the matrix to a "major-row-grid"-matrix
        printf("Rearranging matrices for grid scattering\n");
        int k = 0;
        int j = 0;
        int l = 0;
        for (int i = 0; i < (size*size); i++) {
            if (i == 0) {
                //Insert:
                //printf("Counters: k->%d, l->%d, j->%d\n",k,l,j);
                matrixA_2d[k+root_of_worldsize*j][(i%chunk_size) + l*chunk_size] = matrixA_i[i];
                //printf("Writing on: [%d][%d]\n", k+root_of_worldsize*j, i - ((j*chunk_size*chunk_size*root_of_worldsize)+(l*chunk_size*root_of_worldsize)+(k*chunk_size)));
                matrixB_2d[k+root_of_worldsize*j][(i%chunk_size) + l*chunk_size] = matrixB_i[i];
            } else {
                if (i % chunk_size == 0) {
                    k++;
                    if (k > (root_of_worldsize-1)) {
                        k = 0;
                    }
                    //printf("Raised k, k->%d\n", k);
                }
                // Strip counter:
                if (i % (chunk_size*chunk_size*root_of_worldsize) == 0) {
                    j++;
                    //printf("Raised j, j->%d\n", j);
                }
                // line counter:
                if (i % (chunk_size*root_of_worldsize) == 0) {
                    l++;
                    if (l > (chunk_size-1)) {
                        l = 0;
                    }
                    //printf("Raised l, l->%d\n", l);
                }
                //Insert:
                printf("Counters: k->%d, l->%d, j->%d;  i->%d\n",k,l,j,i);
                matrixA_2d[k+root_of_worldsize*j][(i%chunk_size) + l*chunk_size] = matrixA_i[i];
                printf("Writing on: [%d][%d]\n", k+root_of_worldsize*j, (i%chunk_size) + l*chunk_size);
                matrixB_2d[k+root_of_worldsize*j][(i%chunk_size) + l*chunk_size] = matrixB_i[i];
            }
        }
        free(matrixA_i);
        free(matrixB_i);
        // 2d to 1d array
        //printf("2d A: ");
        //print_matrix(matrixA_2d, size, size);
        //printf("2d B: ");
        //print_matrix(matrixB_2d, size, size);
        //Two to one dimensional
        printf("converting from to to one dimensional\n");
        int counter = 0;
        matrixB = malloc(size*size*sizeof(double));
        matrixA = malloc(size*size*sizeof(double));
        for (int i = 0; i < world_size; i++) {
            for (int j = 0; j < chunk_size; j++) {
                for (int k = 0; k < chunk_size; k++) {
                    matrixA[counter] = matrixA_2d[i][j*chunk_size + k];
                    matrixB[counter] = matrixB_2d[i][j*chunk_size + k];
                    counter++;
                }
            }
        }

        //free 2d
        for (int q = 0; q < root_of_worldsize; q++) {
            free(matrixA_2d[q]);
            free(matrixB_2d[q]);
        }
        free(matrixB_2d);
        free(matrixA_2d);
        //printf("Rearranged B ");
        //print_contiguous_matrix_array(matrixB, size);
        //printf("Rearranged A ");
        //print_contiguous_matrix_array(matrixA, size);
    }

    MPI_Barrier(MPI_COMM_WORLD);

    //Scatter....
    double *matrixA_chunk = malloc(chunk_size*chunk_size*sizeof(double));
    double *matrixB_chunk = malloc(chunk_size*chunk_size*sizeof(double));

    double *matrixA_tmp_chunk = malloc(chunk_size*chunk_size*sizeof(double));
    double *matrixB_tmp_chunk = malloc(chunk_size*chunk_size*sizeof(double));

    double *result_chunk = calloc(chunk_size*chunk_size, sizeof(double));

    MPI_Scatter(matrixA, chunk_size*chunk_size, MPI_DOUBLE, matrixA_chunk, chunk_size*chunk_size, MPI_DOUBLE, root, MPI_COMM_WORLD);
    MPI_Scatter(matrixB, chunk_size*chunk_size, MPI_DOUBLE, matrixB_chunk, chunk_size*chunk_size, MPI_DOUBLE, root, MPI_COMM_WORLD);

    for (int z = 0; z < root_of_worldsize; z++) {
        if (row_rank_a == z) {
            matrixA_tmp_chunk = matrixA_chunk;
        }
        MPI_Bcast(matrixA_tmp_chunk, chunk_size*chunk_size, MPI_DOUBLE, z, rowa_comm);
        /*if (world_rank == 0) {
            printf("temporary A: ");
            print_contiguous_matrix_array(matrixA_tmp_chunk, chunk_size);
        }*/
        if (column_rank_b == z) {
            matrixB_tmp_chunk = matrixB_chunk;
            MPI_Bcast(matrixB_tmp_chunk, chunk_size*chunk_size, MPI_DOUBLE, z, columb_comm);
        }
        MPI_Bcast(matrixB_tmp_chunk, chunk_size*chunk_size, MPI_DOUBLE, z, columb_comm);
        printf("Iteration: %d; Rank %d; row_rank %d; temporary A matrix: %f, %f, %f, %f\n", z, world_rank, row_rank_a, matrixA_tmp_chunk[0], matrixA_tmp_chunk[1], matrixA_tmp_chunk[2], matrixA_tmp_chunk[3]);
        /*if (world_rank == 0) {
            printf("temporary B: ");
            print_contiguous_matrix_array(matrixB_tmp_chunk, chunk_size);
        }*/

        //calculate
        for (int i = 0; i < chunk_size; i++) {
            for (int j = 0; j < chunk_size; j++) {
                for (int k = 0; k < chunk_size; k++) {
                    result_chunk[j*chunk_size + i] += (matrixA_tmp_chunk[j*chunk_size + k] * matrixB_tmp_chunk[k*chunk_size + i]);
                }
            }
        }
        MPI_Barrier(MPI_COMM_WORLD);
    }

    double *final_result;
    double *contiguous_final_result = NULL;
    if (world_rank == root) {
        final_result = malloc(size*size*sizeof(double));
        contiguous_final_result = malloc(size*size*sizeof(double));
    }

    MPI_Gather(result_chunk, chunk_size*chunk_size, MPI_DOUBLE, final_result, chunk_size*chunk_size, MPI_DOUBLE, root, MPI_COMM_WORLD);
    if (world_rank == root) {
        printf("final result major grid: ");
        print_contiguous_matrix_array(final_result, size);
    }
    // Rearrange gridded matrix to row major matrix
    if (world_rank == root) {
        int l2 = 0;
        int k2 = 0;
        int s2 = 0;
        for (int i = 0; i < (size*size); i++) {
            if (i == 0) {
                contiguous_final_result[(i%chunk_size) + l2*size + s2*size*chunk_size + k2*chunk_size] = final_result[i];
                printf("Access values: i->%d; l->%d; s->%d; k->%d; total->%d\n", i, l2, s2, k2, (i%chunk_size) + l2*size + s2*size*chunk_size + k2*chunk_size);

            }
            else {
                if (i % chunk_size == 0) {
                    l2++;
                    if (l2 > (chunk_size-1)) {
                        l2 = 0;
                    }
                }
                if (i % (chunk_size*chunk_size*root_of_worldsize) == 0) {
                    s2++;
                }
                if (i % (chunk_size*chunk_size) == 0) {
                    k2++;
                    if (k2 > (root_of_worldsize-1)) {
                        k2 = 0;
                    }
                }
                contiguous_final_result[(i%chunk_size) + l2*size + s2*size*chunk_size + k2*chunk_size] = final_result[i];
                printf("Access values: i->%d; l->%d; s->%d; k->%d; total->%d\n", i, l2, s2, k2, (i%chunk_size) + l2*size + s2*size*chunk_size + k2*chunk_size);
            }
        }
    }

    if (world_rank == root) {
        printf("Row major result: ");
        print_contiguous_matrix_array(contiguous_final_result, size);
    }

    //free!!!!!!
    if (world_rank == root) {
        free(matrixA);
        free(matrixB);
        free(final_result);
        free(contiguous_final_result);
    }

    free(matrixA_chunk);
    free(matrixB_chunk);
    free(result_chunk);

    MPI_Finalize();

    return 0;
}

Many thanks in advance!

2 Answers:

Answer 0: (score: 1)

The problem is most likely these two allocations:

int **rowranks = malloc(root_of_worldsize*sizeof(int));
int **columnranks = malloc(root_of_worldsize*sizeof(int));

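Here you declare variables that are basically arrays of pointers, but you allocate space sized for int elements rather than for pointers. If the size of int is smaller than the size of int* (as it typically is on modern 64-bit systems), storing the row pointers writes past the end of the allocation, which is undefined behavior.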
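One way to avoid this class of mistake is to size each allocation from the pointer being assigned, using `sizeof *ptr` instead of a hand-written type. The following is only a minimal sketch: `alloc_rank_table` and `free_rank_table` are hypothetical helper names that do not appear in the question, and `grid_dim` stands in for `root_of_worldsize`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the original program): builds a
 * grid_dim x grid_dim table of ranks. With by_column == 0, row i holds the
 * ranks of grid row i; otherwise row i holds the ranks of grid column i. */
int **alloc_rank_table(int grid_dim, int by_column) {
    assert(grid_dim > 0);
    /* sizeof *table is sizeof(int *): one slot per row pointer. */
    int **table = malloc((size_t)grid_dim * sizeof *table);
    if (table == NULL) return NULL;
    for (int i = 0; i < grid_dim; i++) {
        /* sizeof *table[i] is sizeof(int): one slot per rank. */
        table[i] = malloc((size_t)grid_dim * sizeof *table[i]);
        if (table[i] == NULL) {
            /* Release the rows allocated so far before giving up. */
            while (i-- > 0) free(table[i]);
            free(table);
            return NULL;
        }
        for (int j = 0; j < grid_dim; j++) {
            table[i][j] = by_column ? j * grid_dim + i : i * grid_dim + j;
        }
    }
    return table;
}

/* Frees a table created by alloc_rank_table. */
void free_rank_table(int **table, int grid_dim) {
    if (table == NULL) return;
    for (int i = 0; i < grid_dim; i++) free(table[i]);
    free(table);
}
```

Under these assumptions, `alloc_rank_table(root_of_worldsize, 0)` would play the role of `rowranks` and `alloc_rank_table(root_of_worldsize, 1)` that of `columnranks`. Because the element size is derived from the pointer itself, changing the element type later cannot leave a stale `sizeof` behind.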

Answer 1: (score: 0)

I found the problem! It is this part of the code:

    double *matrixA_i = malloc(size*size*sizeof(double));
    double *matrixB_i = malloc(size*size*sizeof(double));
    double **matrixA_2d = malloc(root_of_worldsize*sizeof(double*));
    for (int i = 0; i < size; i++) {
        matrixA_2d[i] = malloc(chunk_size*chunk_size*sizeof(double));
    }
    double **matrixB_2d = malloc(root_of_worldsize*sizeof(double*));
    for (int i = 0; i < size; i++) {
        matrixB_2d[i] = malloc(chunk_size*chunk_size*sizeof(double));
    }

I did not allocate the right size for the 2D arrays. Many thanks to @Joachim Pileborg; your answer put me on the right track and showed me what to look for!
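The corrected code is not shown in this answer. In the quoted snippet, the pointer arrays are allocated with `root_of_worldsize` entries, while the loops that fill them run up to `size`, and the rearranging loop in the question indexes blocks up to `k + root_of_worldsize*j`, so it appears to need one buffer per process. The sketch below is one possible fix under that assumption, not the author's actual change: `alloc_blocks`, `free_blocks`, and `split_into_blocks` are hypothetical names, and `grid_dim` stands in for `root_of_worldsize`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates num_blocks buffers of block_elems doubles.
 * The pointer array and the loop that fills it use the same count. */
double **alloc_blocks(int num_blocks, int block_elems) {
    double **blocks = malloc((size_t)num_blocks * sizeof *blocks);
    if (blocks == NULL) return NULL;
    for (int b = 0; b < num_blocks; b++) {
        blocks[b] = malloc((size_t)block_elems * sizeof *blocks[b]);
        if (blocks[b] == NULL) {
            /* Release the buffers allocated so far before giving up. */
            while (b-- > 0) free(blocks[b]);
            free(blocks);
            return NULL;
        }
    }
    return blocks;
}

/* Frees every buffer, again using the same count as the allocation. */
void free_blocks(double **blocks, int num_blocks) {
    if (blocks == NULL) return;
    for (int b = 0; b < num_blocks; b++) free(blocks[b]);
    free(blocks);
}

/* Copies a row-major size x size matrix into grid_dim * grid_dim blocks,
 * each stored row-major; block (br, bc) lands at index br * grid_dim + bc.
 * Assumes size is divisible by grid_dim. */
void split_into_blocks(const double *src, int size, int grid_dim,
                       double **blocks) {
    assert(grid_dim > 0 && size % grid_dim == 0);
    int chunk = size / grid_dim;
    for (int r = 0; r < size; r++) {
        for (int c = 0; c < size; c++) {
            int block = (r / chunk) * grid_dim + (c / chunk);
            int offset = (r % chunk) * chunk + (c % chunk);
            blocks[block][offset] = src[r * size + c];
        }
    }
}
```

With these helpers, the root process could call `alloc_blocks(world_size, chunk_size*chunk_size)` and then `split_into_blocks(matrixA_i, size, root_of_worldsize, blocks)`. Computing the block index and offset directly from the row and column replaces the `k`, `j`, `l` counters, which makes the valid index range easier to check.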