Question

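I am writing a program that multiplies two matrices A and B stored in text files. Their sizes may vary, so my program has to detect the sizes of A and B, determine whether they can be multiplied, and so on.

The real problem arises when I pass the data from the master process to the slave processes. In my program I pass rows from the master to the slaves; the number of rows each slave gets depends on the number of rows in the matrix and the number of processes.

Matrix A is stored by rows, but matrix B is stored by columns.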

matrixA [0] ----------------

matrixA [1] ----------------

matrixA [2] ----------------

matrixB [0] matrixB [1] matrixB [2] .........
|           |         |     |
|           |         |     |
|           |         |     |    

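You can find the text files (the inputs) here: matrixA matrixB.

After a few days of 80's-style debugging (no debugger at all), I think the problem (the output I get is a segmentation fault) is in these lines of code (from the slave function):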

void slave( int id, int slaves, double **matrixA, double **matrixB, double **matrixC )
{
    int type, columnsA, columnsB, rowsA, rowsB, Btype, offset, rows, averageRows, extraRows;
    MPI_Status status;

    /* Receives the dimensions of A and B from master. */
    type = 3;

    MPI_Recv( &columnsA, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &rowsA, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &columnsB, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &rowsB, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    printf( "%d slave recieved ColumnA = %d, RowsA = %d, ColumnB = %d, RowsB = %d.\n", id, columnsA, rowsA, columnsB, rowsB );


    /* Receive the row offset and row count from master. */
    type = 0;

    MPI_Recv( &offset, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &rows, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );

    matrixAllocate( &matrixA, columnsA, rows );
    matrixAllocate( &matrixB, rowsB, columnsB );
    matrixAllocate( &matrixC, columnsB, rows );
    printf( "Correctly allocated.\n" );

    /* This part is only to see if the mem was correctly allocated.*/
    for( int i = 0; i < rows; i++ ){
        for( int j = 0; j < columnsA; j++)
            matrixA[ i ][ j ] = i + j;
    }

    for( int i = 0; i < columnsB; i++ ){
        for( int j = 0; j < rowsB; j++)
            matrixB[ i ][ j ] = i * j;
    }

    if ( id == 1 ){
        matrixPrinter( "matrixA", matrixA, rows, columnsA );
        matrixBPrinter( "matrixB", matrixB, rowsB, columnsB );
        matrixPrinter( "matrixC", matrixC, rows, columnsB );
    }

    MPI_Recv( &matrixA, ( rows * columnsA ) , MPI_DOUBLE, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &matrixB, ( rowsB * columnsB ), MPI_DOUBLE, 0, type, MPI_COMM_WORLD, &status );
    printf( "Correctly recieved.\n" );

    matrixPrinter( "matrixA", matrixA, rows, columnsA );
    matrixBPrinter( "matrixB", matrixB, rowsB, columnsB );
    matrixPrinter( "matrixC", matrixC, rows, columnsB );

    if ( id == 1 ){
        printf( "My id is %d.\n", id );
        for ( int i = 0; i < rows; i++ ){
            for( int j = 0; j < columnsA; j++ ){
                printf( "%lf    ", matrixA[ i ][ j ] );
            }
        printf( "\n" );
    }
}

The whole code can be found here: MPI matrix multiplier in C.

The terminal output is:

[image: terminal output]

Answer 1

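The problem is that the matrices have type `double **`, as allocated in `matrixAllocate`. When sending and receiving data, MPI assumes that buf holds the data contiguously, as a 1-d array, but that is not the case here. (You can easily check this by printing the address of each matrix entry.)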
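The address check mentioned above can be sketched roughly as follows. The body of `matrixAllocate` is not shown in the question, so `allocateRowPointers` below is only an assumed stand-in (one `malloc` for the row pointers, one allocation per row); `countRowGaps`, `printRowAddresses` and `freeRowPointers` are likewise hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed stand-in for the question's matrixAllocate (its real body is
   not shown): one block for the row pointers, then one block per row. */
double **allocateRowPointers( int rows, int cols )
{
    double **m = (double **) malloc( (size_t) rows * sizeof( double * ) );
    if ( m == NULL )
        return NULL;

    for ( int i = 0; i < rows; i++ ) {
        m[ i ] = (double *) calloc( (size_t) cols, sizeof( double ) );
        if ( m[ i ] == NULL ) {
            /* Roll back the rows allocated so far. */
            while ( i-- > 0 )
                free( m[ i ] );
            free( m );
            return NULL;
        }
    }
    return m;
}

/* Releases a matrix created by allocateRowPointers. */
void freeRowPointers( double **m, int rows )
{
    if ( m == NULL )
        return;
    for ( int i = 0; i < rows; i++ )
        free( m[ i ] );
    free( m );
}

/* Counts the row boundaries where row i + 1 does not start directly
   after the last element of row i. Zero means the rows happen to form
   one contiguous run of rows * cols doubles. */
int countRowGaps( double **m, int rows, int cols )
{
    assert( m != NULL && rows > 0 && cols > 0 );

    int gaps = 0;
    for ( int i = 0; i + 1 < rows; i++ ) {
        if ( m[ i + 1 ] != m[ i ] + cols )
            gaps++;
    }
    return gaps;
}

/* Prints the start address of every row so the layout can be inspected. */
void printRowAddresses( const char *name, double **m, int rows )
{
    for ( int i = 0; i < rows; i++ )
        printf( "%s[ %d ] starts at %p\n", name, i, (void *) m[ i ] );
}
```

With rows allocated one by one, nothing guarantees that `countRowGaps` returns zero, and the block of row pointers that `matrixA` itself points to holds addresses, not matrix entries. Either way, a single receive of `rows * columnsA` doubles does not land in the rows.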

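I think this is a well-known pitfall in C: pointers and arrays are different things. If the matrix were a true two-dimensional array, all of its entries would be laid out contiguously.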

My suggestion is to allocate each matrix as a 1-d array and not to use multidimensional subscripts.
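A minimal sketch of that suggestion, assuming row-major storage, might look like the following. `matrixAllocateFlat`, `flatIndex` and `fillTestPattern` are hypothetical names that do not appear in the original program, and the MPI call is only mentioned in a comment so that the block compiles without an MPI installation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocates a rows x cols matrix as ONE contiguous,
   zero-filled block of doubles. Returns NULL on failure. */
double *matrixAllocateFlat( int rows, int cols )
{
    if ( rows <= 0 || cols <= 0 )
        return NULL;
    return (double *) calloc( (size_t) rows * (size_t) cols, sizeof( double ) );
}

/* Row-major position of element ( i, j ) in a matrix with `cols` columns.
   This replaces the m[ i ][ j ] subscript of the double ** version. */
size_t flatIndex( int i, int j, int cols )
{
    assert( i >= 0 && j >= 0 && j < cols );
    return (size_t) i * (size_t) cols + (size_t) j;
}

/* Fills the matrix with the same test pattern the question uses for
   matrixA: element ( i, j ) becomes i + j. */
void fillTestPattern( double *m, int rows, int cols )
{
    for ( int i = 0; i < rows; i++ ) {
        for ( int j = 0; j < cols; j++ )
            m[ flatIndex( i, j, cols ) ] = i + j;
    }
}

/* With this layout the buffer pointer itself is what MPI expects, e.g.
 *
 *     MPI_Recv( matrixA, rows * columnsA, MPI_DOUBLE, 0, type,
 *               MPI_COMM_WORLD, &status );
 *
 * Note that the first argument is matrixA, not &matrixA.
 */
```

In the slave function this would mean declaring the matrices as `double *`, allocating them with something like `matrixAllocateFlat( rows, columnsA )`, and passing the pointer directly to `MPI_Recv`. The question's code passes `&matrixA`, the address of the pointer variable itself, so the received doubles would overwrite that variable and whatever lies next to it.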

Answer 2

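I hate to post an answer like this without carefully reading all of the MPI code, but I suggest compiling with the -Wall flag in the future. It can help and would catch errors like this one. For MPI and anything computation-related, you will almost always want the -Wall compiler flag.

Look at the output and the list of warnings for your code.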

$ mpic++ test.cpp -Wall -o  test
test.cpp:30:63: warning: unused variable 'rank' [-Wunused-variable]
    int lineA, lineB, columnA, columnB, id, size, rc, slaves, rank, source;
                                                              ^
test.cpp:30:69: warning: unused variable 'source' [-Wunused-variable]
    int lineA, lineB, columnA, columnB, id, size, rc, slaves, rank, source;
                                                                    ^
test.cpp:126:50: warning: variable 'matrixC' is uninitialized when used here [-Wuninitialized]
            slave( id, slaves, matrixA, matrixB, matrixC );
                                                 ^~~~~~~
test.cpp:34:21: note: initialize the variable 'matrixC' to silence this warning
           **matrixC;
                    ^
                     = NULL
test.cpp:126:41: warning: variable 'matrixB' is uninitialized when used here [-Wuninitialized]
            slave( id, slaves, matrixA, matrixB, matrixC );
                                        ^~~~~~~
test.cpp:33:21: note: initialize the variable 'matrixB' to silence this warning
           **matrixB,
                    ^
                     = NULL
test.cpp:85:44: warning: variable 'rc' is uninitialized when used here [-Wuninitialized]
                MPI_Abort( MPI_COMM_WORLD, rc );
                                           ^~
test.cpp:30:53: note: initialize the variable 'rc' to silence this warning
    int lineA, lineB, columnA, columnB, id, size, rc, slaves, rank, source;
                                                    ^
                                                     = 0
test.cpp:126:32: warning: variable 'matrixA' is uninitialized when used here [-Wuninitialized]
            slave( id, slaves, matrixA, matrixB, matrixC );
                               ^~~~~~~
test.cpp:32:21: note: initialize the variable 'matrixA' to silence this warning
    double **matrixA,
                    ^
                     = NULL
test.cpp:398:20: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixPrinter( "matrixA", matrixA, rows, columnsA );
                   ^
test.cpp:399:21: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixBPrinter( "matrixB", matrixB, rowsB, columnsB );
                    ^
test.cpp:400:20: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixPrinter( "matrixC", matrixC, rows, columnsB );
                   ^
test.cpp:407:20: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixPrinter( "matrixA", matrixA, rows, columnsA );
                   ^
test.cpp:408:21: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixBPrinter( "matrixB", matrixB, rowsB, columnsB );
                    ^
test.cpp:409:20: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixPrinter( "matrixC", matrixC, rows, columnsB );
                   ^
test.cpp:363:70: warning: unused variable 'averageRows' [-Wunused-variable]
    int type, columnsA, columnsB, rowsA, rowsB, Btype, offset, rows, averageRows, extraRows;
                                                                     ^
test.cpp:363:83: warning: unused variable 'extraRows' [-Wunused-variable]
    int type, columnsA, columnsB, rowsA, rowsB, Btype, offset, rows, averageRows, extraRows;
                                                                                  ^
test.cpp:363:49: warning: unused variable 'Btype' [-Wunused-variable]
    int type, columnsA, columnsB, rowsA, rowsB, Btype, offset, rows, averageRows, extraRows;
                                                ^
15 warnings generated.

Segmentation fault multiplying matrices with the MPI library
