Question

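I am trying to compute a matrix multiplication with the MPI Scatter() and Gather() functions, and I would like to be able to choose the matrix size without having to change the number of processes used.

I have looked through the posts "MPI Matrix Multiplication with scatter gather" and "matrix multiplication using Mpi_Scatter and Mpi_Gather", but both use approaches that do not work when a larger matrix size is defined; they only work when the matrix size equals the number of processes/nodes.

My code, with an example matrix size of 8: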

#define MAT_SIZE 8

void initialiseMatricies(float a[][MAT_SIZE], float b[][MAT_SIZE], float c[][MAT_SIZE])
{
    int num = 11;
    for (int i = 0; i < MAT_SIZE; i++)
    {
        for (int j = 0; j < MAT_SIZE; j++)
        {
            a[i][j] = num;
            b[i][j] = num+1;
            c[i][j] = 0;
        }
        num++;
    }
}

int main(int argc, char **argv)
{   
    // MPI Variables
    int rank, size;

    // Create the main matrices with the predefined size
    float matrixA[MAT_SIZE][MAT_SIZE];
    float matrixB[MAT_SIZE][MAT_SIZE];
    float matrixC[MAT_SIZE][MAT_SIZE];

    // Create the separate arrays for storing the scattered rows from the main matrices
    float matrixARows[MAT_SIZE];
    float matrixCRows[MAT_SIZE];

    // Initialise the matrices
    initialiseMatricies(matrixA, matrixB, matrixC);

    // Start the MPI parallel sequence
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int count = MAT_SIZE * MAT_SIZE / (size * (MAT_SIZE / size));

    // Scatter rows of first matrix to different processes
    // (the buffers hold floats, so the MPI datatype is MPI_FLOAT)
    MPI_Scatter(matrixA, count, MPI_FLOAT, matrixARows, count, MPI_FLOAT, 0, MPI_COMM_WORLD);

    // Broadcast second matrix to all processes
    MPI_Bcast(matrixB, MAT_SIZE * MAT_SIZE, MPI_FLOAT, 0, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);

    // Matrix Multiplication
    int sum = 0;
    for (int i = 0; i < MAT_SIZE; i++)
    {
        for (int j = 0; j < MAT_SIZE; j++)
        {
            sum += matrixARows[j] * matrixB[j][i];
        }
        matrixCRows[i] = sum;
    }

    // Gather the row sums from the buffer and put them in matrix C
    // (the buffers hold floats, so the MPI datatype is MPI_FLOAT)
    MPI_Gather(matrixCRows, count, MPI_FLOAT, matrixC, count, MPI_FLOAT, 0, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();

    // if it's on the master node
    if (rank == 0)
        printResults(matrixA, matrixB, matrixC, calcTime);

    return 0;
}

Output:

1364 2728 4092 5456 6820 8184 9548 10912 
1488 2976 4464 5952 7440 8928 10416 11904 
1612 3224 4836 6448 8060 9672 11284 12896 
1736 3472 5208 6944 8680 10416 12152 13888 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0

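The output is correct for the rows that get computed, and if I set the number of processes to 8 (the same as the matrix size) the entire matrix is calculated correctly, but I do not want to have to do that. I believe my problem stems from the count passed to Scatter() and Gather(). If I set the count to: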

int count = MAT_SIZE * MAT_SIZE / size;

then the output becomes:

1364 2728 4092 5456 6820 8184 9548 10912 
-1.07374e+08 -1.07374e+08 11 11 11 11 11 11 
1612 3224 4836 6448 8060 9672 11284 12896 
-1.07374e+08 -1.07374e+08 13 13 13 13 13 13 
1860 3720 5580 7440 9300 11160 13020 14880 
-1.07374e+08 -1.07374e+08 15 15 15 15 15 15 
2108 4216 6324 8432 10540 12648 14756 16864 
-1.07374e+08 -1.07374e+08 17 17 17 17 17 17

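because the count essentially goes from 8 (previously) to 16, and every process gives me a debug error saying

"Run-Time Check Failure #2 - Stack around the variable 'matrixC' was corrupted"

I have been changing this count formula for several days and still cannot work it out. I have also tried changing the start and end iterations of the matrix multiplication, but that did not solve it either.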

Answer 1

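To allow larger matrix sizes, the separate arrays should be 2D arrays whose first dimension is the segment size, that is, the number of rows each task/process handles: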

float matrixARows[MAT_SIZE/size][MAT_SIZE];
float matrixCRows[MAT_SIZE/size][MAT_SIZE];

The count should be:

int count = MAT_SIZE * MAT_SIZE / size;
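
The reasoning behind this count can be sketched in plain C, without MPI. The helper names below (`rows_per_process`, `elements_per_process`, `buffer_is_large_enough`) are hypothetical and not part of the original code, and the sketch assumes the matrix size divides evenly by the number of processes. Each process receives `MAT_SIZE / size` whole rows, so its receive buffer has to hold that many rows of `MAT_SIZE` elements. With the question's one-row buffers (8 floats) and 4 processes, a count of 16 does not fit, which would be consistent with the stack corruption error reported there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of whole rows each process handles.
   Assumes mat_size is divisible by num_procs. */
size_t rows_per_process(size_t mat_size, size_t num_procs)
{
    assert(num_procs > 0 && mat_size % num_procs == 0);
    return mat_size / num_procs;
}

/* Hypothetical helper: element count to pass to the scatter and gather calls. */
size_t elements_per_process(size_t mat_size, size_t num_procs)
{
    return rows_per_process(mat_size, num_procs) * mat_size;
}

/* Returns 1 when a buffer of buffer_len elements can hold one process's share. */
int buffer_is_large_enough(size_t buffer_len, size_t mat_size, size_t num_procs)
{
    return buffer_len >= elements_per_process(mat_size, num_procs);
}
```

For an 8x8 matrix on 4 processes, `elements_per_process(8, 4)` is 16, so a per-process buffer of 8 floats is too small while a 2x8 buffer fits.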

The matrix multiplication then becomes:

for (int k = 0; k < MAT_SIZE/size; k++)
{
    for (int i = 0; i < MAT_SIZE; i++)
    {
        // Accumulate in float and start from zero for every output element
        float sum = 0;
        for (int j = 0; j < MAT_SIZE; j++)
        {
            sum += matrixARows[k][j] * matrixB[j][i];
        }
        matrixCRows[k][i] = sum;
    }
}
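
To check the indexing without an MPI runtime, the block-row scheme can be emulated serially. This is only a sketch, not the MPI program itself: the functions `multiply_block` and `multiply_in_blocks` and the constant `N` are hypothetical names, and each iteration of the `rank` loop stands in for one process working on the rows that the scatter step would have delivered to it.

```c
#include <assert.h>

#define N 8  /* hypothetical matrix size, standing in for MAT_SIZE */

/* Multiply `rows` rows of A by the full matrix B, writing `rows` rows of C.
   This mirrors the per-process triple loop. */
void multiply_block(int rows, float a_rows[][N], float b[][N], float c_rows[][N])
{
    for (int k = 0; k < rows; k++) {
        for (int i = 0; i < N; i++) {
            float sum = 0.0f;
            for (int j = 0; j < N; j++)
                sum += a_rows[k][j] * b[j][i];
            c_rows[k][i] = sum;
        }
    }
}

/* Emulate num_procs processes one after another. Process `rank` owns rows
   [rank * rows, (rank + 1) * rows) of A and of C.
   Assumes N is divisible by num_procs. */
void multiply_in_blocks(int num_procs, float a[][N], float b[][N], float c[][N])
{
    assert(num_procs > 0 && N % num_procs == 0);
    int rows = N / num_procs;
    for (int rank = 0; rank < num_procs; rank++)
        multiply_block(rows, a + rank * rows, b, c + rank * rows);
}
```

Calling `multiply_in_blocks(4, a, b, c)` and `multiply_in_blocks(8, a, b, c)` on the same inputs should fill `c` identically, which is a quick way to confirm that the result does not depend on the process count.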

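Note: the matrix size must be divisible by the number of tasks/processes. For example, with 4 tasks the matrix size must be a multiple of 4 (4, 8, 16, 32, 64, 128, etc.).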
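
When the matrix size is not a multiple of the process count, one common alternative (not part of the answer above) is to give the first few processes one extra row and use `MPI_Scatterv` and `MPI_Gatherv`, which accept per-process counts and displacements. The sketch below only computes those two arrays in plain C. The function name `compute_row_partition` is hypothetical, and the resulting arrays are meant to be passed as the `sendcounts` and `displs` arguments of `MPI_Scatterv` (and as `recvcounts` and `displs` for `MPI_Gatherv`).

```c
#include <assert.h>

/* Hypothetical helper: split mat_size rows over num_procs processes.
   counts[r] is the number of elements for process r, and displs[r] is the
   element offset of its first row in the row-major matrix. */
void compute_row_partition(int mat_size, int num_procs, int counts[], int displs[])
{
    assert(mat_size > 0 && num_procs > 0);
    int base = mat_size / num_procs;   /* rows every process gets */
    int extra = mat_size % num_procs;  /* leftover rows, one each to the first ranks */
    int offset = 0;
    for (int r = 0; r < num_procs; r++) {
        int rows = base + (r < extra ? 1 : 0);
        counts[r] = rows * mat_size;
        displs[r] = offset;
        offset += counts[r];
    }
}
```

For a 10x10 matrix on 4 processes this assigns 3, 3, 2 and 2 rows, so the counts are 30, 30, 20 and 20 elements. Each process would then size its row buffers from its own entry in `counts` rather than from `MAT_SIZE / size`.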
