Question

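I am having a lot of trouble with this problem: using MPI, I want to combine several contiguous, non-overlapping column blocks of a 2D array, distributed across multiple MPI processes, into a single array residing on the root process. The main condition is that the array must be identical on all sending and receiving processes. The second condition is that the column blocks sent by each process may have different widths. This seems to be a common problem in parallel programming, as I found at least 6 related questions on StackOverflow. Unfortunately, none of the answers helped me. When I split the problem into row blocks instead of column blocks, I can solve it just fine. I realize this has to do with the different strides of column subarrays. I have tried the MPI vector and subarray types, to no avail.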

Using a simplified version of my code, if I run it with COLUMNS equal to 6, I get:

    0:  1  1  1  2  2  2  
    1:  1  1  1  2  2  2  
    2:  1  1  1  2  2  2  
    3:  1  1  1  2  2  2  
    4:  1  1  1  2  2  2  
    5:  1  1  1  2  2  2  
    6:  1  1  1  2  2  2

which is what I want.

On the other hand, if I run it with COLUMNS = 5, I expect to get:

    0:  1  1  1  2  2
    1:  1  1  1  2  2
    2:  1  1  1  2  2
    3:  1  1  1  2  2
    4:  1  1  1  2  2
    5:  1  1  1  2  2
    6:  1  1  1  2  2

Instead, I get:

    0:  1  1  1  2  2
    1:  2  1  1  2  2
    2:  2  1  1  2  2
    3:  2  1  1  2  2
    4:  2  1  1  2  2
    5:  1  1  1 -0 -0
    6:  1  1  1 -0 -0

Listing of the simplified code:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

#define ROWS                    7
#define COLUMNS                 6 // 5 or 6 only. I could pass this in the cmd line...
#define NR_OF_PROCESSES         2

void print_matrix (float ** X, int rows, int cols)
{
    for (int i = 0; i < rows; ++i) {
        printf ("%3d: ", i);
        for (int j = 0; j < cols; ++j)
            printf ("%2.0f ", X[i][j]);
        printf ("\n");
    }
}

float **allocate_matrix (int rows, int cols)
{
    float  *data   = (float  *) malloc (rows * cols * sizeof(float));
    float **matrix = (float **) malloc (rows * sizeof(float *));
    for (int i = 0; i < rows; i++)
        matrix[i] = & (data[i * cols]);
    return matrix;
}

int main (int argc, char *argv[])
{
    int   num_procs, my_rank, i, j, root = 0, ncols, ndims = 2, strts;
    float **matrix;
    MPI_Datatype sendsubarray, recvsubarray, resizedrecvsubarray;

    assert (COLUMNS == 5 || COLUMNS == 6);

    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &num_procs);
    if (num_procs != NR_OF_PROCESSES) MPI_Abort (MPI_COMM_WORLD, -1);
    MPI_Comm_rank (MPI_COMM_WORLD, &my_rank);

    ncols = (my_rank == root) ? 3 : COLUMNS - 3;
    strts = (my_rank == root) ? 0 : 3;
    int sizes[2]    = {ROWS, COLUMNS};
    int subsizes[2] = {ROWS, ncols};
    int starts[2]   = {0, strts};

    // Create and populate the matrix at each node (incl. the root):
    matrix = allocate_matrix (ROWS, COLUMNS);
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLUMNS; j++)
            matrix[i][j] = my_rank * -1.0;
    for (i = starts[0]; i < starts[0] + subsizes[0]; i++)
        for (j = starts[1]; j < starts[1] + subsizes[1]; j++)
            matrix[i][j] = my_rank + 1.0;

    // Create the subarray type for use by each send node (incl. the root):
    MPI_Type_create_subarray (ndims, sizes, subsizes, starts, MPI_ORDER_C,
                              MPI_FLOAT, &sendsubarray);
    MPI_Type_commit (&sendsubarray);

    // Create the subarray type for use by the receive node (the root):
    if (my_rank == root) {
        MPI_Type_create_subarray (ndims, sizes, subsizes, starts, MPI_ORDER_C,
                                  MPI_FLOAT, &recvsubarray);
        MPI_Type_commit (&recvsubarray);
        MPI_Type_create_resized (recvsubarray, 0, 1 * sizeof(float),
                                 &resizedrecvsubarray);
        MPI_Type_commit (&resizedrecvsubarray);
    }

    // Gather the send matrices into the receive matrix:
    int counts[NR_OF_PROCESSES] = {3, COLUMNS - 3};
    int displs[NR_OF_PROCESSES] = {0, 3};
    MPI_Gatherv (matrix[0], 1, sendsubarray,
                 matrix[0], counts, displs, resizedrecvsubarray,
                 root, MPI_COMM_WORLD);

    // Have the root send the main array to the output:
    if (my_rank == root) print_matrix (matrix, ROWS, COLUMNS);

    // Free out all the allocations we created in this node...
    if (my_rank == 0) {
        MPI_Type_free (&resizedrecvsubarray);
        MPI_Type_free (&recvsubarray);
    }
    MPI_Type_free (&sendsubarray);
    free (matrix[0]);   // the contiguous data block from allocate_matrix
    free (matrix);      // the array of row pointers

    MPI_Finalize();
    return 0;
}

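I am thinking that maybe there is no direct solution to my little problem as shown in the code above, and that I will therefore have to settle for some complicated multi-step solution in which I handle the different-width subarrays separately before collecting them into the receive array in two or three steps instead of just one.

Any help would be greatly appreciated!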

Answer 1

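Nicely done! There is a lot of juggling of MPI details in there, and only one thing was missing at the end: I only needed to add two lines and change a third to get the code working.

That you have most of it working is evident even in the incorrect output. The right number of "2"s is being received, so you are building the send type and sending the data correctly. The only trick is in the receiving.

From the Gatherv code,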

    int counts[NR_OF_PROCESSES] = {3, COLUMNS - 3};
    int displs[NR_OF_PROCESSES] = {0, 3};

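you have correctly decided to receive in units of columns (so the first process has 3 columns to send, and the second has the rest); your displacements make sense given your resizing; and you have resized in units of array elements, so that each column comes immediately after the previous one.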
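The displacement arithmetic described above can be checked with a small plain-C sketch. It makes no MPI calls; `gatherv_elem_offset` is a hypothetical helper name, and it assumes the usual `MPI_Gatherv` rule that the data from rank i starts `displs[i]` extents of the receive type into the buffer, with successive items one extent apart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an MPI function): float offset, from the start
   of the receive buffer, at which the k-th received item from one rank
   begins.  `displ` is that rank's entry in displs[], and
   `extent_in_floats` is the extent of the receive type, in floats. */
size_t gatherv_elem_offset(int displ, int k, size_t extent_in_floats)
{
    assert(displ >= 0 && k >= 0);
    return ((size_t)displ + (size_t)k) * extent_in_floats;
}
```

With the extent resized to one float, `gatherv_elem_offset(3, 0, 1)` is 3 and `gatherv_elem_offset(3, 1, 1)` is 4, so the two columns from rank 1 begin at columns 3 and 4 of row 0. Without the resize, the extent of a subarray type is that of the whole array (7 * 5 = 35 floats when COLUMNS = 5), so the same displacement would give 3 * 35 = 105, well past the end of the buffer.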

The only hiccup is in the construction of your receive subarray type; when you make this call

    MPI_Type_create_subarray (ndims, sizes, subsizes, starts, MPI_ORDER_C,
                              MPI_FLOAT, &recvsubarray);

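you are creating a receive type that, on the receiving process, has the sizes, subsizes, and offset of the data it sends! Instead, you want to create a receive subarray type of exactly one column, with starts of {0,0}, so that it has no (intrinsic) offset and you can point it to where it needs to go with the displacements: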

    int colsubsizes[]={ROWS, 1};
    int colstarts[]={0,0};
    MPI_Type_create_subarray (ndims, sizes, colsubsizes, colstarts, MPI_ORDER_C,
                              MPI_FLOAT, &recvsubarray);

When I run with this, it works.
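To see why a one-column receive type makes the difference, here is a minimal plain-C model of the unpacking on the root. It is only a sketch and makes no MPI calls: `model_unpack` and its parameters are hypothetical names, and it assumes the receive type is a {rows x type_cols} subarray starting at {0,0}, resized to an extent of one float, as in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model (not an MPI function) of how `n` incoming floats from
   one rank land in the row-major {rows x cols} receive array.  Copy k of
   the receive type begins at float offset displ + k, and within each copy
   the elements are visited row by row, type_cols at a time. */
void model_unpack(float *recv, int rows, int cols, int type_cols,
                  int displ, int count, const float *incoming, int n)
{
    int used = 0;
    assert(type_cols >= 1 && type_cols <= cols);
    for (int k = 0; k < count && used < n; k++)           /* each type copy */
        for (int r = 0; r < rows && used < n; r++)        /* each row       */
            for (int c = 0; c < type_cols && used < n; c++) {
                size_t off = (size_t)displ + (size_t)k
                           + (size_t)r * (size_t)cols + (size_t)c;
                assert(off < (size_t)rows * (size_t)cols); /* stay in bounds */
                recv[off] = incoming[used++];
            }
}
```

For COLUMNS = 5, calling it with `type_cols = 3`, `displ = 3`, `count = 2` and the 14 values from rank 1 puts stray values in column 0 of rows 1 to 4 and leaves columns 3 and 4 of rows 5 and 6 untouched, which is the pattern in the incorrect output. With `type_cols = 1`, the same call fills exactly columns 3 and 4 of every row. The model also makes one more thing visible: the incoming floats are consumed one column at a time, while a row-major send subarray emits them row by row. With the uniform block values used in the question the two orders are indistinguishable; for non-uniform data, the send side would likely need a matching column-wise type as well.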

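(As a much more minor note, you don't need to commit, and therefore free, recvsubarray, since you never use it for actual communication; it is only used to construct the resizedrecvsubarray type, which you then commit.)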

How to combine subarrays of different widths using only one array for send and receive in MPI
