Question

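I am working on a problem where I need to do matrix multiplication in MPI by slicing the columns of one of the matrices across processors. For A * B = C, B is the matrix to be sliced.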

I do the following:

MPI_Bcast(A, n*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
where A is allocated on all ranks.

MPI_Type_vector(n, n/p, n, MPI_DOUBLE, &tmp_type);
MPI_Type_create_resized(tmp_type, 0, n/p*sizeof(double), &col_type);
MPI_Type_commit(&col_type);

where n is the size of A and B (both are n x n) and p is the number of processors.

MPI_Scatter(B, 1, col_type, b, n/p*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
where B is allocated only on the root and b is allocated on all ranks.
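To make the effect of col_type concrete, here is a minimal plain-C sketch (no MPI calls) of the elements that rank r would receive under the datatype above. It assumes B is stored row-major and that p divides n. The helper name pack_column_block is hypothetical; it only mimics the index arithmetic of MPI_Type_vector(n, n/p, n, ...) combined with the resized extent of n/p doubles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an MPI routine): copies the n x (n/p) column
 * block that rank r would receive into the contiguous row-major buffer b.
 * Assumes B is a row-major n x n matrix and that p divides n. */
void pack_column_block(const double *B, int n, int p, int r, double *b)
{
    assert(p > 0 && n % p == 0);
    int w = n / p;                            /* columns per rank */
    const double *start = B + (size_t)r * w;  /* r extents of n/p doubles */
    for (int row = 0; row < n; ++row)         /* vector count = n */
        for (int col = 0; col < w; ++col)     /* vector blocklength = n/p */
            b[(size_t)row * w + col] = start[(size_t)row * n + col]; /* stride = n */
}
```

For n = 4 and p = 2, with B holding 0 through 15 in row-major order, rank 1's buffer would hold 2, 3, 6, 7, 10, 11, 14, 15, i.e. columns 2 and 3 of every row.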

cblas_dgemm( CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n/p, n, 1.0, A, n, b, n/p, 0.0, c, n/p );
where c is allocated on all ranks (the multiplication on each processor is done by the BLAS routine).
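For reference, the dgemm call above computes c = A * b with A of size n x n and with b and c of size n x (n/p), all row-major. The following plain-C sketch performs the same computation with explicit loops. The function name local_block_multiply is hypothetical and is meant only to illustrate the shapes and leading dimensions; a real run would keep the BLAS call.

```c
#include <stddef.h>

/* Hypothetical reference routine: c = A * b, where A is n x n (leading
 * dimension n) and b, c are n x w (leading dimension w), all row-major.
 * With w = n/p this matches the shapes passed to cblas_dgemm above
 * (alpha = 1.0, beta = 0.0). */
void local_block_multiply(const double *A, const double *b, double *c,
                          int n, int w)
{
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < w; ++j) {
            double sum = 0.0;  /* beta = 0.0, so c is overwritten */
            for (int k = 0; k < n; ++k)
                sum += A[(size_t)i * n + k] * b[(size_t)k * w + j];
            c[(size_t)i * w + j] = sum;
        }
    }
}
```

Comparing its output with the dgemm result on a small matrix is one way to confirm that the leading dimensions n and n/p are passed correctly.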

MPI_Gather(c, n/p*n, MPI_DOUBLE, C, 1, col_type, 0, MPI_COMM_WORLD);
where C is allocated only on the root.

My code does what is required for small matrices (size < 62). For matrices larger than that, however, it fails with the following error:

[csicluster01:12280] *** An error occurred in MPI_Gather
[csicluster01:12280] *** on communicator MPI_COMM_WORLD
[csicluster01:12280] *** MPI_ERR_TRUNCATE: message truncated
[csicluster01:12280] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 12280 on
node csicluster01 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

Is there an obvious mistake here that I am unable to figure out? Or could the problem be due to some issue with the processors being used?

Answer 1

The problem may come from

MPI_Type_create_resized(tmp_type, 0, n/p*sizeof(double), &col_type);

You may need to change it to

MPI_Type_create_resized(tmp_type, 0, n/p*n*sizeof(double), &col_type);

This change seems logical, since you perform something like

MPI_Scatter(B, 1, col_type, b, n/p*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

However, I don't know whether this change solves your column-partitioning problem. It might resolve the error triggered by MPI_Gather().
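One way to evaluate the two candidate extents is to compute where each rank's block would start. With a send count of 1, MPI_Scatter reads block i starting i extents past the send buffer. The sketch below is plain C with hypothetical helper names, assuming a row-major n x n matrix with p dividing n. It reports the start offset in units of doubles and checks whether the vector type, read from that offset, stays inside B.

```c
#include <stddef.h>

/* Start offset (in doubles) of rank i's block when the send type has the
 * given extent (in doubles) and the send count is 1. */
size_t block_start(size_t extent_in_doubles, int i)
{
    return (size_t)i * extent_in_doubles;
}

/* Number of doubles spanned by MPI_Type_vector(n, n/p, n, MPI_DOUBLE):
 * (n - 1) full strides plus one final block of n/p elements. */
size_t vector_span(int n, int p)
{
    return (size_t)(n - 1) * n + (size_t)(n / p);
}

/* Returns 1 if rank i's block stays inside the n*n elements of B. */
int block_fits(size_t extent_in_doubles, int n, int p, int i)
{
    return block_start(extent_in_doubles, i) + vector_span(n, p)
           <= (size_t)n * n;
}
```

For n = 4 and p = 2, an extent of n/p = 2 doubles puts rank 1's block at offset 2 (row 0, column 2), and the 14-double span ends exactly at the end of B. An extent of n/p*n = 8 doubles puts it at offset 8 (row 2, column 0), and the same span would reach past the 16 elements of B, so the larger extent is worth checking against the intended column layout before adopting it.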

Bye,

Matrix multiplication in MPI: column-wise partition of the second matrix
