我在使用MPI_Gather
将索引整数收集到整数向量时遇到了问题。当我尝试在不创建新接收类型的情况下收集整数时,出现MPI_ERR_TRUNCATE
错误。
*** An error occurred in MPI_Gather
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_TRUNCATE: message truncated
*** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
复制问题的最小示例是
#include <stdlib.h>
#include "mpi.h"
int i, comm_rank, comm_size, err;
int *send_data, *recv_data;
int *blocklengths, *displacements;
MPI_Datatype send_type;
int main ( int argc, char *argv[] ){
MPI_Init ( &argc, &argv );
MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
unsigned int block = 1000;
unsigned int count = 1000;
send_data = malloc(sizeof(int)*block*count);
for (i=0; i<block*count; ++i) send_data[i] = i;
recv_data = 0;
if(comm_rank==0) recv_data = malloc(sizeof(int)*block*count*comm_size);
blocklengths = malloc(sizeof(int)*count);
displacements = malloc(sizeof(int)*count);
for (i=0; i<count; ++i) {
blocklengths[i] = block;
displacements[i] = i*block;
}
MPI_Type_indexed(count, blocklengths, displacements, MPI_INT, &send_type);
MPI_Type_commit(&send_type);
err = MPI_Gather((void*)send_data, 1, send_type, (void*)recv_data, block*count, MPI_INT, 0, MPI_COMM_WORLD);
if (err) MPI_Abort(MPI_COMM_WORLD, err);
free(send_data);
free(recv_data);
free(blocklengths);
free(displacements);
MPI_Finalize ( );
return 0;
}
我注意到当我使用小于6K字节的数据传输时,不会发生此错误。
我找到了使用MPI_Type_contiguous
的解决方法,虽然看起来我为代码添加了额外的开销。
MPI_Type_contiguous(block*count, MPI_INT, &recv_type);
MPI_Type_commit(&recv_type);
err = MPI_Gather((void*)send_data, 1, send_type, (void*)recv_data, 1, recv_type, 0, MPI_COMM_WORLD);
我已验证open-mpi
v1.6和v1.8。
有人能解释这个问题的根源吗?