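I want to parallelize my code so that the submatrix held by each processor is gathered into a single matrix on the master processor.
For example, what I want to implement looks like this: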
MPI_Datatype rowtype_temp, rowtype, coltype_temp, coltype, mtype_temp, mtype;
double **f, **f_local;
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
local_n = n / comm_sz;
MPI_Type_vector(n, 1, local_n, MPI_DOUBLE, &rowtype_temp);
MPI_Type_commit(&rowtype_temp);
MPI_Type_create_resized(rowtype_temp, 0, sizeof(double), &rowtype);
MPI_Type_commit(&rowtype);
MPI_Type_vector(n, 1, n, MPI_DOUBLE, &coltype_temp);
MPI_Type_commit(&coltype_temp);
MPI_Type_create_resized(coltype_temp, 0, sizeof(double), &coltype);
MPI_Type_commit(&coltype);
MPI_Type_vector(local_n, 1, comm_sz, coltype, &mtype_temp);
MPI_Type_commit(&mtype_temp);
MPI_Type_create_resized(mtype_temp, 0, sizeof(double), &mtype);
MPI_Type_commit(&mtype);
f_local = (double**) malloc (n * sizeof(double *));
for (i = 0; i < n; i++) f_local[i] = (double *) malloc (local_n * sizeof(double));
f = (double**) malloc (n * sizeof(double *));
for (i = 0; i < n ; i++) f[i] = (double *) malloc (n * sizeof(double));
MPI_Gather(&f_local[0][0], local_n, rowtype, &f[0][0], 1 , mtype, 0, MPI_COMM_WORLD);
This is part of my code. Its output looks similar to this:
processor 0 : a b a b
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
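One thing that may be contributing to the output above: f and f_local are allocated one row at a time, so the rows are not guaranteed to be adjacent in memory, while an MPI derived datatype describes offsets from a single base address such as &f[0][0]. A minimal sketch of a contiguous allocation follows, assuming a helper pair named alloc_matrix and free_matrix (both names are hypothetical and not part of the original code):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a rows x cols matrix whose elements sit in
 * one contiguous, zero-initialized block, so that &m[0][0] addresses the
 * whole matrix while m[i][j] indexing still works. */
double **alloc_matrix(int rows, int cols)
{
    assert(rows > 0 && cols > 0);

    double *data = calloc((size_t)rows * (size_t)cols, sizeof *data);
    double **m = malloc((size_t)rows * sizeof *m);
    if (data == NULL || m == NULL) {
        free(data);
        free(m);
        return NULL;
    }

    /* Point each row at its slice of the single data block. */
    for (int i = 0; i < rows; i++)
        m[i] = data + (size_t)i * (size_t)cols;

    return m;
}

/* Hypothetical helper: release a matrix created by alloc_matrix. */
void free_matrix(double **m)
{
    if (m != NULL) {
        free(m[0]); /* the contiguous data block */
        free(m);    /* the array of row pointers */
    }
}
```

With f_local = alloc_matrix(n, local_n) and f = alloc_matrix(n, n), one possible way to gather column blocks (an assumption about the intended layout, not something confirmed by the post) would be to send n * local_n values of MPI_DOUBLE from each rank and receive one element per rank of a type built with MPI_Type_vector(n, local_n, n, MPI_DOUBLE, ...) and then resized with MPI_Type_create_resized to an extent of local_n * sizeof(double).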