I am computing a matrix-vector product using a Cartesian topology, and I end up with the following processes, with their ranks and local vectors:
P0 (process with rank = 0) = [2, 9]
P1 (process with rank = 1) = [2, 3]
P2 (process with rank = 2) = [1, 9]
P3 (process with rank = 3) = [4, 6]
Now I need to sum the elements of the even-ranked processes and of the odd-ranked processes separately, like this:
temp1 = [3,18]
temp2 = [6,9]
Then the results should be gathered into another vector, like this:
result = [3, 18, 6, 9]
What I tried to do is use MPI_Reduce followed by MPI_Gather, like this:
// Previous code
double *temp1, *temp2;
if (myrank % 2 == 0) {
    BOOLEAN flag = Allocate_vector(&temp1, local_m); // function to allocate space for vectors
    MPI_Reduce(local_y, temp1, local_n, MPI_DOUBLE, MPI_SUM, 0, comm);
    MPI_Gather(temp1, local_n, MPI_DOUBLE, gResult, local_n, MPI_DOUBLE, 0, comm);
    free(temp1);
}
else {
    Allocate_vector(&temp2, local_m);
    MPI_Reduce(local_y, temp2, local_n, MPI_DOUBLE, MPI_SUM, 0, comm);
    MPI_Gather(temp2, local_n, MPI_DOUBLE, gResult, local_n, MPI_DOUBLE, 0, comm);
    free(temp2);
}
But the result is incorrect. It seems the code sums up the elements of both the even and the odd processes when gathering, giving Wrong_result = [21 15 0 0], and then it aborts with this error:
*** Error in `./test': double free or corruption (fasttop): 0x00000000013c7510 ***
*** Error in `./test': double free or corruption (fasttop): 0x0000000001605b60 ***
Answer (score 1):
It won't work the way you want it to. To perform a reduction over the elements of only a subset of processes, you have to create a subcommunicator for them. In your case, the odd and the even processes share the same comm, so the operations run over the combined group rather than over two separate groups of processes. You should use MPI_Comm_split to perform the split, perform the reduction in each of the two new subcommunicators, and finally have rank 0 in each subcommunicator (let's call them leaders) participate in the gather over yet another subcommunicator that contains only those two:
// Make sure rank is set accordingly
MPI_Comm_rank(comm, &rank);
// Split even and odd ranks in separate subcommunicators
MPI_Comm subcomm;
MPI_Comm_split(comm, rank % 2, 0, &subcomm);
// Perform the reduction in each separate group
double *temp;
Allocate_vector(&temp, local_n);
MPI_Reduce(local_y, temp, local_n, MPI_DOUBLE, MPI_SUM, 0, subcomm);
// Find out our rank in subcomm
int subrank;
MPI_Comm_rank(subcomm, &subrank);
// At this point, we no longer need subcomm. Free it and reuse the variable.
MPI_Comm_free(&subcomm);
// Separate both group leaders (rank 0) into their own subcommunicator
MPI_Comm_split(comm, subrank == 0 ? 0 : MPI_UNDEFINED, 0, &subcomm);
if (subcomm != MPI_COMM_NULL) {
    MPI_Gather(temp, local_n, MPI_DOUBLE, gResult, local_n, MPI_DOUBLE, 0, subcomm);
    MPI_Comm_free(&subcomm);
}
// Free resources
free(temp);
The result will be in gResult on the process that has rank 0 in the latter subcomm, which, because of the way the splits are performed, happens to also be rank 0 in comm.
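For completeness, here is a minimal sketch (not part of the original answer) of how gResult could be allocated and checked on that leader. It assumes the Allocate_vector helper from the question, that <stdio.h> is included, and a run with 4 processes so that each partial sum holds local_n = 2 elements:

// Only rank 0 in comm (the root of the leader gather) needs the receive buffer
double *gResult = NULL;
if (rank == 0)
    Allocate_vector(&gResult, 2 * local_n);  // room for both partial sums

// ... the splits, MPI_Reduce and MPI_Gather shown above go here ...

if (rank == 0) {
    // With the example vectors this prints: 3 18 6 9
    for (int i = 0; i < 2 * local_n; ++i)
        printf("%g ", gResult[i]);
    printf("\n");
    free(gResult);
}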
I guess it is not as simple as you expected, but that is the price of having convenient collective operations in MPI.
On a side note, in the code shown you allocate temp1 and temp2 with length local_m, while in all collective calls the count is given as local_n. If it happens that local_n > local_m, heap corruption will occur.
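As a sketch of the fix for that side note (still assuming the question's Allocate_vector helper), the temporaries simply need to be allocated with the same count that is passed to the collective calls:

// Allocate as many elements as MPI_Reduce/MPI_Gather will actually write
double *temp1 = NULL;
Allocate_vector(&temp1, local_n);  // local_n, not local_m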