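I'm new to programming in general, and to MPI in particular. I'm trying to scatter several arrays from the root process to the other processes, perform some operations on those arrays, and then gather the data. However, all of the data ends up on every process and the output adjacency matrices are incorrect, so I assume I'm using scatterv and/or gatherv incorrectly. I'm not sure whether I should scatter the matrices element by element, or whether there is a way to scatter an entire matrix. If you could take a look at my code, any help would be greatly appreciated. Thanks!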
int rank, size;
MPI_Status status;
MPI_Datatype strip;
bool passflag[Nmats];
MPI::Init();
rank = MPI::COMM_WORLD.Get_rank();
size = MPI::COMM_WORLD.Get_size();
int sendcounts[size], recvcounts, displs[size], rcounts[size];
if(rank == root){
    fin.open(infname);
    fout.open(outfname);
    /* INPUT ADJ-MATS */
    for(i = 0; i < Nmats; i++){
        fin >> dummy;
        for (j = 0; j < N; j++){
            for (k = 0; k < N; k++) {
                fin >> a[i][j][k];
            }
        }
    }
}
/* Nmats = Number of matrices; N = nodes; Nmats isn't divisible by the number of processors */
Nmin= Nmats/size;
Nextra = Nmats%size;
k=0;
for(i=0; i<size; i++){
    if( i < Nextra) sendcounts[i] = Nmin + 1;
    else sendcounts[i] = Nmin;
    displs[i] = k;
    k = k + sendcounts[i];
}
recvcounts = sendcounts[rank];
MPI_Type_vector(Nmin, N, N, MPI_FLOAT, &strip);
MPI_Type_commit(&strip);
MPI_Scatterv(a, sendcounts, displs, strip, a, N*N, strip, 0, MPI_COMM_WORLD);
/* Perform operations on adj-mats */
for(i=0; i<size; i++){
    if(i<Nextra) rcounts[i] = Nmin + 1;
    else rcounts[i] = Nextra;
    displs[i] = k;
    k = k + rcounts[i];
}
MPI_Gatherv(&passflag, 1, MPI::BOOL, &passflag, rcounts , displs, MPI::BOOL, 0, MPI_COMM_WORLD);
MPI::Finalize();
//OUTPUT ADJ_MATS
for(i = 0; i < Nmats; i++) if (passflag[i]) {
    for(j=0;j<N; j++){
        for(k=0; k<N; k++){
            fout << a[i][j][k] << " ";
        }
        fout << endl;
    }
    fout << endl;
}
fout << endl;
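Hi, I was able to get the code working with static allocation, but when I tried to allocate the array dynamically, the code more or less broke. I'm not sure whether I need to allocate the memory outside of MPI, or whether it is something I should do after initializing MPI. All suggestions are welcome!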
//int a[Nmats][N][N];
/* Prior to adding this part of the code it ran fine, now it's no longer working */
int *** a = new int**[Nmats];
for(i = 0; i < Nmats; ++i){
    a[i] = new int*[N];
    for(j = 0; j < N; ++j){
        a[i][j] = new int[N];
        for(k = 0; k < N; k++){
            a[i][j][k] = 0;
        }
    }
}
int rank, size;
MPI_Status status;
MPI_Datatype plane;
bool passflag[Nmats];
MPI::Init();
rank = MPI::COMM_WORLD.Get_rank();
size = MPI::COMM_WORLD.Get_size();
MPI_Type_contiguous(N*N, MPI_INT, &plane);
MPI_Type_commit(&plane);
int counts[size], recvcounts, displs[size+1];
if(rank == root){
    fin.open(infname);
    fout.open(outfname);
    /* INPUT ADJ-MATS */
    for(i = 0; i < Nmats; i++){
        fin >> dummy;
        for (j = 0; j < N; j++){
            for (k = 0; k < N; k++) {
                fin >> a[i][j][k];
            }
        }
    }
}
Nmin= Nmats/size;
Nextra = Nmats%size;
k=0;
for(i=0; i<size; i++){
    if( i < Nextra) counts[i] = Nmin + 1;
    else counts[i] = Nmin;
    displs[i] = k;
    k = k + counts[i];
}
recvcounts = counts[rank];
displs[size] = Nmats;
MPI_Scatterv(&a[displs[rank]][0][0], counts, displs, plane, &a[displs[rank]][0][0], recvcounts, plane, 0, MPI_COMM_WORLD);
/* Perform operations on matrices */
MPI_Gatherv(&passflag[displs[rank]], counts, MPI::BOOL, &passflag[displs[rank]], &counts[rank], displs, MPI::BOOL, 0, MPI_COMM_WORLD);
MPI_Type_free(&plane);
MPI::Finalize();
Answer 0 (score: 0)
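What you have in a is actually Nmats planes of N x N elements. The way you index a when filling its elements in the nested loops shows that these matrices are laid out contiguously in memory. You should therefore treat a as an array of Nmats elements, each of which is a compound of N*N values. All you have to do is register a contiguous type that spans the memory of a single matrix: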
MPI_Type_contiguous(N*N, MPI_FLOAT, &plane);
MPI_Type_commit(&plane);
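The contiguous type only describes the data correctly if all Nmats*N*N elements really sit in one block, as they do for a static a[Nmats][N][N]. A triple-pointer array built from nested new calls places every row in its own heap allocation, so a plane-sized transfer starting at &a[i][0][0] would run past the end of the first row. The following is a minimal sketch of one way to keep dynamically sized storage contiguous, assuming int elements (which would pair with MPI_INT rather than MPI_FLOAT). MatrixStack, at and plane are hypothetical names introduced here for illustration and do not appear in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper (not from the original post): nmats matrices of
// n x n ints stored back to back in a single contiguous block.
struct MatrixStack {
    int nmats;
    int n;
    std::vector<int> data;  // nmats * n * n elements, zero-initialised

    MatrixStack(int nmats_, int n_)
        : nmats(nmats_), n(n_),
          data(static_cast<std::size_t>(nmats_) * n_ * n_, 0) {}

    // Element (j, k) of matrix i, with a debug-only bounds check.
    int& at(int i, int j, int k) {
        assert(i >= 0 && i < nmats && j >= 0 && j < n && k >= 0 && k < n);
        return data[(static_cast<std::size_t>(i) * n + j) * n + k];
    }

    // Pointer to the first element of matrix i; consecutive matrices
    // are exactly n * n elements apart.
    int* plane(int i) {
        return data.data() + static_cast<std::size_t>(i) * n * n;
    }
};
```

With such a layout, a.plane(0) could serve as the buffer argument wherever the calls in this answer pass a, and a.at(i, j, k) replaces a[i][j][k] in the input and output loops.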
Use the in-place mode of the scatter operation to distribute the data without an additional array at the root:
// Perform an in-place scatter
if (rank == 0)
    MPI_Scatterv(a, sendcounts, displs, plane,
                 MPI_IN_PLACE, 0, plane, 0, MPI_COMM_WORLD);
    //                         ^^^^^^^^ ignored because of MPI_IN_PLACE
else
    MPI_Scatterv(a, sendcounts, displs, plane,
    //           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ignored by non-root ranks
                 a, sendcounts[rank], plane, 0, MPI_COMM_WORLD);
    //              ^^^^^^^^^^^^^^^^ !!!
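Note that each rank must specify the correct number of planes it should receive by supplying the corresponding element of sendcounts[] (in your code this was fixed at N*N).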
The in-place mode should be used in the gather operation as well:
// MPI_CXX_BOOL (MPI-3.0) is the predefined datatype that matches C++ bool
if (rank == 0)
    MPI_Gatherv(MPI_IN_PLACE, 0, MPI_CXX_BOOL,
    //                        ^^^^^^^^^^^^^^^ ignored because of MPI_IN_PLACE
                passflag, rcounts, displs, MPI_CXX_BOOL, 0, MPI_COMM_WORLD);
else
    MPI_Gatherv(passflag, rcounts[rank], MPI_CXX_BOOL,
    //                    ^^^^^^^^^^^^^ !!!
                passflag, rcounts, displs, MPI_CXX_BOOL, 0, MPI_COMM_WORLD);
    //          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ignored by non-root ranks
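Note that rcounts and sendcounts hold essentially the same values, so you don't have to compute them twice. Simply call the array counts and use it in both the MPI_Scatterv and the MPI_Gatherv call. The same applies to the values of displs: don't compute them twice, since they are identical. You also don't appear to reset k to zero before the second computation (although that may simply not be shown in the code posted here).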
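As a rough illustration of computing the counts and displacements only once, here is a minimal sketch. Partition and make_partition are hypothetical names introduced for this example. Both arrays are measured in whole matrices, and the running offset starts at zero.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type (not from the original post): per-rank counts
// and displacements, both measured in whole matrices.
struct Partition {
    std::vector<int> counts;
    std::vector<int> displs;
};

// Split nmats matrices over size ranks. The first nmats % size ranks
// receive one extra matrix, as in the question's loop.
Partition make_partition(int nmats, int size) {
    assert(nmats >= 0 && size > 0);
    Partition p{std::vector<int>(size), std::vector<int>(size)};
    const int nmin = nmats / size;
    const int nextra = nmats % size;
    int offset = 0;  // running displacement, starts at zero
    for (int i = 0; i < size; ++i) {
        p.counts[i] = nmin + (i < nextra ? 1 : 0);
        p.displs[i] = offset;
        offset += p.counts[i];
    }
    return p;
}
```

Because MPI interprets displacements in units of the datatype's extent, the same p.counts.data() and p.displs.data() pointers should be usable for the scatter (one plane per matrix) and for the gather (one flag per matrix).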