I am trying to implement an MPI program that iteratively sets each element of an array to the average of itself and its neighbours (from the previous time step), while holding the first and last elements fixed. With a single process this works fine; with multiple processes, however, I do not get the right answer, and in particular the first array element keeps getting overwritten.
My initialization step seems to work correctly, at least as far as the "Before calculation" output is concerned: it prints the same vector whether I run with 1 process or more.
One thing I am not sure about is whether I am using MPI_Request and MPI_Status correctly; the variables to watch are sendL, sendR and status.
I have tried to include only the relevant parts of the code; "..." indicates that something has been left out, and some of the omissions carry a comment explaining what was removed. Both the parallel and the single-process implementations are included for comparison.
...
#include "mpi.h"
... //definition of f() for initialization
int main(int argc, char **argv) {
    int id, p, i, j, k, n, t, m, v, vp,
        lbound, ubound, block_size, offset;
    double startwtime, endwtime;
    float time;
    MPI_Request *sendL, *sendR;
    MPI_Status *status; /* return status for receive */
    double *prev, *cur, *temp;
    ... // initialize MPI; get PE rank and size
    .... // set the following:
         // n = vector length, m = num iterations, k = buffer size
         // v = verbose (true/false)
    // Memory allocation for output from MPI functions
    // Note that I never actually initialized these. Is this a problem?
    sendL = (MPI_Request *) malloc(sizeof(MPI_Request));
    sendR = (MPI_Request *) malloc(sizeof(MPI_Request));
    status = (MPI_Status *) malloc(sizeof(MPI_Status));
    // Memory allocation for data array.
    block_size = (n/p+2*k);
    prev = (double *) malloc( sizeof(double) * block_size);
    cur = (double *) malloc( sizeof(double) * block_size);
    ... //malloc error handling
    t = 0;
    /* The following block is for a single process. It works correctly. */
    if(p==1){
        // Initialization
        startwtime = MPI_Wtime();
        for(i=0;i<n;i++) prev[i] = f(i,n);
        cur[0] = f(0,n); cur[n-1] = f(n-1,n);
        if(v){
            printf("Before calculation\n");
            for(i=0;i<n;i++) printf("%f ",prev[i]);
            printf("\n");
        }
        while (t < m) {
            for ( i=1 ; i < n-1 ; i++ ) {
                cur[i] = (prev[i-1]+prev[i]+prev[i+1])/3;
            }
            temp = prev; prev = cur; cur = temp; t++;
        }
        if(v){
            printf("After calculation:\n");
            for(i=0;i<n;i++) printf("%f ",prev[i]);
            printf("\n");
        }
        endwtime = MPI_Wtime();
        time = endwtime-startwtime;
        printf("Sequential process complete, time: %f\n", time);
        return MPI_Finalize();
    }
    /* Here is my parallel implementation. It has problems. */
    else{
        if (id == 0){
            startwtime = MPI_Wtime();
        }
        // Initialization
        offset = id*(n/p)-k;
        for(i=0;i<block_size;i++) prev[i] = f(i+offset,n);
        cur[0] = f(0,n); cur[block_size-1] = prev[block_size-1];
        if (id == 0){
            for (i=0;i<k;i++){
                prev[i] = f(0,n);
                cur[i] = prev[i];
            }
        }
        if (id == p-1){
            for (i=block_size-k;i<block_size;i++){
                prev[i] = f(n-1,n);
                cur[i] = prev[i];
            }
        }
        if(v && id == 0){
            printf("Before calculation:\n");
            for(j=k;j<(n/p)+k;j++) printf("%f ",prev[j]);
            for(i=1;i<p;i++){
                MPI_Recv(prev+k,(n/p),MPI_DOUBLE_PRECISION,i,2,MPI_COMM_WORLD,status);
                for(j=k;j<(n/p)+k;j++) printf("%f ",prev[j]);
            }
            printf("\n");
        }
        else if (v){
            MPI_Isend(prev+k,(n/p),MPI_DOUBLE_PRECISION,0,2,MPI_COMM_WORLD,sendL);
        }
        lbound = (id == 0) ? (k+1) : (1);
        ubound = (id == p-1) ? (block_size-k-2) : (block_size-2);
        while (t < m) {
            for ( i=lbound ; i < ubound ; i++ ) {
                cur[i] = (prev[i-1]+prev[i]+prev[i+1])/3;
            }
            temp = prev; prev = cur; cur = temp; t++;
            if (t%k == 0){
                if (id > 0){
                    // send to left
                    MPI_Isend(prev+k,k,MPI_DOUBLE_PRECISION,id-1,0,MPI_COMM_WORLD,sendL);
                }
                if (id < p-1) {
                    // send to right
                    MPI_Isend(prev+block_size-2*k,k,
                              MPI_DOUBLE_PRECISION,id+1,1,MPI_COMM_WORLD,sendR);
                }
                if (id < p-1){
                    // receive from right
                    MPI_Recv(prev+block_size-k,k,
                             MPI_DOUBLE_PRECISION,id+1,0,MPI_COMM_WORLD,status);
                }
                if (id > 0) {
                    // receive from left
                    MPI_Recv(prev,k,MPI_DOUBLE_PRECISION,id-1,1,MPI_COMM_WORLD,status);
                }
            }
        }
        if(v && id == 0){
            printf("After calculation\n");
            for(j=k;j<(n/p)+k;j++) printf("%f ",prev[j]);
            for(i=1;i<p;i++){
                MPI_Recv(prev+k,(n/p),MPI_DOUBLE_PRECISION,i,2,MPI_COMM_WORLD,status);
                for(j=k;j<(n/p)+k;j++) printf("%f ",prev[j]);
            }
            printf("\n");
        }
        else if (v){
            MPI_Isend(prev+k,(n/p),MPI_DOUBLE_PRECISION,0,2,MPI_COMM_WORLD,sendL);
        }
        if (id == 0){
            endwtime = MPI_Wtime();
            time = endwtime-startwtime;
            printf("Process 0 complete, time: %f\n", time);
        }
        return MPI_Finalize();
    }
}
Answer 0 (score: 2)
First things first. This part of the code is needlessly complicated:
MPI_Request *sendL, *sendR;
MPI_Status *status; /* return status for receive */
sendL = (MPI_Request *) malloc(sizeof(MPI_Request));
sendR = (MPI_Request *) malloc(sizeof(MPI_Request));
status = (MPI_Status *) malloc(sizeof(MPI_Status));
Handles in MPI are simple types such as integers or pointers, so allocating them dynamically makes no sense here. A status is likewise a simple structure with 3-4 fields, and there is no point in putting it on the heap either. Use stack variables instead:
MPI_Request sendL, sendR;
MPI_Status status;
There is another problem: you start non-blocking sends but never guarantee their completion, i.e. you never call MPI_Wait or MPI_Test on the request handles. The sends may never actually progress to completion, which can deadlock the receiving code. In fact, you do not need those non-blocking calls at all; use MPI_Sendrecv instead, which is designed exactly for the kind of MPI_Isend/MPI_Recv combination you are using. The following code:
if (id > 0){
    // send to left
    MPI_Isend(prev+k,k,MPI_DOUBLE_PRECISION,id-1,0,MPI_COMM_WORLD,sendL);
}
if (id < p-1) {
    // send to right
    MPI_Isend(prev+block_size-2*k,k,
              MPI_DOUBLE_PRECISION,id+1,1,MPI_COMM_WORLD,sendR);
}
if (id < p-1){
    // receive from right
    MPI_Recv(prev+block_size-k,k,
             MPI_DOUBLE_PRECISION,id+1,0,MPI_COMM_WORLD,status);
}
if (id > 0) {
    // receive from left
    MPI_Recv(prev,k,MPI_DOUBLE_PRECISION,id-1,1,MPI_COMM_WORLD,status);
}
can be replaced with:
int prev_rank, next_rank;
prev_rank = (id > 0) ? id-1 : MPI_PROC_NULL;
next_rank = (id < p-1) ? id+1 : MPI_PROC_NULL;
...
MPI_Sendrecv(prev+k, k, MPI_DOUBLE, prev_rank, 0,
             prev+block_size-k, k, MPI_DOUBLE, next_rank, 0, MPI_COMM_WORLD, &status);
MPI_Sendrecv(prev+block_size-2*k, k, MPI_DOUBLE, next_rank, 1,
             prev, k, MPI_DOUBLE, prev_rank, 1, MPI_COMM_WORLD, &status);
The rank checks go away thanks to the concept of a null process, i.e. a process with rank MPI_PROC_NULL. This is a very special rank in MPI: you can send to it and receive from it at any time, and those operations are simply no-ops. Note also that the correct MPI datatype here is MPI_DOUBLE; MPI_DOUBLE_PRECISION is for the Fortran datatype DOUBLE PRECISION. Because MPI_Sendrecv is a blocking call, each invocation is written so that it sends data in one direction while receiving from the opposite direction, which prevents deadlock.
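If you would rather keep the non-blocking sends than switch to MPI_Sendrecv, the essential point is that every MPI_Isend must eventually be completed with MPI_Wait or MPI_Waitall before the buffer is reused. Here is a minimal sketch of that pattern, reusing the variables prev, block_size, k, id and p from your code; it illustrates where the completion calls belong rather than being a drop-in replacement:

MPI_Request reqs[2];
int nreqs = 0;

if (id > 0)        // send the k cells next to the left ghost zone to the left neighbour
    MPI_Isend(prev+k, k, MPI_DOUBLE, id-1, 0, MPI_COMM_WORLD, &reqs[nreqs++]);
if (id < p-1)      // send the k cells next to the right ghost zone to the right neighbour
    MPI_Isend(prev+block_size-2*k, k, MPI_DOUBLE, id+1, 1, MPI_COMM_WORLD, &reqs[nreqs++]);

if (id < p-1)      // receive the right ghost zone
    MPI_Recv(prev+block_size-k, k, MPI_DOUBLE, id+1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
if (id > 0)        // receive the left ghost zone
    MPI_Recv(prev, k, MPI_DOUBLE, id-1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

// Only after the sends have completed may prev be modified or swapped again.
MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);

MPI_Waitall with nreqs equal to 0 is simply a no-op, so the boundary ranks need no special treatment here.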
Answer 1 (score: 0)
The "Before calculation" output clobbers prev: on rank 0 the MPI_Recv calls receive every other rank's chunk into prev+k, overwriting rank 0's own data before the computation even starts. Oops.
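A minimal sketch of one way to fix that, assuming the variables n, p, k, i, j and prev from the question: have rank 0 receive the remote chunks into a scratch buffer instead of into prev, so its own data survives the printout.

// Verbose printout on rank 0 that does not clobber rank 0's own chunk in prev.
double *printbuf = (double *) malloc(sizeof(double) * (n/p));
printf("Before calculation:\n");
for (j = k; j < (n/p)+k; j++) printf("%f ", prev[j]);   // rank 0's own chunk
for (i = 1; i < p; i++) {
    MPI_Recv(printbuf, n/p, MPI_DOUBLE, i, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    for (j = 0; j < n/p; j++) printf("%f ", printbuf[j]);
}
printf("\n");
free(printbuf);

The same change applies to the "After calculation" printout at the end.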