Loop of MPI non-blocking communication fails at a certain number of nodes

Date: 2015-10-08 21:40:52

Tags: c mpi nonblocking

I am writing a piece of code that passes data around the nodes in a ring, e.g. rank0 -> rank1, rank1 -> rank2, rank2 -> rank0.

To test the bandwidth, a for loop is added so that the non-blocking communication is repeated a few hundred times.

The code works when I use 5 or fewer nodes, but it fails with more than 5 nodes.

However, if I comment out the loop line and just send and receive the data once, the code works fine no matter how many nodes there are. I really don't know where it goes wrong.

Below is the code:

int rank, npes;
MPI_Comm_size(MPI_COMM_WORLD,&npes);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);

float* recv_buffer_last = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* recv_buffer_next = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* send_buffer_next = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* send_buffer_last = (float*)malloc(sizeof(float)*YMAX*FMAX);

if(npes >1)
{
    MPI_Status reqstat;
    MPI_Request send_request;
    MPI_Request recv_request;

    for(int loop = 0; loop < 100; loop++) //if no loop here, code always works.
    {
        MPI_Irecv(recv_buffer_last,YMAX*FMAX,MPI_FLOAT,(rank == 0)?(npes - 1):(rank - 1),100,MPI_COMM_WORLD,&recv_request);
        MPI_Irecv(recv_buffer_next,YMAX*FMAX,MPI_FLOAT,(rank == npes-1)?0:rank+1,1000,MPI_COMM_WORLD,&recv_request);
        MPI_Isend(send_buffer_next,YMAX*FMAX,MPI_FLOAT,(rank == npes-1)?0:rank+1,100,MPI_COMM_WORLD,&send_request);
        MPI_Isend(send_buffer_last,YMAX*FMAX,MPI_FLOAT,(rank == 0)?(npes - 1):(rank - 1),1000,MPI_COMM_WORLD,&send_request);

        MPI_Waitall(1,&recv_request,&reqstat);
        MPI_Waitall(1,&send_request,&reqstat);
    }
}
else
{
    memcpy(recv_buffer_last,send_buffer_next,sizeof(float)*YMAX*FMAX);
    memcpy(recv_buffer_next,send_buffer_last,sizeof(float)*YMAX*FMAX);
}

UPDATE

Thanks everyone for pointing out the mistake with MPI_Waitall, but making that modification does not solve my problem.

After playing around with my code, I found that changing the dynamic arrays recv_buffer and send_buffer into static arrays makes the code work perfectly:

int rank, npes;
MPI_Comm_size(MPI_COMM_WORLD,&npes);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);

/*
float* recv_buffer_last = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* recv_buffer_next = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* send_buffer_next = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* send_buffer_last = (float*)malloc(sizeof(float)*YMAX*FMAX);
*/

float recv_buffer_last[FMAX*YMAX];
float recv_buffer_next[FMAX*YMAX];
float send_buffer_last[FMAX*YMAX];
float send_buffer_next[FMAX*YMAX];

int prev = (rank+npes-1)%npes;
int next = (rank+1)%npes;

if(npes >1)
{
    MPI_Request requests[4];
    for(int loop = 0; loop < 100; loop++)
    {
        MPI_Irecv(recv_buffer_last,YMAX*FMAX,MPI_FLOAT,prev,100,MPI_COMM_WORLD,&requests[0]);
        MPI_Irecv(recv_buffer_next,YMAX*FMAX,MPI_FLOAT,next,1000,MPI_COMM_WORLD,&requests[1]);
        MPI_Isend(send_buffer_next,YMAX*FMAX,MPI_FLOAT,next,100,MPI_COMM_WORLD,&requests[2]);
        MPI_Isend(send_buffer_last,YMAX*FMAX,MPI_FLOAT,prev,1000,MPI_COMM_WORLD,&requests[3]);

        MPI_Waitall(4,requests,MPI_STATUSES_IGNORE);
    }
}
else
{
    memcpy(recv_buffer_last,send_buffer_next,sizeof(float)*YMAX*FMAX);
    memcpy(recv_buffer_next,send_buffer_last,sizeof(float)*YMAX*FMAX);
}

I am wondering what makes the difference, so that I can avoid similar problems in the future.

1 Answer:

Answer 0 (score: 1)

I'm pretty sure the source of your problem is that you have dangling requests that you never wait on at the end of the communications: since there are 2 MPI_Irecv() and 2 MPI_Isend() per loop iteration, 4 requests are posted but only 2 are waited for. This means that internally, the MPI library allocates resources for these requests that are never released, eventually hitting some internal limit and producing the error you encountered.

Here is what your code could look like:

int rank, npes;
MPI_Comm_size( MPI_COMM_WORLD, &npes );
MPI_Comm_rank( MPI_COMM_WORLD, &rank );

int len = YMAX*FMAX;
float *recv_buffer_last = (float*) malloc( sizeof( float ) * len );
float *recv_buffer_next = (float*) malloc( sizeof( float ) * len );
float *send_buffer_next = (float*) malloc( sizeof( float ) * len );
float *send_buffer_last = (float*) malloc( sizeof( float ) * len );

if ( npes > 1 ) {
    MPI_Request requests[4];
    int prev = ( rank + npes - 1 ) % npes;
    int next = ( rank + 1 ) % npes;
    for ( int loop = 0; loop < 100; loop++ ) {
        MPI_Irecv( recv_buffer_last, len, MPI_FLOAT, prev,  100, MPI_COMM_WORLD, &requests[0] );
        MPI_Irecv( recv_buffer_next, len, MPI_FLOAT, next, 1000, MPI_COMM_WORLD, &requests[1] );
        MPI_Isend( send_buffer_next, len, MPI_FLOAT, next,  100, MPI_COMM_WORLD, &requests[2] );
        MPI_Isend( send_buffer_last, len, MPI_FLOAT, prev, 1000, MPI_COMM_WORLD, &requests[3] );
        MPI_Waitall( 4, requests, MPI_STATUSES_IGNORE );
    }
}
else {
    memcpy( recv_buffer_last, send_buffer_next, sizeof( float ) * len );
    memcpy( recv_buffer_next, send_buffer_last, sizeof( float ) * len);
}