I am writing code that passes data around the nodes in a ring, like rank0 -> rank1, rank1 -> rank2, rank2 -> rank0.
To test the bandwidth, a for loop was added so that the non-blocking communication runs a few hundred times.
When I use 5 or fewer nodes the code works, but with more than 5 nodes it fails:
int rank, npes;
MPI_Comm_size(MPI_COMM_WORLD, &npes);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

float* recv_buffer_last = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* recv_buffer_next = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* send_buffer_next = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* send_buffer_last = (float*)malloc(sizeof(float)*YMAX*FMAX);

if (npes > 1)
{
    MPI_Status reqstat;
    MPI_Request send_request;
    MPI_Request recv_request;

    for (int loop = 0; loop < 100; loop++) // if no loop here, the code always works
    {
        MPI_Irecv(recv_buffer_last, YMAX*FMAX, MPI_FLOAT, (rank == 0) ? (npes - 1) : (rank - 1), 100, MPI_COMM_WORLD, &recv_request);
        MPI_Irecv(recv_buffer_next, YMAX*FMAX, MPI_FLOAT, (rank == npes - 1) ? 0 : rank + 1, 1000, MPI_COMM_WORLD, &recv_request);
        MPI_Isend(send_buffer_next, YMAX*FMAX, MPI_FLOAT, (rank == npes - 1) ? 0 : rank + 1, 100, MPI_COMM_WORLD, &send_request);
        MPI_Isend(send_buffer_last, YMAX*FMAX, MPI_FLOAT, (rank == 0) ? (npes - 1) : (rank - 1), 1000, MPI_COMM_WORLD, &send_request);
        MPI_Waitall(1, &recv_request, &reqstat);
        MPI_Waitall(1, &send_request, &reqstat);
    }
}
else
{
    memcpy(recv_buffer_last, send_buffer_next, sizeof(float)*YMAX*FMAX);
    memcpy(recv_buffer_next, send_buffer_last, sizeof(float)*YMAX*FMAX);
}
However, if I comment out the loop line and only send and receive the data once, the code works fine no matter how many nodes are used. I really have no idea where the problem is.
Below is the current version of the code (with the changes described in the Update below):
int rank, npes;
MPI_Comm_size(MPI_COMM_WORLD, &npes);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

/*
float* recv_buffer_last = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* recv_buffer_next = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* send_buffer_next = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* send_buffer_last = (float*)malloc(sizeof(float)*YMAX*FMAX);
*/
float recv_buffer_last[FMAX*YMAX];
float recv_buffer_next[FMAX*YMAX];
float send_buffer_last[FMAX*YMAX];
float send_buffer_next[FMAX*YMAX];

int prev = (rank + npes - 1) % npes;
int next = (rank + 1) % npes;

if (npes > 1)
{
    MPI_Request requests[4];

    for (int loop = 0; loop < 100; loop++)
    {
        MPI_Irecv(recv_buffer_last, YMAX*FMAX, MPI_FLOAT, prev, 100, MPI_COMM_WORLD, &requests[0]);
        MPI_Irecv(recv_buffer_next, YMAX*FMAX, MPI_FLOAT, next, 1000, MPI_COMM_WORLD, &requests[1]);
        MPI_Isend(send_buffer_next, YMAX*FMAX, MPI_FLOAT, next, 100, MPI_COMM_WORLD, &requests[2]);
        MPI_Isend(send_buffer_last, YMAX*FMAX, MPI_FLOAT, prev, 1000, MPI_COMM_WORLD, &requests[3]);
        MPI_Waitall(4, requests, MPI_STATUSES_IGNORE);
    }
}
else
{
    memcpy(recv_buffer_last, send_buffer_next, sizeof(float)*YMAX*FMAX);
    memcpy(recv_buffer_next, send_buffer_last, sizeof(float)*YMAX*FMAX);
}
UPDATE
Thanks to everyone for pointing out the mistake with MPI_Waitall, but making that change alone does not solve my problem.
After playing around with my code, I found that changing the dynamic arrays recv_buffer and send_buffer to static arrays makes the code work perfectly; that static-array version is the second listing shown above.
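For reference, relative to the version that already has the MPI_Waitall fix, the change is only in how the four buffers are declared; everything else, including the MPI calls, stays the same:

/* before: heap-allocated buffers */
float* recv_buffer_last = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* recv_buffer_next = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* send_buffer_next = (float*)malloc(sizeof(float)*YMAX*FMAX);
float* send_buffer_last = (float*)malloc(sizeof(float)*YMAX*FMAX);

/* after: fixed-size (static) arrays */
float recv_buffer_last[FMAX*YMAX];
float recv_buffer_next[FMAX*YMAX];
float send_buffer_last[FMAX*YMAX];
float send_buffer_next[FMAX*YMAX];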
I would like to know what makes the difference, so that I can avoid similar problems in the future.
Answer (score: 1)
I'm pretty sure the source of your problem is that you have dangling requests that you never wait on at the end of the communications: each loop iteration posts 2 MPI_Irecv() and 2 MPI_Isend(), i.e. 4 requests, but only 2 of them are waited on. Because both MPI_Irecv() calls write into the same recv_request variable (and likewise for send_request), only the second receive and the second send of each iteration are ever completed. Internally, the MPI library allocates resources for the unfinished requests, and those resources are never released, so eventually some internal limit is hit and you get the failure you observed.
Here is an example of how your code could look:
int rank, npes;
MPI_Comm_size( MPI_COMM_WORLD, &npes );
MPI_Comm_rank( MPI_COMM_WORLD, &rank );

int len = YMAX*FMAX;
float *recv_buffer_last = (float*) malloc( sizeof( float ) * len );
float *recv_buffer_next = (float*) malloc( sizeof( float ) * len );
float *send_buffer_next = (float*) malloc( sizeof( float ) * len );
float *send_buffer_last = (float*) malloc( sizeof( float ) * len );

if ( npes > 1 ) {
    MPI_Request requests[4];
    int prev = ( rank + npes - 1 ) % npes;
    int next = ( rank + 1 ) % npes;

    for ( int loop = 0; loop < 100; loop++ ) {
        /* post all four non-blocking calls into one request array */
        MPI_Irecv( recv_buffer_last, len, MPI_FLOAT, prev, 100, MPI_COMM_WORLD, &requests[0] );
        MPI_Irecv( recv_buffer_next, len, MPI_FLOAT, next, 1000, MPI_COMM_WORLD, &requests[1] );
        MPI_Isend( send_buffer_next, len, MPI_FLOAT, next, 100, MPI_COMM_WORLD, &requests[2] );
        MPI_Isend( send_buffer_last, len, MPI_FLOAT, prev, 1000, MPI_COMM_WORLD, &requests[3] );
        /* complete all four requests before the next iteration reuses the buffers */
        MPI_Waitall( 4, requests, MPI_STATUSES_IGNORE );
    }
}
else {
    memcpy( recv_buffer_last, send_buffer_next, sizeof( float ) * len );
    memcpy( recv_buffer_next, send_buffer_last, sizeof( float ) * len );
}
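As a side note, for a bandwidth test that repeats the exact same exchange many times, persistent requests are another option: the four communications are set up once, then started and completed in each iteration, and explicitly freed at the end. The following is only a minimal sketch of that idea, not part of the code above; the YMAX/FMAX values are placeholders, and the MPI_Wtime timing just marks where a bandwidth measurement could go.

/* Sketch only: same ring exchange, but with persistent requests.
   YMAX and FMAX are assumed to be defined as in the question; the
   values below are placeholders. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define YMAX 100   /* placeholder, use the real value */
#define FMAX 100   /* placeholder, use the real value */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, npes;
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int len = YMAX * FMAX;
    float *recv_buffer_last = malloc(sizeof(float) * len);
    float *recv_buffer_next = malloc(sizeof(float) * len);
    float *send_buffer_next = malloc(sizeof(float) * len);
    float *send_buffer_last = malloc(sizeof(float) * len);

    if (npes > 1) {
        int prev = (rank + npes - 1) % npes;
        int next = (rank + 1) % npes;

        /* set up the four communications once */
        MPI_Request requests[4];
        MPI_Recv_init(recv_buffer_last, len, MPI_FLOAT, prev, 100, MPI_COMM_WORLD, &requests[0]);
        MPI_Recv_init(recv_buffer_next, len, MPI_FLOAT, next, 1000, MPI_COMM_WORLD, &requests[1]);
        MPI_Send_init(send_buffer_next, len, MPI_FLOAT, next, 100, MPI_COMM_WORLD, &requests[2]);
        MPI_Send_init(send_buffer_last, len, MPI_FLOAT, prev, 1000, MPI_COMM_WORLD, &requests[3]);

        double t0 = MPI_Wtime();
        for (int loop = 0; loop < 100; loop++) {
            MPI_Startall(4, requests);                      /* start all four transfers */
            MPI_Waitall(4, requests, MPI_STATUSES_IGNORE);  /* complete all four before reuse */
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("100 iterations took %f s\n", t1 - t0);

        /* release the persistent requests once the loop is done */
        for (int i = 0; i < 4; i++)
            MPI_Request_free(&requests[i]);
    }

    free(recv_buffer_last);
    free(recv_buffer_next);
    free(send_buffer_next);
    free(send_buffer_last);

    MPI_Finalize();
    return 0;
}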