Deadlock with non-blocking MPI communication

Date: 2014-03-03 15:22:42

Tags: mpi openmpi

The following code is a routine that communicates the ghost points to the top/bottom and left/right neighbours. It is called inside the loop of an iterative method, a few hundred times.

The problem is that, although it is written with non-blocking communications, it deadlocks... The curious part is that it seems to run for a number of iterations and then suddenly freezes. If I comment out the second communication loop (top/bottom) it still freezes, only at a larger iteration index... Everything happens as if there were a maximum allowed number of communications, or something like that.

I did not think Isend and Irecv could deadlock. As far as I can tell, I never touch the buffers before calling MPI_Waitall().

Am I using the MPI routines in a wrong way?

void fields_updateEghosts(Fields *this, Params *parameters)
{


double *ex_in[2], *ex_out[2], *ey_in[2], *ey_out[2];
int neighbors_in[2], neighbors_out[2], ineighb;
MPI_Request requests_lr[4], requests_tb[4];
int ix,iy,nx,ny,i;
YeeCell *yee;

yee = this->yeecell;
nx  = this->nx;
ny  = this->ny;

/* 0,1 = top/bottom for Ey and left/right for Ex*/
for (i=0; i < 2; i++)
{
    MEM_ALLOC(ex_in[i],  ny*sizeof *ex_in[0]);
    MEM_ALLOC(ex_out[i], ny*sizeof *ex_out[0]);
    MEM_ALLOC(ey_in[i],  nx*sizeof *ey_in[0]);
    MEM_ALLOC(ey_out[i], nx*sizeof *ey_out[0]);
}

/* we send the points just inside the boundary */
for (iy=1; iy < ny; iy++)
{
    ex_out[PART_LEFT][iy]  = ex(1   ,iy);
    ex_out[PART_RIGHT][iy] = ex(nx-2,iy);
}

neighbors_in[0]  = PART_LEFT;
neighbors_in[1]  = PART_RIGHT;
neighbors_out[0] = PART_RIGHT;
neighbors_out[1] = PART_LEFT;

for (ineighb=0; ineighb < 2; ineighb++)
{
    MPI_Irecv(ex_in[neighbors_in[ineighb]],
              ny, MPI_DOUBLE,
              parameters->para->neighbors[neighbors_in[ineighb]], /*src rank */
              neighbors_out[ineighb],                           /* tag */
              MPI_COMM_WORLD,
              &requests_lr[ineighb]);

    MPI_Isend(ex_out[neighbors_out[ineighb]],
              ny, MPI_DOUBLE,
              parameters->para->neighbors[neighbors_out[ineighb]],
              neighbors_out[ineighb],
              MPI_COMM_WORLD,
              &requests_lr[ineighb+2]);
}

/* fill the outgoing top and bottom buffers
   while left/right communications are done*/
for (ix=1; ix < nx; ix++)
{
    ey_out[PART_TOP][ix] = ey(ix,ny-2);
    ey_out[PART_BOT][ix] = ey(ix,1);
}


/* now communications for top/bottom */
neighbors_in[0]  = PART_TOP;
neighbors_in[1]  = PART_BOT;
neighbors_out[0] = PART_BOT;
neighbors_out[1] = PART_TOP;

for (ineighb=0; ineighb < 2; ineighb++)
{
    MPI_Irecv(ey_in[neighbors_in[ineighb]],
              nx, MPI_DOUBLE,
              parameters->para->neighbors[neighbors_in[ineighb]],
              neighbors_out[ineighb],
              MPI_COMM_WORLD,
              &requests_tb[ineighb]);

    MPI_Isend(ey_out[neighbors_out[ineighb]],
              nx, MPI_DOUBLE,
              parameters->para->neighbors[neighbors_out[ineighb]],
              neighbors_out[ineighb],
              MPI_COMM_WORLD,
              &requests_tb[ineighb+2]);
}

/* now wait for communications to be done
   before copying the data into the arrays */

/* MPI_STATUSES_IGNORE is the constant defined for arrays of statuses;
   MPI_STATUS_IGNORE is meant for the single-request calls */
MPI_Waitall(4, requests_lr, MPI_STATUSES_IGNORE);
MPI_Waitall(4, requests_tb, MPI_STATUSES_IGNORE);


for (iy=1; iy < ny; iy++)
{
    ex(0   ,iy) = ex_in[PART_LEFT][iy];
    ex(nx-1,iy) = ex_in[PART_RIGHT][iy];
}

for (ix=1; ix < nx; ix++)
{
    ey(ix,ny-1) = ey_in[PART_TOP][ix];
    ey(ix,0)    = ey_in[PART_BOT][ix];
}



for (i=0; i < 2; i++)
{
    MEM_FREE(ex_in[i]);
    MEM_FREE(ex_out[i]);
    MEM_FREE(ey_in[i]);
    MEM_FREE(ey_out[i]);
}
}

1 Answer:

Answer 0 (score: 0)

I found the answer, and I will explain it here in case someone runs into the same problem.

First, the function above is fine; I don't think there is anything wrong with it. The problem was in the iterative method that calls this routine to get the ghost-node values. The iterative method has a convergence criterion that I forgot to compute over the global domain. As a result, some processes satisfied the convergence test before the others and left the loop... leaving the remaining processes waiting for their buddies...

A small MPI_Allreduce() to compute the convergence, and it no longer blocks.
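To make the fix concrete, here is a minimal sketch (not the original code) of what the driver loop might look like once the convergence criterion is reduced globally. Fields, Params and fields_updateEghosts() come from the question; local_residual(), tol and max_iter are hypothetical names used only for illustration.

#include <mpi.h>

/* Sketch of the corrected iterative loop: every rank contributes its local
   convergence measure to an MPI_Allreduce, so all ranks evaluate the same
   test and leave the loop together. */
void iterate(Fields *fields, Params *parameters, double tol, int max_iter)
{
    int iter = 0, converged = 0;

    while (!converged && iter < max_iter)
    {
        /* ghost-point exchange: the routine from the question */
        fields_updateEghosts(fields, parameters);

        /* ... update the local part of the domain ... */

        /* hypothetical helper: residual computed on this rank only */
        double local_res  = local_residual(fields);
        double global_res = 0.0;

        /* every rank receives the same global maximum residual,
           hence every rank takes the same exit decision */
        MPI_Allreduce(&local_res, &global_res, 1, MPI_DOUBLE,
                      MPI_MAX, MPI_COMM_WORLD);

        converged = (global_res < tol);
        iter++;
    }
}

With MPI_MAX (or MPI_SUM, depending on the criterion), no rank can exit the loop while its neighbours are still posting sends and receives, which is exactly the mismatch that caused the hang.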