Persistent communication in MPI - strange behaviour

Date: 2015-11-20 13:19:01

Tags: mpi nonblocking

I am solving the coarsest grid of a parallel geometric multigrid with Jacobi iteration, using the non-blocking calls MPI_Isend() and MPI_Irecv() for the halo exchange. This works without problems. As soon as I replace the non-blocking communication with persistent communication, the results stop converging at this level and the program goes into an infinite loop. The calls to MPI_Startall() and MPI_Waitall() always return MPI_SUCCESS. Has anyone encountered this problem before? Please advise.

Coarsest_grid_solve()
{
    MPI_Recv_init(&e_c_old[0][1][1],     1, x_subarray_c, X_DOWN,      10, new_comm, &recv[0]);
    MPI_Recv_init(&e_c_old[PXC+1][1][1], 1, x_subarray_c, X_UP,        20, new_comm, &recv[1]);
    MPI_Recv_init(&e_c_old[1][PYC+1][1], 1, y_subarray_c, Y_RIGHT,     30, new_comm, &recv[2]);
    MPI_Recv_init(&e_c_old[1][0][1],     1, y_subarray_c, Y_LEFT,      40, new_comm, &recv[3]);
    MPI_Recv_init(&e_c_old[1][1][PZC+1], 1, z_subarray_c, Z_AWAY_U,    50, new_comm, &recv[4]);
    MPI_Recv_init(&e_c_old[1][1][0],     1, z_subarray_c, Z_TOWARDS_U, 60, new_comm, &recv[5]);

    MPI_Send_init(&e_c_old[PXC][1][1],   1, x_subarray_c, X_UP,        10, new_comm, &send[0]);
    MPI_Send_init(&e_c_old[1][1][1],     1, x_subarray_c, X_DOWN,      20, new_comm, &send[1]);
    MPI_Send_init(&e_c_old[1][1][1],     1, y_subarray_c, Y_LEFT,      30, new_comm, &send[2]);
    MPI_Send_init(&e_c_old[1][PYC][1],   1, y_subarray_c, Y_RIGHT,     40, new_comm, &send[3]);
    MPI_Send_init(&e_c_old[1][1][1],     1, z_subarray_c, Z_TOWARDS_U, 50, new_comm, &send[4]);
    MPI_Send_init(&e_c_old[1][1][PZC],   1, z_subarray_c, Z_AWAY_U,    60, new_comm, &send[5]);

    while(rk_global/r0_global > TOL_CNORM)
    {
        coarse_iterations++;

        err = MPI_Startall(6, recv);
        if(err == MPI_SUCCESS)
            printf("success");

        err = MPI_Startall(6, send);
        if(err == MPI_SUCCESS)
            printf("success");

        err = MPI_Waitall(6, send, MPI_STATUSES_IGNORE);
        if(err == MPI_SUCCESS)
            printf("success");

        err = MPI_Waitall(6, recv, MPI_STATUSES_IGNORE);
        if(err == MPI_SUCCESS)
            printf("success");

        //do work here

        if(coarse_iterations == 1)
        {
            update_neumann_c(e_c_old, PXC, PYC, PZC, X_UP, Y_RIGHT, Z_AWAY_U);
            residual_coarsest(e_c_old, rho_c, PXC, PYC, PZC, X_UP, Y_RIGHT, Z_AWAY_U, hc, rho_temp);
            r0_local = residual_norm(rho_temp, PXC, PYC, PZC);
            start_allred = MPI_Wtime();
            MPI_Allreduce(&r0_local, &r0_global, 1, MPI_DOUBLE, MPI_SUM, new_comm);
            end_allred = MPI_Wtime();
            r0_global = r0_global/( (PXC*dims0) * (PYC*dims1) * (PZC*dims2) );
            if(rank == 0)
                printf("\nGlobal residual norm is = %f", r0_global);
            rk_global = r0_global;
        }
        else
        {
            update_neumann_c(e_c_old, PXC, PYC, PZC, X_UP, Y_RIGHT, Z_AWAY_U);
            residual_coarsest(e_c_old, rho_c, PXC, PYC, PZC, X_UP, Y_RIGHT, Z_AWAY_U, hc, rho_temp);
            rk_local = residual_norm(rho_temp, PXC, PYC, PZC);
            start_allred = MPI_Wtime();
            MPI_Allreduce(&rk_local, &rk_global, 1, MPI_DOUBLE, MPI_SUM, new_comm);
            end_allred = MPI_Wtime();
            rk_global = rk_global/( (PXC*dims0) * (PYC*dims1) * (PZC*dims2) );
            if(rank == 0)
                printf("\nGlobal residual norm is = %f", rk_global);
        }

        //do dependent work and exchange matrices
    }//while loop ends

    for(i = 0; i <= 5; i++)
    {
        MPI_Request_free(&send[i]);
        MPI_Request_free(&recv[i]);
    }
}//End coarsest grid solve

Note: Strangely, the ghost data becomes zero during the iterations. (Just discovered this - I don't know why.)

1 Answer:

Answer 0 (score: 2):

When we create a persistent communication handle, we bind it to the specific block of memory we want to transfer to the other process. In a Jacobi iteration, however, we swap the matrix pointers at the end of each iteration so that the "old" matrix points to the newly updated one. The memory location the pointer refers to therefore changes, while the persistent requests keep communicating the original location. The workaround is to define two sets of persistent communication handles: use the first set on odd iterations and the second set on even iterations, i.e. alternate between them. This solved my problem and also broadened my understanding of persistent communication in MPI.
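To make the double-buffering idea concrete, here is a minimal sketch (not the original multigrid solver): one set of persistent requests is bound to buffer `a`, a second set to buffer `b`, and each iteration starts whichever set belongs to the buffer currently holding the "old" data, so swapping pointers never invalidates a request. The 1-D ring exchange, buffer size `N` and neighbour ranks are illustrative assumptions.

    /* Sketch: two sets of persistent requests, one per buffer, so that
     * swapping the old/new pointers never breaks the halo exchange.
     * Assumes a 1-D ring decomposition; not the original solver code. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 128

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int left  = (rank - 1 + size) % size;
        int right = (rank + 1) % size;

        double a[N], b[N];
        for (int i = 0; i < N; i++) { a[i] = (double)rank; b[i] = 0.0; }
        double *u_old = a, *u_new = b;

        /* One set of persistent requests per buffer. */
        MPI_Request req_a[4], req_b[4];
        MPI_Send_init(&a[1],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req_a[0]);
        MPI_Send_init(&a[N - 2], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req_a[1]);
        MPI_Recv_init(&a[0],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req_a[2]);
        MPI_Recv_init(&a[N - 1], 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req_a[3]);

        MPI_Send_init(&b[1],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req_b[0]);
        MPI_Send_init(&b[N - 2], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req_b[1]);
        MPI_Recv_init(&b[0],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req_b[2]);
        MPI_Recv_init(&b[N - 1], 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req_b[3]);

        for (int iter = 0; iter < 100; iter++) {
            /* Pick the request set bound to the buffer that is currently
             * "old" -- this is what alternating on odd/even iterations does. */
            MPI_Request *req = (u_old == a) ? req_a : req_b;

            MPI_Startall(4, req);
            MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

            /* Jacobi-style update: new interior from old neighbours. */
            for (int i = 1; i < N - 1; i++)
                u_new[i] = 0.5 * (u_old[i - 1] + u_old[i + 1]);

            /* Swap buffer roles; the persistent requests are never re-created. */
            double *tmp = u_old; u_old = u_new; u_new = tmp;
        }

        if (rank == 0)
            printf("u_old[N/2] = %f\n", u_old[N / 2]);

        for (int i = 0; i < 4; i++) {
            MPI_Request_free(&req_a[i]);
            MPI_Request_free(&req_b[i]);
        }

        MPI_Finalize();
        return 0;
    }

An alternative would be to copy the updated data back into the buffer the requests were created on instead of swapping pointers, but that costs an extra copy per iteration, so keeping two request sets is usually the cheaper choice.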