MPI_Isend reusing internal buffer

Asked: 2014-07-11 21:52:17

Tags: c++ buffer mpi send nonblocking

I have a finite element code that uses blocking receives and non-blocking sends. Each element has 3 incoming faces and 3 outgoing faces. The mesh is split up among many processors, so sometimes the boundary conditions come from the element's own processor and sometimes from a neighboring processor. The relevant section of the code is:

std::vector<task>::iterator it = All_Tasks.begin();
std::vector<task>::iterator it_end = All_Tasks.end();
int task = 0;
for (; it != it_end; it++, task++)
{
    for (int f = 0; f < 3; f++)
    {
        // Get the neighbors for each incoming face
        Neighbor neighbor = subdomain.CellSets[(*it).cellset_id_loc].neighbors[incoming[f]];

        // Get buffers from boundary conditions or neighbor processors
        if (neighbor.processor == rank)
        {
            subdomain.Set_buffer_from_bc(incoming[f]);
        }
        else
        {
            // Get the flag from the corresponding send
            target = GetTarget((*it).angle_id, (*it).group_id, (*it).cell_id);
            if (incoming[f] == x)
            {
                int size = cells_y*cells_z*groups*angles*4;
                MPI_Status status;
                MPI_Recv(&subdomain.X_buffer[0], size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &status);
            }
            if (incoming[f] == y)
            {
                int size = cells_x*cells_z*groups*angles * 4;
                MPI_Status status;
                MPI_Recv(&subdomain.Y_buffer[0], size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &status);
            }
            if (incoming[f] == z)
            {
                int size = cells_x*cells_y*groups*angles * 4;
                MPI_Status status;
                MPI_Recv(&subdomain.Z_buffer[0], size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &status);
            }
        }
    }

    ... computation ...

    for (int f = 0; f < 3; f++)
    {
        // Get the outgoing neighbors for each face
        Neighbor neighbor = subdomain.CellSets[(*it).cellset_id_loc].neighbors[outgoing[f]];

        if (neighbor.IsOnBoundary)
        {
            // store the buffer into the boundary information
        }
        else
        {
            target = GetTarget((*it).angle_id, (*it).group_id, neighbor.cell_id);
            if (outgoing[f] == x)
            {
                int size = cells_y*cells_z*groups*angles * 4;
                MPI_Request request;
                MPI_Isend(&subdomain.X_buffer[0], size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &request);
            }
            if (outgoing[f] == y)
            {
                int size = cells_x*cells_z*groups*angles * 4;
                MPI_Request request;
                MPI_Isend(&subdomain.Y_buffer[0], size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &request);
            }
            if (outgoing[f] == z)
            {
                int size = cells_x*cells_y*groups*angles * 4;
                MPI_Request request;
                MPI_Isend(&subdomain.Z_buffer[0], size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &request);

            }
        }

    }
}

A processor can work through many tasks before it needs information from other processors. I need a non-blocking send so the code can keep working, but I'm pretty sure the receives are overwriting the send buffers before the messages have actually been sent.

I have tried timing this code, and the calls to MPI_Recv take 5-6 seconds even when the message they are trying to receive has already been sent. My theory is that the Isend is starting the send but not actually transferring anything until the Recv is called. The messages themselves are about 1 MB. I have looked at benchmarks, and messages of this size should take a very short time to send.

My question is: in this code, is it the buffer itself that gets overwritten, or a local copy of it? Is there a way to "append" to a buffer when I send, rather than writing to the same memory location? I would like the Isend to write to a different buffer every time it is called, so that the information is not overwritten while a message is waiting to be received.
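
For illustration, one way to get a different buffer for every send would be to keep a private copy of the data, plus its own request, for each outstanding message, and only retire them once MPI reports they are complete. A minimal sketch, assuming hypothetical helper names (PendingSend, post_send, complete_sends) that are not in the code above:

#include <mpi.h>
#include <vector>

// One pending non-blocking send: a private copy of the data plus its request.
struct PendingSend
{
    std::vector<double> buffer;
    MPI_Request request;
};

// Post an Isend from a freshly copied buffer so later computation
// cannot overwrite the data while the message is still in flight.
void post_send(std::vector<PendingSend>& pending,
               const std::vector<double>& face_data,
               int dest, int tag)
{
    pending.emplace_back();
    PendingSend& p = pending.back();
    p.buffer = face_data;  // private copy for this message
    MPI_Isend(p.buffer.data(), static_cast<int>(p.buffer.size()),
              MPI_DOUBLE, dest, tag, MPI_COMM_WORLD, &p.request);
}

// Call periodically (or at the end of the sweep) to retire completed sends.
void complete_sends(std::vector<PendingSend>& pending)
{
    for (PendingSend& p : pending)
        MPI_Wait(&p.request, MPI_STATUS_IGNORE);
    pending.clear();
}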

**EDIT** A related question that might solve my problem: can MPI_Test or MPI_Wait give information about whether MPI_Isend has finished with a buffer, i.e. return true once the Isend is done with the buffer even though the message has not been received yet?
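
For reference, completing an MPI_Isend request with MPI_Test or MPI_Wait only guarantees that the send buffer is safe to reuse; it says nothing about whether the matching receive has been posted or completed. A minimal sketch of how that check could be wrapped, with a hypothetical helper name send_buffer_reusable:

#include <mpi.h>

// Returns true once the buffer handed to an earlier MPI_Isend may be reused.
// Note: completion of a send request only means the buffer is free again;
// it does not mean the matching MPI_Recv has happened on the other side.
bool send_buffer_reusable(MPI_Request& request)
{
    int flag = 0;
    MPI_Test(&request, &flag, MPI_STATUS_IGNORE);
    return flag != 0;
}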

**EDIT 2** I have added more information about my problem.

1 Answer:

Answer 0 (score: 0)

So it looks like I just have to bite the bullet and allocate enough memory in the send buffers to hold all of the messages, and then send portions of that buffer as I go.
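
A minimal sketch of that approach, with hypothetical names (sweep_sends, big_send_buffer, slot_size) standing in for the real mesh data: one large pre-allocated buffer is carved into a slot per outgoing message, each slot gets its own request, and everything is completed with MPI_Waitall before the buffer is reused:

#include <mpi.h>
#include <vector>

// Post all of a sweep's outgoing messages from one large, pre-allocated
// buffer: each message owns its own slot, so no send overwrites another
// while messages are still in flight.
void sweep_sends(std::vector<double>& big_send_buffer,   // num_messages * slot_size doubles
                 int slot_size,
                 const std::vector<int>& dest,           // destination rank per message
                 const std::vector<int>& tag)            // tag per message
{
    const int num_messages = static_cast<int>(dest.size());
    std::vector<MPI_Request> requests(num_messages, MPI_REQUEST_NULL);

    for (int t = 0; t < num_messages; ++t)
    {
        double* slot = &big_send_buffer[static_cast<std::size_t>(t) * slot_size];

        // ... fill 'slot' with the outgoing face data for message t ...

        MPI_Isend(slot, slot_size, MPI_DOUBLE, dest[t], tag[t],
                  MPI_COMM_WORLD, &requests[t]);
    }

    // The buffer may only be reused or freed once every send has completed.
    MPI_Waitall(num_messages, requests.data(), MPI_STATUSES_IGNORE);
}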