Question

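I am implementing barrier synchronization using Open MPI message passing. I created an array of structs called containers. Each container is linked to its right-hand neighbor, and the elements at the two ends are linked to each other as well, forming a ring.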

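In the main() test client, I run MPI with multiple processes (mpiexec -n 5 ./a.out). They are supposed to synchronize by calling the barrier() function, but my code gets stuck at the last process. I am looking for help debugging this. Please see the code below: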

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <mpi.h>

typedef struct container {
    int labels;                  
    struct container *linked_to_container;    
    int sense;
} container;

container *allcontainers;   /* an array for all containers */
int size_containers_array;

int get_next_container_id(int current_container_index, int max_index)
{
    if (max_index - current_container_index >= 1)
    {
        return current_container_index + 1;
    }
    else 
        return 0;        /* elements at two ends are linked */
}

container *get_container(int index)
{
    return &allcontainers[index];
}


void container_init(int num_containers)
{
    allcontainers = (container *) malloc(num_containers * sizeof(container));  /* is this right to malloc memory on the array of container when the struct size is still unknown?*/
    size_containers_array = num_containers;

    int i;
    for (i = 0; i < num_containers; i++)
    {
        container *current_container = get_container(i);
        current_container->labels = 0;
        int next_container_id = get_next_container_id(i, num_containers - 1);     /* max index in all_containers[] is num_containers-1 */
        current_container->linked_to_container = get_container(next_container_id);
        current_container->sense = 0;   
    }
}

void container_barrier()
{
    int current_container_id, my_sense = 1;
    int tag = current_container_id;
    MPI_Request request[size_containers_array];
    MPI_Status status[size_containers_array];

    MPI_Comm_rank(MPI_COMM_WORLD, &current_container_id);
    container *current_container = get_container(current_container_id);

    int next_container_id = get_next_container_id(current_container_id, size_containers_array - 1);

    /* send asynchronous message to the next container, wait, then do blocking receive */
    MPI_Isend(&my_sense, 1, MPI_INT, next_container_id, tag, MPI_COMM_WORLD, &request[current_container_id]);
    MPI_Wait(&request[current_container_id], &status[current_container_id]);
    MPI_Recv(&my_sense, 1, MPI_INT, next_container_id, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

}

void free_containers()
{
    free(allcontainers);
}

int main(int argc, char **argv)
{
    int my_id, num_processes;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_processes);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    container_init(num_processes);

    printf("Hello world from thread %d of %d \n", my_id, num_processes);
    container_barrier();
    printf("passed barrier \n");



    MPI_Finalize();
    free_containers();

    return 0;
}

Answer 1

The problem is this sequence of calls:

MPI_Isend()
MPI_Wait()
MPI_Recv()

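This is a common source of confusion. When you use a "nonblocking" call in MPI, you are essentially telling the MPI library that you want to perform some operation (send) on some data (my_sense). MPI hands you back an MPI_Request object, with the guarantee that the operation will have finished by the time a completion function such as MPI_Wait completes that MPI_Request.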

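The problem you have here is that you call MPI_Isend and then immediately call MPI_Wait, before MPI_Recv has been called on any rank. This means that all of those sends get queued up but never actually have anywhere to go, because you have never told MPI where to put the data by calling MPI_Recv (which tells MPI that you want the data placed in my_sense).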

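The reason this works part of the time is that MPI expects that things will not always be perfectly synchronized. If you send small messages (which you do), MPI reserves some buffer space and lets your MPI_Send operations complete, stashing the data in that temporary space for a while until you later call MPI_Recv to tell MPI where to move it. Eventually, though, this stops working: the buffers fill up and you need to actually start receiving messages. For you, this means switching the order of your operations. Instead of doing a nonblocking send, you should post a nonblocking receive first, then do a blocking send, then wait for the receive to complete: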

MPI_Irecv()
MPI_Send()
MPI_Wait()

The other option is to make both calls nonblocking and use MPI_Waitall instead:

MPI_Isend()
MPI_Irecv()
MPI_Waitall()

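This last option is usually the best. The only thing you need to be careful about is not overwriting your own data. Right now you are using the same buffer for both the send and the receive. If both operations are in flight at the same time, there is no guarantee about their ordering. Normally this makes no difference: it does not matter whether you send or receive first. In this case, however, it does. If the data is received first, you end up sending that same data back out instead of the data you had before the receive. You can solve this by using a temporary buffer to stage the data and moving it to the right place once it is safe to do so.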

Barrier call stuck in Open MPI (C program)
