Question

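I have just parallelized my code, but I am running into a problem. I use MPI_Allreduce. I have a one-dimensional interval divided into N sub-intervals (bins). Each processor accumulates a number of sums for each bin. I then use MPI_Allreduce to add up, for each sub-interval, the contributions from all processors. The code seems to mix up the sub-intervals: in some sub-intervals, processors contribute values that belong to a different sub-interval. Is this a fairly common problem with MPI_Allreduce, and how can it be solved? Thanks.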

Answer 1

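Without seeing your code, it is hard to say exactly what your error is. However, as I understand your question, you divide a range of elements into intervals and assign one interval to each process. Each process then computes its partial sum. To add up all the elements, you are using the MPI_Allreduce() function. Here is a simple implementation of that scenario.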

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>
#include <math.h>
#include <assert.h>

int *create_rand_nums(int num_elements) {
    int *rand_nums = (int *)malloc(sizeof(int) * num_elements);
    assert(rand_nums != NULL);
    int i;
    for (i = 0; i < num_elements; i++) {
        rand_nums[i] = rand()%100;
    }
    return rand_nums;
}

int main(int argc, char** argv) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s num_elements_per_proc\n",argv[0]);
        exit(1);
    }

    int num_elements_per_proc = atoi(argv[1]);

    MPI_Init(NULL, NULL);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

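    /* The send buffer is only significant on the root; NULL elsewhere. */
    int *rand_nums = NULL, *local_nums;

    if(world_rank == 0)
    {
        /* Only the root generates data, so seed from the clock there. */
        srand((unsigned)time(NULL));
        rand_nums = create_rand_nums(world_size*num_elements_per_proc);
    }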

    local_nums = (int*)malloc(sizeof(int)*num_elements_per_proc);
    MPI_Scatter(rand_nums,num_elements_per_proc,MPI_INT,local_nums,num_elements_per_proc,MPI_INT,0,MPI_COMM_WORLD);  
    int local_sum = 0;
    int i;
    for (i = 0; i < num_elements_per_proc; i++) 
    {
        local_sum += local_nums[i];
    }

    int global_sum;
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("World Rank: %d \t Global Sum: %d \n",world_rank,global_sum);
    if(world_rank == 0)
        free(rand_nums);
    free(local_nums);

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

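Here is what happens in the program:

The root (world_rank 0) generates random numbers; the count is determined by the program argument.
The root uses MPI_Scatter() to distribute the sub-intervals to all processes (including itself).
Each process computes its local sum.
The global sum is computed by the MPI_Allreduce() call.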
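
For the case in the question, where each rank holds one partial sum per sub-interval rather than a single scalar, MPI_Allreduce called with a count of N combines element i of every rank's send buffer with element i of all the others. One possible cause of mixed-up sub-intervals (an assumption, since the original code is not shown) is therefore that the ranks do not agree on which array slot belongs to which sub-interval. The sketch below is plain C without MPI calls and shows only the local side: every rank fills an array with all N slots, using the same global bin index. The names bin_index and accumulate_local, and the use of double values on an interval [lo, hi), are hypothetical.

```c
#include <stddef.h>

/* Map x in [lo, hi) to a global bin index in 0..nbins-1.
   Returns -1 when x lies outside the interval. */
int bin_index(double x, double lo, double hi, int nbins) {
    if (nbins <= 0 || !(x >= lo && x < hi)) {
        return -1;
    }
    int b = (int)((x - lo) / (hi - lo) * nbins);
    if (b >= nbins) {
        b = nbins - 1;  /* guard against floating-point round-up */
    }
    return b;
}

/* Accumulate this rank's values into local_bins[0..nbins-1].
   Every rank must allocate all nbins slots and use the same lo/hi,
   so that slot b means the same sub-interval on every rank. */
void accumulate_local(const double *values, size_t n,
                      double lo, double hi, int nbins,
                      double *local_bins) {
    /* Zero every slot, including bins this rank never touches. */
    for (int b = 0; b < nbins; b++) {
        local_bins[b] = 0.0;
    }
    for (size_t i = 0; i < n; i++) {
        int b = bin_index(values[i], lo, hi, nbins);
        if (b >= 0) {
            local_bins[b] += values[i];
        }
    }
}
```

With the local array filled this way on every rank, a single call such as MPI_Allreduce(local_bins, global_bins, nbins, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD) would leave the per-sub-interval totals in global_bins on every rank. The send and receive buffers must be distinct arrays unless MPI_IN_PLACE is used as the send buffer.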
