Question

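I am trying to parallelize a code with MPI. The part to be parallelized sits inside a function, where I have to convert a sequential loop into an MPI-parallel loop.

After this MPI loop I need to end up with one global array, and I intend to use MPI_Gather to build it.

Here is the structure of the code: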

int main() {

    double *array_global;

    data1 = read(file1);
    data2 = read(file2);

    data3 = compute_on_data(data1, data2);

    write(file3,data3);

    function_to_parallelize(data1, data2, data3, array_global);

}

And the function "function_to_parallelize":

function_to_parallelize(data1, data2, data3, array_global) {

  int i;

  for (i = 0;i<size_loop; i++)
     {
       compute(data1, data2, data3, i, array_global);
     }

   write(file4, array_global);

}

My first question is: can I set up the MPI parallelization from main (adding the rank_mpi and nb_process parameters) like this:

int main(int argc, char **argv) {

    int rank_mpi, nb_process;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank_mpi);
    MPI_Comm_size(MPI_COMM_WORLD, &nb_process);

    double *array_global;

    data1 = read(file1);
    data2 = read(file2);

    data3 = compute_on_data(data1, data2);

    // only the root process writes file3
    if (rank_mpi == 0) {
      write(file3, data3);
    }

    function_to_parallelize(data1, data2, data3, array_global, rank_mpi, nb_process);

    MPI_Finalize();

}

and then, in "function_to_parallelize", do the following:

function_to_parallelize(data1, data2, data3, array_global, rank_mpi, nb_process) {

  int i;

  double *gathered_array_global;

  int size_block = size_loop/nb_process;

  for (i = rank_mpi*size_block; i < (rank_mpi+1)*size_block; i++)
     {
       compute(data1, data2, data3, i, array_global);
     }

   MPI_Gather(gathered_array_global, array_global, 0); // Gather all array_global into gathered_array_global for root process "rank_mpi = 0"

   if (rank_mpi == 0) {
     write(file4, gathered_array_global);
   }
}

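In other words, if I call MPI_Gather inside the function, will I get the result I want, i.e. all the array_global pieces collected into the final array that I want to write to "file4"?

All I know is that MPI_Gather is classically called from main() to collect all the sub-arrays. If I go into a routine, I suspect the processes can no longer stay synchronized with one another and therefore cannot communicate. Is that right?

My second question is about the strategy for this parallelization: do you think all the processes can read "file1" and "file2" without conflicts?

As for writing "data3", I think I can only write from a single process (rank_mpi = 0), otherwise I will get errors at execution.

Thanks for your help and advice.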

Answer 1

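If written with the correct syntax, your pseudocode would run. There is nothing wrong with making MPI calls in functions other than main. But I don't think it will do what you want. Using MPI calls inside function_to_parallelize is fine, but let's take a closer look: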

function_to_parallelize(data1, data2, data3, array_global, rank_mpi, nb_process) {
  int i;
  double *gathered_array_global;
  int size_block = size_loop/nb_process;
  for (i = rank_mpi*size_block; i < (rank_mpi+1)*size_block; i++)
     {
       compute(data1, data2, data3, i, array_global);
     }
   MPI_Gather(gathered_array_global, array_global, 0); // Gather all array_global into gathered_array_global for root process "rank_mpi = 0"
   if (rank_mpi == 0) {
     write(file4, gathered_array_global);
   }
}

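You divide up the problem so that each process computes only part of array_global. For example, say rank_mpi runs from 0 to 3 and size_loop=40, so size_block=10. That means array_global[0-9] is computed on process 0, array_global[10-19] on process 1, and so on. Then, when you call MPI_Gather, gathered_array_global on the root must be large enough to hold what every process sends. If each process sends its whole array_global, that is nb_process copies of the full array, each with only one block actually filled in.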
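To make the buffer-size point concrete, here is a small plain-C model of how the root's receive buffer is laid out after MPI_Gather: the values sent by rank r land at offset r * send_count. It makes no MPI calls; model_gather and send_bufs are hypothetical names used only for illustration, with send_bufs[r] standing in for the send buffer of rank r.

```c
#include <assert.h>
#include <string.h>

/* Plain-C model (not an MPI call) of the root's receive buffer after
 * MPI_Gather: the send_count values from rank r are copied to offset
 * r * send_count, so recv_buf must hold nb_process * send_count values.
 * send_bufs[r] is a hypothetical stand-in for the send buffer of rank r. */
void model_gather(const double *send_bufs[], int nb_process,
                  int send_count, double *recv_buf)
{
    assert(nb_process > 0 && send_count >= 0);
    for (int r = 0; r < nb_process; r++) {
        memcpy(recv_buf + (size_t)r * (size_t)send_count, send_bufs[r],
               (size_t)send_count * sizeof(double));
    }
}
```

With 4 processes each sending a 40-element array, the receive buffer needs 160 elements and contains four copies placed back to back. With 4 processes each sending 10 elements, it needs only 40 and the blocks appear in rank order.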

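I think what you want is to make smaller local arrays that fit just the data each process computes (in other words, arrays of size 10) and then pass those to MPI_Gather to be collected into an array of size 40 (gathered_array_global). Of course, you also have to be careful if the size of the array is not evenly divisible by the number of processes you are using.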
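For the case where size_loop is not evenly divisible by the number of processes, one common approach (a sketch, not necessarily what the asker's code should do) is to give the first few ranks one extra iteration and record each rank's block length and starting offset. The helper name block_counts_displs is hypothetical; the two output arrays follow the layout that MPI_Gatherv takes for its recvcounts and displs arguments.

```c
#include <assert.h>

/* Split size_loop iterations over nb_process ranks as evenly as possible.
 * The first (size_loop % nb_process) ranks get one extra iteration.
 * counts[r] is the block length of rank r, displs[r] its first global index.
 * Hypothetical helper: on the root these arrays could be passed to
 * MPI_Gatherv as recvcounts and displs. */
void block_counts_displs(int size_loop, int nb_process,
                         int *counts, int *displs)
{
    assert(size_loop >= 0 && nb_process > 0);
    int base = size_loop / nb_process;
    int extra = size_loop % nb_process;
    int offset = 0;
    for (int r = 0; r < nb_process; r++) {
        counts[r] = base + (r < extra ? 1 : 0);
        displs[r] = offset;
        offset += counts[r];
    }
}
```

Each rank would then loop over i from displs[rank_mpi] to displs[rank_mpi] + counts[rank_mpi] and store its result at local index i - displs[rank_mpi] in a local array of length counts[rank_mpi]. With size_loop=40 and 4 processes every block has length 10, which reduces to the even case above.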

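In theory, having all processes read from a file at the same time is fine, but you can easily break almost any file system by asking for a large number of simultaneous reads. I would use a broadcast instead, but that depends on your specific situation.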

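Finally, unless you use a parallel output library, you cannot have multiple processes write to a file at the same time. This has the potential to break things no matter what file system you use. It is best to do what you are already thinking of: gather everything and then write from a single process.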
