Question

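I have a 2D array on which I run some computation on each process. Afterwards, I need to gather all the computed columns back to the root process. I currently partition the columns in a first-come, first-served manner, so they are dealt out to the ranks cyclically. In pseudocode, the main loop looks like: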

DO i = mpi_rank + 1, num_columns, mpi_size
   array(:,i) = ...   ! do work here
END DO

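Once this is done, I need to gather these columns into the correct indices on the root process. What is the best way to do this? It looks like MPI_GATHERV could do what I want if the partitioning scheme were different. However, I'm not sure what the best way to partition would be, since num_columns is not necessarily evenly divisible by mpi_size.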

Answer 1

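I would suggest the following approach:

Cut the 2D array into chunks of "almost equal" size, i.e. with a local number of columns close to num_columns / mpi_size.
Gather the chunks with mpi_gatherv, which can handle chunks of different sizes.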

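To get an "almost equal" number of columns, set the local number of columns to the integer quotient num_columns / mpi_size, and increase it by one only on the first mod(num_columns,mpi_size) MPI tasks.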
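The rule above can be tried out without MPI. The following is only a minimal serial sketch: the values n = 12 and nprocs = 5 and the array names counts and first are assumptions chosen for illustration, not part of the original code.

```fortran
program block_partition
  implicit none
  integer, parameter :: n = 12        ! global number of columns (example value)
  integer, parameter :: nprocs = 5    ! number of MPI tasks (example value)
  integer :: counts(0:nprocs-1)       ! local number of columns on each rank
  integer :: first(0:nprocs-1)        ! first global column owned by each rank
  integer :: rank

  do rank = 0, nprocs - 1
    ! Integer quotient, plus one extra column on the first mod(n,nprocs) ranks
    counts(rank) = n / nprocs
    if (rank < mod(n, nprocs)) counts(rank) = counts(rank) + 1
  end do

  ! Each rank starts right after the columns of all lower ranks
  first(0) = 1
  do rank = 1, nprocs - 1
    first(rank) = first(rank-1) + counts(rank-1)
  end do

  do rank = 0, nprocs - 1
    write(*,'(A,I0,A,I0,A,I0)') 'rank ', rank, ': ', counts(rank), &
         ' columns starting at column ', first(rank)
  end do
end program block_partition
```

With these example values, ranks 0 and 1 get three columns each and ranks 2 to 4 get two each, so the counts add up to 12.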

The table below shows how a (10,12) matrix is partitioned across 5 MPI processes:

  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42

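Here the first digit is the id of the process, and the second digit is the index of the column within that process's local chunk. As you can see, processes 0 and 1 have 3 columns each, while all other processes have only 2 columns each.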

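Below you can find a working example that I wrote. The trickiest part is building the rcounts and displs arrays for MPI_Gatherv. The table shown above is the output of this code.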

  program mpi2d
  implicit none
  include 'mpif.h'
  integer myid, nprocs, ierr
  integer,parameter:: m = 10       ! global number of rows
  integer,parameter:: n = 12       ! global number of columns
  integer nloc                     ! local  number of columns
  integer array(m,n)               ! global m-by-n, i.e. m rows and n columns
  integer,allocatable:: loc(:,:)   ! local piece of global 2d array
  integer,allocatable:: rcounts(:) ! array of nloc's (for mpi_gatherv)
  integer,allocatable:: displs(:)  ! array of displacements (for mpi_gatherv)
  integer i,j


  ! Initialize
  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)
  call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Partition, i.e. get local number of columns
  nloc = n / nprocs
  if (mod(n,nprocs)>myid) nloc = nloc + 1

  ! Compute partitioned array
  allocate(loc(m,nloc))
  do j=1,nloc
    loc(:,j) = myid*10 + j
  enddo

  ! Build arrays for mpi_gatherv:
  ! rcounts contains all nloc's
  ! displs  contains displacements of partitions in terms of columns
  allocate(rcounts(nprocs),displs(nprocs))
  displs(1) = 0
  do j=1,nprocs
    rcounts(j) = n / nprocs
    if(mod(n,nprocs).gt.(j-1)) rcounts(j)=rcounts(j)+1
    if((j-1).ne.0)displs(j) = displs(j-1) + rcounts(j-1)
  enddo

  ! Convert from number of columns to number of integers
  nloc    = m * nloc
  rcounts = m * rcounts
  displs  = m * displs

  ! Gather array on root
  ! (MPI_INTEGER is the Fortran integer datatype; MPI_INT is its C counterpart)
  call mpi_gatherv(loc, nloc, MPI_INTEGER, array, &
                   rcounts, displs, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

  ! Print array on root
  if(myid==0)then
    do i=1,m
      do j=1,n
        write(*,'(I4.2)',advance='no') array(i,j)
      enddo
      write(*,*)
    enddo
  endif

  ! Finish
  call mpi_finalize(ierr)

  end
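A possible variation, which is not part of the program above: the step that converts column counts into integer counts can be avoided by describing one column as a derived datatype. The sketch below assumes the standard routines MPI_Type_contiguous, MPI_Type_commit and MPI_Type_free; the name coltype is chosen here purely for illustration. Because MPI_Gatherv measures displacements in units of the receive type, rcounts and displs can then stay in units of columns.

```fortran
program gather_columns
  implicit none
  include 'mpif.h'
  integer, parameter :: m = 10, n = 12   ! global rows and columns, as above
  integer :: myid, nprocs, ierr, nloc, j
  integer :: coltype                     ! handle of the column datatype (name is illustrative)
  integer :: array(m,n)
  integer, allocatable :: loc(:,:), rcounts(:), displs(:)

  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)
  call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! One element of coltype is one full column of m integers
  call mpi_type_contiguous(m, MPI_INTEGER, coltype, ierr)
  call mpi_type_commit(coltype, ierr)

  ! Same "almost equal" partition as in the program above
  nloc = n / nprocs
  if (myid < mod(n, nprocs)) nloc = nloc + 1
  allocate(loc(m,nloc))
  do j = 1, nloc
    loc(:,j) = myid*10 + j
  end do

  ! Counts and displacements stay in units of columns
  allocate(rcounts(nprocs), displs(nprocs))
  do j = 1, nprocs
    rcounts(j) = n / nprocs
    if (j - 1 < mod(n, nprocs)) rcounts(j) = rcounts(j) + 1
  end do
  displs(1) = 0
  do j = 2, nprocs
    displs(j) = displs(j-1) + rcounts(j-1)
  end do

  call mpi_gatherv(loc, nloc, coltype, array, &
                   rcounts, displs, coltype, 0, MPI_COMM_WORLD, ierr)

  ! Print the first row on root as a quick check
  if (myid == 0) then
    do j = 1, n
      write(*,'(I4.2)',advance='no') array(1,j)
    end do
    write(*,*)
  end if

  call mpi_type_free(coltype, ierr)
  call mpi_finalize(ierr)
end program gather_columns
```

Run on 5 processes, this should print one row of the table shown earlier. Whether the derived type is worth it is a matter of taste; the multiplication by m in the original program does the same job.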

Answer 2

What about gathering in chunks of size mpi_size?

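To keep it short here, I'll assume that num_columns is a multiple of mpi_size. In your case, the gathering should look something like this (lda is the first dimension of array):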

DO i = 1, num_columns/mpi_size
  ! Round i gathers columns (i-1)*mpi_size+1 .. i*mpi_size, one from each rank
  IF (mpi_rank == 0) THEN
    ! Root's own column is already in place in the receive buffer
    CALL MPI_GATHER(MPI_IN_PLACE, lda, [TYPE], &
                    array(1,(i-1)*mpi_size+1), lda, [TYPE], 0, MPI_COMM_WORLD, ierr)
  ELSE
    CALL MPI_GATHER(array(1, mpi_rank + (i-1)*mpi_size + 1), lda, [TYPE], &
                    array(1,(i-1)*mpi_size+1), lda, [TYPE], 0, MPI_COMM_WORLD, ierr)
  END IF
END DO

I'm not entirely sure about the indices, or whether this actually works, but I think you get the idea.
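Since the indices are the uncertain part, they can be checked serially, without MPI. The sketch below is only an illustration under assumed values (mpi_size = 4, num_columns = 12); the names hits, col and dst_col are made up here. It walks through the cyclic loop from the question for every rank and compares each computed column with the position where the gather of the same round would store that rank's block.

```fortran
program check_cyclic_indices
  implicit none
  integer, parameter :: mpi_size = 4       ! assumed number of ranks
  integer, parameter :: num_columns = 12   ! assumed; a multiple of mpi_size
  integer :: hits(num_columns)             ! how often each column is received
  integer :: rank, col, round, dst_col

  hits = 0
  do rank = 0, mpi_size - 1
    round = 0
    ! Cyclic loop from the question: the columns this rank computes
    do col = rank + 1, num_columns, mpi_size
      round = round + 1
      ! In this round the receive buffer starts at column (round-1)*mpi_size+1,
      ! and the block from this rank is stored "rank" columns further right
      dst_col = (round - 1)*mpi_size + 1 + rank
      if (col /= dst_col) error stop 'column lands at the wrong index'
      hits(dst_col) = hits(dst_col) + 1
    end do
  end do

  if (any(hits /= 1)) error stop 'some column is missed or written twice'
  print '(A)', 'every column is gathered exactly once, at its own index'
end program check_cyclic_indices
```

This only checks the index arithmetic for the divisible case; it says nothing about a leftover round when num_columns is not a multiple of mpi_size.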

Partitioning and gathering a 2D array with MPI in Fortran
