Question

我想使用MPI_GATHER发送2d数据块。例如：我在每个节点上有2x3阵列，如果我有4个节点，我想在root上使用8x3阵列。对于1d数组，MPI_GATHER根据MPI排名对数据进行排序，但对于2d数据，它会造成混乱！

按顺序放置块的干净方法是什么？

我期待这段代码的输出：

program testmpi
  use mpi
  implicit none
  integer :: send (2,3)
  integer :: rec (4,3)
  integer :: ierror,my_rank,i,j

  call MPI_Init(ierror)
  MPI_DATA_TYPE type_col
  ! find out process rank
  call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)
  if (my_rank==0) then
    send=1
    do i=1,2
      print*,(send(i,j),j=1,3)
    enddo
  endif
  if (my_rank==1) then
    send=5
    ! do 1,2
    !   print*,(send(i,j),j=1,3)
    ! enddo
  endif
  call MPI_GATHER(send,6,MPI_INTEGER,rec,6,MPI_INTEGER,0,MPI_COMM_WORLD,ierror)
  if (my_rank==0) then
    print*,'<><><><><>rec'
    do i=1,4
      print*,(rec(i,j),j=1,3)
    enddo
  endif
  call MPI_Finalize(ierror)
end program testmpi

是这样的：

   1           1           1
   1           1           1
   5           5           5
   5           5           5

但它看起来像这样：

   1           1           5
   1           1           5
   1           5           5
   1           5           5

Answer 1

以下是this answer的字面Fortran翻译。我原以为这是不必要的，但数组索引和内存布局的多重差异可能意味着值得做一个Fortran版本。

首先让我说你通常不想这样做 - 分散并收集一些＆＃34; master＆＃34;的大量数据。处理。通常情况下，您希望每个任务都能在其自己的拼图中挣扎，并且您的目标是永远不要让一个处理器需要一个全局视图＆＃34;整个数据;只要您需要，就可以限制可扩展性和问题大小。如果您正在为I / O执行此操作 - 一个进程读取数据，然后将其分散，然后将其收集回来进行编写，您最终希望查看MPI-IO。

提出问题，但是，MPI有很好的方法可以将任意数据从内存中拉出来，并将其分散/收集到一组处理器中。不幸的是，这需要相当数量的MPI概念--MPI类型，范围和集体操作。在这个问题的答案中讨论了许多基本思想 - MPI_Type_create_subarray and MPI_Gather。

考虑一个1d整数全局数组，任务0具有你要分发给许多MPI任务的数组，这样它们每个都可以在本地数组中获得一个部分。假设你有4个任务，全局数组是[0,1,2,3,4,5,6,7]。您可以让任务0发送四条消息（包括一条消息）以分发它，当它需要重新组装时，接收四条消息将它们捆绑在一起;但这显然在大量流程中非常耗时。有针对这些操作的优化例程 - 分散/收集操作。所以在这个1d案例中，你可以这样做：

integer, dimension(8) :: global      ! only root has this
integer, dimension(2) :: local       ! everyone has this
integer, parameter    :: root = 0
integer :: rank, comsize
integer :: i, ierr

call MPI_Init(ierr)
call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

if (rank == root) then
    global = [ (i, i=1,8) ]
endif

call MPI_Scatter(global, 2, MPI_INTEGER, &    ! send everyone 2 ints from global
                 local,  2, MPI_INTEGER, &    ! each proc recieves 2 into
                 root,                   &    ! sending process is root,
                 MPI_COMM_WORLD, ierr)        ! all procs in COMM_WORLD participate

在此之后，处理器＆＃39;数据看起来像

task 0:  local:[1,2]  global: [1,2,3,4,5,6,7,8]
task 1:  local:[3,4]  global: [garbage]
task 2:  local:[5,6]  global: [garbage]
task 3:  local:[7,8]  global: [garbage]

也就是说，分散操作采用全局数组并将连续的2-int块发送给所有处理器。

要重新组装数组，我们使用MPI_Gather（）操作，它的工作方式完全相同但反之亦然：

local = local + rank

call MPI_Gather (local,  2, MPI_INTEGER, &    ! everyone sends 2 ints from local
                 global, 2, MPI_INTEGER, &    ! root receives 2 ints each proc into global
                 root,                   &    ! receiving process is root,
                 MPI_COMM_WORLD, ierr)        ! all procs in COMM_WORLD participate

现在阵列看起来像：

task 0:  local:[1,2]    global: [1,2,4,5,7,8,10,11]
task 1:  local:[4,5]    global: [garbage-]
task 2:  local:[7,8]    global: [garbage-]
task 3:  local:[10,11]  global: [garbage-]

Gather带回所有数据。

如果数据点的数量不能平均分配进程数，会发生什么情况，我们需要向每个进程发送不同数量的项？然后，您需要一个广义的scatter版本MPI_Scatterv，它允许您指定每个处理器的计数，以及位移 - 在该数据片段的全局数组中的位置。所以，让我们用相同的4个任务来说，你有一个包含9个字符的字符数组[a，b，c，d，e，f，g，h，i]，你将分配每个进程2除了最后一个字符，有三个字符。然后你需要

character, dimension(9) :: global
character, dimension(3) :: local
integer, dimension(4)   :: counts
integer, dimension(4)   :: displs

if (rank == root) then
    global = [ (achar(i+ichar('a')), i=0,8) ]
endif
local = ['-','-','-']

counts = [2,2,2,3]
displs = [0,2,4,6]

mycounts = counts(rank+1)

call MPI_Scatterv(global, counts, displs,         & ! proc i gets counts(i) chars from displs(i)
                  MPI_CHARACTER,                  &
                  local, mycounts, MPI_CHARACTER, & ! I get mycounts chars into
                  root,                           & ! root rank does sending
                  MPI_COMM_WORLD, ierr)             ! all procs in COMM_WORLD participate

现在数据看起来像

task 0:  local:"ab-"  global: "abcdefghi"
task 1:  local:"cd-"  global: *garbage*
task 2:  local:"ef-"  global: *garbage*
task 3:  local:"ghi"  global: *garbage*

您现在使用scatterv来分发不规则数据量。在每种情况下的位移是两个*等级（以字符测量;位移是以散布的形式发送的类型或者为聚集接收的类型的单位;它通常不是以字节或其他形式）从开始时的数组，计数是[2,2,2,3]。如果它是我们想要的3个字符的第一个处理器，我们将设置计数= [3,2,2,2]并且位移将是[0,3,5,7]。 Gatherv再次完全相同但反过来; count和displs数组将保持不变。

现在，对于2D，这有点棘手。如果我们想要发送2d阵列的2d子锁，我们现在发送的数据不再是连续的。如果我们向4个处理器发送（比方说）6x6阵列的3x3子块，我们发送的数据就会出现漏洞：

2D Array

   ---------
   |000|222|
   |000|222|
   |000|222|
   |---+---|
   |111|333|
   |111|333|
   |111|333|
   ---------

Actual layout in memory

   [000111000111000111222333222333222333]

（请注意，所有高性能计算都归结为了解内存中数据的布局。）

如果我们要发送标记为＆＃34; 1＆＃34;的数据到任务1，我们需要跳过三个值，发送三个值，跳过三个值，发送三个值，跳过三个值，发送三个值。第二个复杂因素是次区域停止和开始;请注意该地区＆＃34; 1＆＃34;不会从哪个区域开始＃0; 0＆＃34;停止;在区域＆＃34; 0＆＃34;的最后一个元素之后，内存中的下一个位置是中途通过区域＆＃34; 1＆＃34;。

让我们首先解决第一个布局问题 - 如何只提取我们想要发送的数据。我们总是可以复制所有的＆＃34; 0＆＃34;区域数据到另一个，连续的数组，并发送;如果我们仔细计划出来，我们甚至可以这样做，以便我们可以在结果上调用MPI_Scatter。但我们不必以这种方式转换整个主数据结构。

到目前为止，我们使用的所有MPI数据类型都是简单的 - MPI_INTEGER指定（比方说）连续4个字节。但是，MPI允许您创建自己的数据类型，以描述内存中任意复杂的数据布局。这种情况 - 数组的矩形子区域 - 足够常见there's a specific call for that。对于我们上面描述的二维情况，

integer :: newtype;
integer, dimension(2) :: sizes, subsizes, starts

sizes    = [6,6]     ! size of global array
subsizes = [3,3]     ! size of sub-region 
starts   = [0,0]     ! let's say we're looking at region "0"
                     ! which begins at offset [0,0] 

call MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_FORTRAN, MPI_INTEGER, newtype, ierr)
call MPI_Type_commit(newtype, ierr)

这会创建一个只挑选区域的类型＆＃34; 0＆＃34;来自全局数组。请注意，即使在Fortran中，start参数也是从数组的开头给出的 offset （例如，从0开始），而不是索引（例如，从1开始）。

我们现在可以将这段数据发送到另一个处理器

call MPI_Send(global, 1, newtype, dest, tag, MPI_COMM_WORLD, ierr)  ! send region "0"

并且接收进程可以将其接收到本地阵列中。请注意，如果接收过程仅将其接收到3x3阵列中，则无法描述它作为一种新类型接收的内容;不再描述内存布局，因为在一行的结尾和下一行的开头之间没有大的跳跃。相反，它只是接收一个3 * 3 = 9个整数的块：

call MPI_Recv(local, 3*3, MPI_INTEGER, 0, tag, MPI_COMM_WORLD, ierr)

请注意，我们也可以为其他子区域执行此操作，方法是为其他块创建不同的类型（具有不同的启动数组），或者仅从特定块的第一个位置开始发送：

if (rank == root) then
    call MPI_Send(global(4,1), 1, newtype, 1, tag, MPI_COMM_WORLD, ierr)
    call MPI_Send(global(1,4), 1, newtype, 2, tag, MPI_COMM_WORLD, ierr)
    call MPI_Send(global(4,4), 1, newtype, 3, tag, MPI_COMM_WORLD, ierr)
    local = global(1:3, 1:3)
else
    call MPI_Recv(local, 3*3, MPI_INTEGER, 0, tag, MPI_COMM_WORLD, rstatus, ierr)
endif

既然我们已经了解了如何指定子区域，那么在使用分散/聚集操作之前，还有一件事需要讨论，那就是＆＃34; size＆＃34;这些类型。我们不能只使用这些类型的MPI_Scatter（）（甚至是scatterv），因为这些类型的范围是15个整数;也就是说，它们结束后它们的结尾是15个整数 - 它们结束的地方与下一个块开始的位置没有很好的排列，所以我们不能只使用分散 - 它会选择错误的地方开始向下一个处理器发送数据。

当然，我们可以使用MPI_Scatterv（）并自己指定位移，这就是我们要做的事情 - 除了位移以发送类型大小为单位，并且没有＆＃39 ; t帮助我们;这些块从全局数组开始处的（0,3,18,21）个整数的偏移开始，并且一个块从它开始的位置结束15个整数这一事实并不能让我们用整数倍表示这些位移。一点都不。

为了解决这个问题，MPI允许您为这些计算设置类型的范围。它不会截断类型;它只是用于确定下一个元素在最后一个元素的位置开始的位置。对于类似这些类型的类型，它们通常很方便将范围设置为小于内存中距离类型实际末端的距离。

我们可以将范围设定为对我们来说方便的任何事物。我们可以将范围设为1整数，然后以整数为单位设置位移。但是，在这种情况下，我喜欢将范围设置为3个整数 - 子列的大小 - 这样，阻止＆＃34; 1＆＃34;在块＆＃34; 0＆＃34;之后立即开始，并阻止＆＃34; 3＆＃34;在块＆＃34; 2＆＃34;之后立即开始。不幸的是，当从街区跳下来时它并没有那么好的工作＃2;＆＃34;阻止＆＃34; 3＆＃34;，但这无济于事。

因此，为了在这种情况下分散子块，我们执行以下操作：

integer(kind=MPI_ADDRESS_KIND) :: extent

starts   = [0,0]
sizes    = [6, 6]
subsizes = [3, 3]

call MPI_Type_create_subarray(2, sizes, subsizes, starts,        &
                              MPI_ORDER_FORTRAN, MPI_INTEGER,  &
                              newtype, ierr)
call MPI_Type_size(MPI_INTEGER, intsize, ierr)
extent = 3*intsize
call MPI_Type_create_resized(newtype, 0, extent, resizedtype, ierr)
call MPI_Type_commit(resizedtype, ierr)

我们在这里创建了与以前相同的块类型，但我们已经调整了它的大小;我们没有改变类型＆＃34;开始＆＃34; （0）但我们已经改变了＃34;结束的时间＃34; （3个整数）。我们之前没有提到这一点，但MPI_Type_commit需要能够使用该类型;但是你只需要提交实际使用的最终类型，而不是任何中间步骤。您完成后使用MPI_Type_free释放已提交的类型。

所以现在，最后，我们可以分散块：上面的数据操作有点复杂，但一旦完成，scatterv就像以前一样：

counts = 1          ! we will send one of these new types to everyone
displs = [0,1,6,7]  ! the starting point of everyone's data
                    ! in the global array, in block extents

call MPI_Scatterv(global, counts, displs, & ! proc i gets counts(i) types from displs(i) 
        resizedtype,                      &
        local, 3*3, MPI_INTEGER,          & ! I'm receiving 3*3 int
        root, MPI_COMM_WORLD, ierr)         !... from (root, MPI_COMM_WORLD)

现在我们已经完成了一些分散，聚集和MPI衍生类型之后。

下面是一个示例代码，其中显示了带有字符数组的聚集和分散操作。运行程序：

$ mpirun -np 4 ./scatter2d
 global array is:
 000222
 000222
 000222
 111333
 111333
 111333
 Rank            0  received:
 000
 000
 000
 Rank            1  received:
 111
 111
 111
 Rank            2  received:
 222
 222
 222
 Rank            3  received:
 333
 333
 333
 Rank            0  sending:
 111
 111
 111
 Rank            1  sending:
 222
 222
 222
 Rank            2  sending:
 333
 333
 333
 Rank            3  sending:
 444
 444
 444
  Root received:
 111333
 111333
 111333
 222444
 222444
 222444

，代码如下：

program scatter
    use mpi
    implicit none

    integer, parameter :: gridsize = 6    ! size of array
    integer, parameter :: procgridsize = 2 ! size of process grid
    character, allocatable, dimension (:,:) :: global, local
    integer, dimension(procgridsize**2)   :: counts, displs
    integer, parameter    :: root = 0
    integer :: rank, comsize
    integer :: localsize
    integer :: i, j, row, col, ierr, p, charsize
    integer, dimension(2) :: sizes, subsizes, starts

    integer :: newtype, resizedtype
    integer, parameter :: tag = 1
    integer, dimension(MPI_STATUS_SIZE) :: rstatus
    integer(kind=MPI_ADDRESS_KIND) :: extent, begin

    call MPI_Init(ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    if (comsize /= procgridsize**2) then
        if (rank == root) then
            print *, 'Only works with np = ', procgridsize**2, ' for now.'
        endif
        call MPI_Finalize(ierr)
        stop
    endif

    localsize = gridsize/procgridsize
    allocate( local(localsize, localsize) )
    if (rank == root) then
        allocate( global(gridsize, gridsize) )
        forall( col=1:procgridsize, row=1:procgridsize )
            global((row-1)*localsize+1:row*localsize, &
                   (col-1)*localsize+1:col*localsize) = &
                    achar(ichar('0')+(row-1)+(col-1)*procgridsize)
        end forall

        print *, 'global array is: '
        do i=1,gridsize
            print *, global(i,:)
        enddo
    endif
    starts   = [0,0]
    sizes    = [gridsize, gridsize]
    subsizes = [localsize, localsize]

    call MPI_Type_create_subarray(2, sizes, subsizes, starts,        &
                                  MPI_ORDER_FORTRAN, MPI_CHARACTER,  &
                                  newtype, ierr)
    call MPI_Type_size(MPI_CHARACTER, charsize, ierr)
    extent = localsize*charsize
    begin  = 0
    call MPI_Type_create_resized(newtype, begin, extent, resizedtype, ierr)
    call MPI_Type_commit(resizedtype, ierr)

    counts = 1          ! we will send one of these new types to everyone
    forall( col=1:procgridsize, row=1:procgridsize )
       displs(1+(row-1)+procgridsize*(col-1)) = (row-1) + localsize*procgridsize*(col-1)
    endforall

    call MPI_Scatterv(global, counts, displs,   & ! proc i gets counts(i) types from displs(i)
            resizedtype,                        &
            local, localsize**2, MPI_CHARACTER, & ! I'm receiving localsize**2 chars
            root, MPI_COMM_WORLD, ierr)           !... from (root, MPI_COMM_WORLD)

    do p=1, comsize
        if (rank == p-1) then
            print *, 'Rank ', rank, ' received: '
            do i=1, localsize
                print *, local(i,:)
            enddo
        endif
        call MPI_Barrier(MPI_COMM_WORLD, ierr)
    enddo

    local = achar( ichar(local) + 1 )

    do p=1, comsize
        if (rank == p-1) then
            print *, 'Rank ', rank, ' sending: '
            do i=1, localsize
                print *, local(i,:)
            enddo
        endif
        call MPI_Barrier(MPI_COMM_WORLD, ierr)
    enddo

    call MPI_Gatherv( local, localsize**2, MPI_CHARACTER, & ! I'm sending localsize**2 chars
                      global, counts, displs, resizedtype,&
                      root, MPI_COMM_WORLD, ierr)

    if (rank == root) then
        print *, ' Root received: '
        do i=1,gridsize
            print *, global(i,:)
        enddo
    endif

    call MPI_Type_free(newtype,ierr)
    if (rank == root) deallocate(global)
    deallocate(local)
    call MPI_Finalize(ierr)

end program scatter

这是一般的解决方案。对于你的特殊情况，我们只是按行追加，我们不需要Gatherv，我们可以只使用一个聚集，因为在这种情况下，所有的位移是相同的 - 之前，在2d块如果我们有一个排量下降＆＃39;然后跳过那个位移，当你走过＆＃39;跨越＆＃39;到下一列的块。在这里，位移始终与前一个位移相同，因此我们不需要明确地给出位移。所以最终的代码如下：

program testmpi
use mpi
    implicit none
    integer, dimension(:,:), allocatable :: send, recv
    integer, parameter :: nsendrows = 2, nsendcols = 3
    integer, parameter :: root = 0
    integer :: ierror, my_rank, comsize, i, j, ierr
    integer :: blocktype, resizedtype
    integer, dimension(2) :: starts, sizes, subsizes
    integer (kind=MPI_Address_kind) :: start, extent
    integer :: intsize

    call MPI_Init(ierror)
    call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)
    call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierror)

    allocate( send(nsendrows, nsendcols) )

    send = my_rank

    if (my_rank==root) then
        ! we're going to append the local arrays
        ! as groups of send rows
        allocate( recv(nsendrows*comsize, nsendcols) )
    endif

    ! describe what these subblocks look like inside the full concatenated array
    sizes    = [ nsendrows*comsize, nsendcols ]
    subsizes = [ nsendrows, nsendcols ]
    starts   = [ 0, 0 ]

    call MPI_Type_create_subarray( 2, sizes, subsizes, starts,     &
                                   MPI_ORDER_FORTRAN, MPI_INTEGER, &
                                   blocktype, ierr)

    start = 0
    call MPI_Type_size(MPI_INTEGER, intsize, ierr)
    extent = intsize * nsendrows

    call MPI_Type_create_resized(blocktype, start, extent, resizedtype, ierr)
    call MPI_Type_commit(resizedtype, ierr)

    call MPI_Gather( send, nsendrows*nsendcols, MPI_INTEGER, &  ! everyone send 3*2 ints
                     recv, 1, resizedtype,                   &  ! root gets 1 resized type from everyone
                     root, MPI_COMM_WORLD, ierr)

    if (my_rank==0) then
    print*,'<><><><><>recv'
    do i=1,nsendrows*comsize
        print*,(recv(i,j),j=1,nsendcols)
    enddo
    endif
    call MPI_Finalize(ierror)

end program testmpi

使用3个进程运行此命令：

$ mpirun -np 3 ./testmpi
 <><><><><>recv
           0           0           0
           0           0           0
           1           1           1
           1           1           1
           2           2           2
           2           2           2

使用MPI_Gather在Fortran中发送2D数组

1 个答案: