Need help with this Fortran MPI problem. I am trying to gather data from different columns of a 2D array. The complication is that only part of each column is actually used, and the columns are not divided evenly among the processes. All processes start with an equivalent global view of the data, each process should perform work on specific columns, and at the end the information is exchanged so that all processes share the common view again. The problem is similar to MPI partition and gather 2D array in Fortran and Sending 2D arrays in Fortran with MPI_Gather.
A drawn example: data(8,4) using 3 MPI processes.
---------------------
| a1 | b1 | c1 | d1 |
| a2 | b2 | c2 | d2 |
| a3 | b3 | c3 | d3 |
| a4 | b4 | c4 | d4 |
| a5 | b5 | c5 | d5 |
| a6 | b6 | c6 | d6 |
| a7 | b7 | c7 | d7 |
| a8 | b8 | c8 | d8 |
---------------------
Process 1 gets two columns of work, process 2 gets one column, and process 3 gets one column.
----------- ------ ------
| a1 | b1 | | c1 | | d1 |
| a2 | b2 | | c2 | | d2 |
| a3 | b3 | | c3 | | d3 |
| a4 | b4 | | c4 | | d4 |
| a5 | b5 | | c5 | | d5 |
| a6 | b6 | | c6 | | d6 |
| a7 | b7 | | c7 | | d7 |
| a8 | b8 | | c8 | | d8 |
----------- ------ ------
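The split is just a chunk-plus-remainder rule. As a minimal sketch (using the same variable names that appear in the code further down, with iproc as the 1-based process index):

    ichunk = num_cols/nproc               ! whole columns every process gets
    irem   = num_cols - ichunk*nproc      ! leftover columns
    mysize = ichunk                       ! columns owned by this process
    if (iproc .le. irem) mysize = mysize + 1
    ! for num_cols = 4 and nproc = 3: process 1 owns 2 columns, processes 2 and 3 own 1 each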
In the real problem the actual size is data(200000,59). It is a preallocated block of memory, and I only use part of each column (always starting at index 1). For example, suppose I only need the first 3 values in each column.
----------- ------ ------
| a1 | b1 | | c1 | | d1 |
| a2 | b2 | | c2 | | d2 |
| a3 | b3 | | c3 | | d3 |
| == | == | | == | | == |
| a4 | b4 | | c4 | | d4 |
| a5 | b5 | | c5 | | d5 |
| a6 | b6 | | c6 | | d6 |
| a7 | b7 | | c7 | | d7 |
| a8 | b8 | | c8 | | d8 |
----------- ------ ------
I am trying to create send and receive datatypes that can be used to accomplish this. My best guess so far is MPI_TYPE_VECTOR:
MPI_TYPE_VECTOR(COUNT, BLOCKLENGTH, STRIDE, OLDTYPE, NEWTYPE, IERROR)
For this case I would use MPI_TYPE_VECTOR(1, 3, 8, MPI_DOUBLE, newtype, ierr). This should allow each process to send the minimum amount of information. With that in place, I think I should be able to exchange the data with ALLGATHERV:
MPI_ALLGATHERV(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNT, DISPLS, RECVTYPE, COMM, IERROR)
where I would call it as MPI_ALLGATHERV(data(1,my_first_col), num_cols_to_be_sent, newtype, data, RECVCOUNT[], DISPLS[], newtype, COMM, IERROR).
As far as I can tell, this is the information that should be sent by each process:
Process 1: [a1,a2,a3,b1,b2,b3]
Process 2: [c1,c2,c3]
Process 3: [d1,d2,d3]
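Put together for the drawn data(8,4) case with used_rows = 3, the construction being described looks roughly like the sketch below. This only restates the plan above: coltype is an illustrative name, and MPI_DOUBLE_PRECISION is the standard Fortran handle for DOUBLE PRECISION data (the code below uses MPI_DOUBLE).

    integer coltype, ierr

    ! one block of 3 doubles taken from the top of an 8-row column;
    ! with COUNT = 1 there is only a single block, so the stride of 8
    ! merely records the column length
    call MPI_TYPE_VECTOR(1, 3, 8, MPI_DOUBLE_PRECISION, coltype, ierr)
    call MPI_TYPE_COMMIT(coltype, ierr)

    ! intended use, with each process passing its first owned column:
    !   process 1: MPI_ALLGATHERV(data(1,1), 2, coltype, ...)
    !   process 2: MPI_ALLGATHERV(data(1,3), 1, coltype, ...)
    !   process 3: MPI_ALLGATHERV(data(1,4), 1, coltype, ...)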
The examples I have seen either use entire columns of data or have displacements that are naturally multiples of the desired subarray. I cannot get the pieces to unpack into the correct columns; it seems to come down to how the receiving side understands the size/extent of the type, and I am certainly confused about the whole extent concept. Any help would be appreciated. The real code is up and running, but here is a quick recreation with comments (it may not compile, it was just put together quickly):
      MODULE PARALLEL
        INTEGER iproc, nproc, rank, ierr
        INTEGER mylow, myhigh, mysize, ichunk, irem
        INTEGER, ALLOCATABLE :: isize(:), idisp(:), ilow(:), ihigh(:)
        DOUBLE PRECISION, ALLOCATABLE :: glob_val(:,:)
        INTEGER newtype
      END MODULE

      PROGRAM MAIN
      USE PARALLEL
      IMPLICIT NONE
      INCLUDE 'mpif.h'

c **temp variables
      integer i, j
      integer num_rows,num_cols
      integer used_rows

c ----setup MPI----
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierr)
      iproc = rank+1   !rank is base 0, rest of fortran base 1

c ----setup initial data
      integer num_rows=20   !contiguous in memory over rows (ie single column)
      integer num_cols=11   !noncontiguous memory between different columns
      integer

      ALLOCATE (isize(nproc))
      ALLOCATE (idisp(nproc))
      ALLOCATE (ilow(nproc))
      ALLOCATE (ishigh(nproc))
      ALLOCATE (glob_val(num_rows,num_cols))

      glob_val = 1.0*iproc   !set all glob values to process id
      do i=1,num_cols
        do j=1,used_rows
          glob_val(j,i) = iproc+.01*j   !add reference index to used data
        end do
      end do

c ---setup exchange information
      ichunk = num_cols/nproc
      irem = num_cols -(ichunk*nproc)
      mysize=ichunk
      if(iproc.le.irem) mysize=mysize+1

      mylow=0
      myhigh=0
      do i=1,nproc   !establish global understanding of processes
        mylow=myhigh+1
        myhigh=mylow+ichunk
        if(i.le.irem) myhigh=myhigh+1
        isize(i)=myhigh-mylow+1
        idisp(i)=(mylow-1)   !based on receiving type size/extent
        ilow(i)=mylow
        ihigh(i)=myhigh
      end do
      mylow=ilow(iproc)
      myhigh=ihigh(iproc)

      call MPI_TYPE_VECTOR(1,used_rows,num_rows,MPI_DOUBLE,
     &                     newtype,ierr)
      call MPI_TYPE_COMMIT(newtype,ierr)

c --- perform exchange based on 'newtype'
!     MPI_ALLGATHERV(SENDBUF, SENDCOUNT, SENDTYPE,
!                    RECVBUF, RECVCOUNT, DISPLS, RECVTYPE,
!                    COMM, IERROR)
      call MPI_ALLGATHERV(glob_val(1,mylow),mysize,newtype
     &                    glob_val,isize,iproc,newtype,
     &                    MPI_COMM_WORLD,ierr)

c ---print out global results of process 2
      if(iproc.eq.2) then
        do i=1,num_rows
          write(*,*) (glob_val(i,j),j=1,num_cols)
        end do
      end if

      END program
Answer (score: 0)
OK, I got this working with the following changes:

1) myhigh = mylow + ichunk - 1, not myhigh = mylow + ichunk

2) used_rows must be set before the loop that initializes the data

3) define the actual buffers more explicitly; try:
    call MPI_ALLGATHERV(glob_val(:,mylow:myhigh), mysize, newtype,      &
                        glob_val(1:used_rows,:), isize, idisp, newtype, &
                        MPI_COMM_WORLD, ierr)
The full code, using gfortran and Open MPI:
MODULE PARALLEL
    INTEGER iproc, nproc, rank, ierr
    INTEGER mylow, myhigh, mysize, ichunk, irem
    INTEGER, ALLOCATABLE :: isize(:), idisp(:), ilow(:), ihigh(:)
    DOUBLE PRECISION, ALLOCATABLE :: glob_val(:,:)
    INTEGER newtype
END MODULE

PROGRAM MAIN
    USE PARALLEL
    use mpi
    IMPLICIT NONE
!   INCLUDE 'mpif.h'

!   **temp variables
    integer i, j
    integer num_rows,num_cols
    integer used_rows

!   ----setup MPI----
    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierr)
    iproc = rank+1   !rank is base 0, rest of fortran base 1

!   ----setup initial data
    num_rows=8    !contiguous in memory over rows (ie single column)
    num_cols=4    !noncontiguous memory between different columns
    used_rows = 3

    ALLOCATE (isize(nproc))
    ALLOCATE (idisp(nproc))
    ALLOCATE (ilow(nproc))
    ALLOCATE (ihigh(nproc))
    ALLOCATE (glob_val(num_rows,num_cols))

!   glob_val = 1.0*iproc   !set all glob values to process id
    glob_val = -1.0 * iproc
    do i=1,num_cols
        do j=1,used_rows
            glob_val(j,i) = (1.0*iproc)+(.01*j)   !add reference index to used data
        end do
    end do

!   ---setup exchange information
    ichunk = num_cols/nproc
    irem = num_cols -(ichunk*nproc)
    mysize=ichunk
    if(iproc.le.irem) mysize=mysize+1

    mylow=0
    myhigh=0
    do i=1,nproc   !establish global understanding of processes
        mylow=myhigh+1
        myhigh=mylow+ichunk-1
        if(i.le.irem) myhigh=myhigh+1
        isize(i)=myhigh-mylow+1
        idisp(i)=(mylow-1)   !based on receiving type size/extent
        ilow(i)=mylow
        ihigh(i)=myhigh
    end do
    mylow=ilow(iproc)
    myhigh=ihigh(iproc)

    call MPI_TYPE_VECTOR(1,used_rows,num_rows,MPI_DOUBLE, &
                         newtype,ierr)
    call MPI_TYPE_COMMIT(newtype,ierr)

    write(*,*) rank, idisp
    write(*,*) rank, isize

!   --- perform exchange based on 'newtype'
!   MPI_ALLGATHERV(SENDBUF, SENDCOUNT, SENDTYPE,
!                  RECVBUF, RECVCOUNT, DISPLS, RECVTYPE,
!                  COMM, IERROR)
    call MPI_ALLGATHERV(glob_val(:,mylow:myhigh),mysize,newtype, &
                        glob_val(1:used_rows,:),isize,idisp,newtype, &
                        MPI_COMM_WORLD,ierr)

!   ---print out global results of process 2
    if(iproc.eq.2) then
        do i=1,num_rows
            write(*,*) (glob_val(i,j),j=1,num_cols)
        end do
    end if

    call MPI_Finalize(ierr)
END program
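For reference, with gfortran and Open MPI a program like this is normally built and launched through the MPI wrappers, e.g. something like mpif90 main.f90 -o main followed by mpirun -np 3 ./main (the file and executable names here are only placeholders).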