MPI_WIN_ALLOCATE_SHARED: is the memory limited?

Time: 2019-04-04 11:20:27

Tags: fortran mpi shared-memory

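It seems that whenever I try to allocate a window of about 30-32 MB, I get a segmentation fault.

I am using the routine MPI_WIN_ALLOCATE_SHARED.

Does anybody know whether there is a limit on how big my window can be? If so, is there a way to compile my code so that the limit is relaxed?

I am using Intel MPI 19.0.3 and ifort 19.0.3.

Below is an example written in Fortran. By varying the integer size_ you can see when the segmentation fault occurs. I tested with size_=10e3 and size_=10e4; the latter causes the segmentation fault.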

C------
      program TEST_STACK
      use, INTRINSIC ::ISO_C_BINDING

      implicit none
      include 'mpif.h'

      !---  Parameters (They should not be changed ! )
      integer, parameter   :: whoisroot   = 0  ! - Root always 0 here
      !---  General parallel
      integer              :: whoami                 ! - My rank
      integer              :: mpi_nproc              ! - no. of procs
      integer              :: mpierr                 ! - Error status
      integer              :: status(MPI_STATUS_SIZE)! - For MPI_RECV
      !---  Shared memory stuff
      integer              :: whoami_shm             ! - Local rank in shared memory group
      integer              :: mpi_shm_nproc          ! - No. of procs in Shared memory group
      integer              :: no_partners            ! - No. of partners for share memory
      integer              :: info_alloc
      !---  MPI groups
      integer              :: world_group            ! - All procs across all nodes
      integer              :: shared_group           ! - Only procs that share memory
      integer              :: MPI_COMM_SHM           ! - Shared memory communicators (for those in shared_group)

      type(C_PTR)                         :: ptr_buf
      integer(kind = MPI_ADDRESS_KIND)    :: size_bytes, lb
      integer                             :: win, size_, disp_unit

      call MPI_INIT        ( mpierr )
      call MPI_COMM_RANK   ( MPI_COMM_WORLD, whoami, mpierr )

      call MPI_COMM_RANK   ( MPI_COMM_WORLD, whoami, mpierr )
      call MPI_COMM_SIZE   ( MPI_COMM_WORLD, mpi_nproc, mpierr)
      call MPI_COMM_SPLIT_TYPE( MPI_COMM_WORLD
     &                        , MPI_COMM_TYPE_SHARED
     &                        , 0
     &                        , MPI_INFO_NULL
     &                        , MPI_COMM_SHM
     &                        , mpierr )

      call MPI_COMM_RANK( MPI_COMM_SHM, whoami_shm, mpierr )
      call MPI_COMM_SIZE( MPI_COMM_SHM, mpi_shm_nproc, mpierr )
      size_ = 10e4! - seg fault 
      size_bytes = size_ * MPI_REAL
      disp_unit  = MPI_REAL
      size_bytes = size_*disp_unit
      call MPI_INFO_CREATE( info_alloc, mpierr )
      call MPI_INFO_SET( info_alloc
     &                    , "alloc_shared_noncontig"
     &                    , "true"
     &                    , mpierr )
      !


      call MPI_WIN_ALLOCATE_SHARED( size_bytes
     &                            , disp_unit
     &                            , info_alloc
     &                            , MPI_COMM_SHM
     &                            , ptr_buf
     &                            , win
     &                            , mpierr )

      call MPI_WIN_FREE(win, mpierr)


      end program TEST_STACK

I run the code with the following command:

mpif90 test_stack.f90; mpirun -np 2 ./a.out

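This wrapper links against my ifort 19.0.3 and the Intel MPI library. I verified this by running mpif90 -v.

To be very precise, my mpif90 is a symbolic link to my mpiifort wrapper. I set it up that way for personal convenience, but I guess it should not cause any problems?

2 answers:

Answer 0 (score: 3)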

The manual says that a call to MPI_WIN_ALLOCATE_SHARED looks like this:

USE MPI

MPI_WIN_ALLOCATE_SHARED(SIZE, DISP_UNIT, INFO, COMM, BASEPTR, WIN, IERROR)
    INTEGER(KIND=MPI_ADDRESS_KIND) SIZE, BASEPTR
    INTEGER DISP_UNIT, INFO, COMM, WIN, IERROR

At least disp_unit and baseptr in your program do not match this interface.
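For comparison, here is a minimal sketch of a call whose arguments follow that interface. It rests on two assumptions that are not part of the answer: that the `mpi` module of the installed library provides the overload in which BASEPTR is a TYPE(C_PTR) (the MPI-3 standard describes it for compilers that support ISO_C_BINDING), and that the window holds default reals. The program name and the variables `n` and `buf` are illustrative. Since disp_unit is meant to be a byte count, the sketch queries it with MPI_TYPE_SIZE.

```fortran
program win_alloc_sketch
  use, intrinsic :: iso_c_binding, only: c_ptr, c_f_pointer
  use mpi
  implicit none

  integer                        :: ierr, shm_comm, win, disp_unit, n
  integer(kind=MPI_ADDRESS_KIND) :: size_bytes
  type(c_ptr)                    :: baseptr   ! assumes the C_PTR overload exists
  real, pointer                  :: buf(:)

  call MPI_Init(ierr)
  call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                           MPI_INFO_NULL, shm_comm, ierr)

  n = 100000   ! number of reals per rank (hypothetical, matches size_ = 10e4)

  ! Ask MPI how many bytes one MPI_REAL occupies.
  call MPI_Type_size(MPI_REAL, disp_unit, ierr)

  ! Convert before multiplying so the product uses the address-sized kind.
  size_bytes = int(n, MPI_ADDRESS_KIND) * int(disp_unit, MPI_ADDRESS_KIND)

  call MPI_Win_allocate_shared(size_bytes, disp_unit, MPI_INFO_NULL, &
                               shm_comm, baseptr, win, ierr)

  ! Associate a Fortran pointer with this rank's segment of the window.
  call c_f_pointer(baseptr, buf, [n])

  call MPI_Win_free(win, ierr)
  call MPI_Comm_free(shm_comm, ierr)
  call MPI_Finalize(ierr)
end program win_alloc_sketch
```

It can be built and run the same way as the program in the question, for example with mpif90 followed by mpirun -np 2 ./a.out.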

Answer 1 (score: 1)

I was finally able to diagnose the source of the error.

In my code I have

  disp_unit  = MPI_REAL
  size_bytes = size_*disp_unit

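MPI_REAL is a constant/parameter defined by MPI, and it is not equal to 4 as I very wrongly expected (4 bytes for single precision). In my version it is set to 1275069468, which is most likely an identifier (a datatype handle) rather than any meaningful number. Multiplying this number by the size of my array therefore very quickly exceeds the available memory, and it also exceeds the range of the default integer in which the product is computed.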
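To make the scale of the mistake concrete, the following small sketch redoes the arithmetic in a 64-bit integer kind. It needs no MPI. The handle value is the one reported above and is specific to that MPI installation; the program and variable names are illustrative, and `storage_size` is used here only as a stand-in for the byte size of a default real.

```fortran
program handle_overflow_sketch
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none

  ! Value of MPI_REAL reported in the answer; implementation-specific.
  integer(int64), parameter :: handle_value = 1275069468_int64
  integer(int64), parameter :: n = 100000_int64   ! size_ = 10e4
  integer(int64) :: requested, intended

  ! Byte count the original code effectively asked for.
  requested = n * handle_value

  ! Byte count that was intended: n default reals (storage_size is in bits).
  intended = n * int(storage_size(1.0) / 8, int64)

  print '(a,i0)', 'largest default integer : ', huge(0)
  print '(a,i0)', 'requested bytes         : ', requested
  print '(a,i0)', 'intended bytes          : ', intended
end program handle_overflow_sketch
```

With the reported handle value, the requested size is 127506946800000 bytes, far beyond both the memory of a typical node and the largest value a 4-byte integer can hold.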