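It seems that whenever I try to allocate a window of roughly 30-32 MB, I get a segmentation fault.
I am using the routine MPI_WIN_ALLOCATE_SHARED.
Does anyone know whether there is a limit on how large my window can be? If so, is there a way to compile my code so that the limit is relaxed?
I am using Intel MPI 19.0.3 and ifort 19.0.3.
Below is an example written in Fortran. By changing the integer size_, you can see when the segmentation fault occurs. I tested with size_=10e3 and size_=10e4; the latter caused the segmentation fault.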
C------
program TEST_STACK
use, INTRINSIC ::ISO_C_BINDING
implicit none
include 'mpif.h'
!--- Parameters (They should not be changed ! )
integer, parameter :: whoisroot = 0 ! - Root always 0 here
!--- General parallel
integer :: whoami ! - My rank
integer :: mpi_nproc ! - no. of procs
integer :: mpierr ! - Error status
integer :: status(MPI_STATUS_SIZE)! - For MPI_RECV
!--- Shared memory stuff
integer :: whoami_shm ! - Local rank in shared memory group
integer :: mpi_shm_nproc ! - No. of procs in Shared memory group
integer :: no_partners ! - No. of partners for share memory
integer :: info_alloc
!--- MPI groups
integer :: world_group ! - All procs across all nodes
integer :: shared_group ! - Only procs that share memory
integer :: MPI_COMM_SHM ! - Shared memory communicators (for those in shared_group)
type(C_PTR) :: ptr_buf
integer(kind = MPI_ADDRESS_KIND) :: size_bytes, lb
integer :: win, size_, disp_unit
call MPI_INIT ( mpierr )
call MPI_COMM_RANK ( MPI_COMM_WORLD, whoami, mpierr )
call MPI_COMM_SIZE ( MPI_COMM_WORLD, mpi_nproc, mpierr)
call MPI_COMM_SPLIT_TYPE( MPI_COMM_WORLD
& , MPI_COMM_TYPE_SHARED
& , 0
& , MPI_INFO_NULL
& , MPI_COMM_SHM
& , mpierr )
call MPI_COMM_RANK( MPI_COMM_SHM, whoami_shm, mpierr )
call MPI_COMM_SIZE( MPI_COMM_SHM, mpi_shm_nproc, mpierr )
size_ = 10e4! - seg fault
size_bytes = size_ * MPI_REAL
disp_unit = MPI_REAL
size_bytes = size_*disp_unit
call MPI_INFO_CREATE( info_alloc, mpierr )
call MPI_INFO_SET( info_alloc
& , "alloc_shared_noncontig"
& , "true"
& , mpierr )
!
call MPI_WIN_ALLOCATE_SHARED( size_bytes
& , disp_unit
& , info_alloc
& , MPI_COMM_SHM
& , ptr_buf
& , win
& , mpierr )
call MPI_WIN_FREE(win, mpierr)
call MPI_FINALIZE( mpierr )
end program TEST_STACK
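I compile and run the code with the following command:
mpif90 test_stack.f90; mpirun -np 2 ./a.out
This wrapper links against my ifort 19.0.3 and the Intel MPI library. I verified this by running
mpif90 -v
To be precise, my mpif90 is a symbolic link to my mpiifort wrapper. I set it up that way for personal convenience, but I assume it should not cause any problems?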
Answer 0 (score: 3)
According to the manual, a call to MPI_WIN_ALLOCATE_SHARED looks like this:
USE MPI
MPI_WIN_ALLOCATE_SHARED(SIZE, DISP_UNIT, INFO, COMM, BASEPTR, WIN, IERROR)
INTEGER(KIND=MPI_ADDRESS_KIND) SIZE, BASEPTR
INTEGER DISP_UNIT, INFO, COMM, WIN, IERROR
In your program, at least the types of disp_unit and baseptr do not match this interface.
Answer 1 (score: 1)
I was finally able to diagnose the source of the error.
In my code I had:
disp_unit = MPI_REAL
size_bytes = size_*disp_unit
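MPI_REAL is a constant/parameter defined by MPI. It is not equal to 4, which is what I very wrongly expected (4 for the 4 bytes of a single-precision real). In my version it is set to 1275069468, which is most likely an id (a datatype handle) rather than any meaningful number.
Multiplying this number by the size of the array therefore quickly exceeds the available memory: with size_=10e4 the mathematical product is 127506946800000 bytes, roughly 127 TB. It also exceeds what an integer can hold, because size_ and disp_unit are both default integers (typically 32-bit), so the multiplication overflows before the result is ever assigned to size_bytes.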
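One possible way to avoid the problem described in this answer is sketched below. It is only a sketch, not the original poster's final code. It assumes that the mpi module of the installed MPI library provides the TYPE(C_PTR) overload of MPI_WIN_ALLOCATE_SHARED (as MPI-3 specifies for compilers that support TYPE(C_PTR)). The program name, the variable names (n, real_bytes, comm_shm, buf) and the use of MPI_INFO_NULL are illustrative choices. The idea is to ask MPI_TYPE_SIZE for the size of MPI_REAL in bytes and to do the multiplication in the MPI_ADDRESS_KIND integer kind.

```fortran
program shm_window_sketch
  use mpi
  use, intrinsic :: iso_c_binding, only: c_ptr, c_f_pointer
  implicit none

  ! Hypothetical element count, matching size_ = 10e4 from the question.
  integer, parameter :: n = 100000

  integer :: ierr, comm_shm, win, disp_unit, real_bytes
  integer(kind=MPI_ADDRESS_KIND) :: size_bytes
  type(c_ptr) :: ptr_buf
  real, pointer :: buf(:)

  call MPI_INIT(ierr)

  ! Group the ranks that can share memory (same call as in the question).
  call MPI_COMM_SPLIT_TYPE(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                           MPI_INFO_NULL, comm_shm, ierr)

  ! Ask MPI for the size of MPI_REAL in bytes; MPI_REAL itself is only
  ! a datatype handle, not a byte count.
  call MPI_TYPE_SIZE(MPI_REAL, real_bytes, ierr)
  disp_unit = real_bytes

  ! Convert both factors to the address kind before multiplying, so the
  ! product is not computed in default-integer arithmetic.
  size_bytes = int(n, MPI_ADDRESS_KIND) * int(real_bytes, MPI_ADDRESS_KIND)

  call MPI_WIN_ALLOCATE_SHARED(size_bytes, disp_unit, MPI_INFO_NULL, &
                               comm_shm, ptr_buf, win, ierr)

  ! Map the C pointer returned by MPI onto a Fortran array of n reals.
  call c_f_pointer(ptr_buf, buf, [n])
  print *, 'window holds', size(buf), 'reals,', size_bytes, 'bytes'

  call MPI_WIN_FREE(win, ierr)
  call MPI_COMM_FREE(comm_shm, ierr)
  call MPI_FINALIZE(ierr)
end program shm_window_sketch
```

It can be built and run the same way as the test program in the question, for example with mpif90 followed by mpirun -np 2 ./a.out. With a 4-byte default real, each rank then requests 400000 bytes rather than a value derived from the datatype handle.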