Abnormal MPI behavior with a large array

Date: 2015-05-31 05:01:37

Tags: fortran mpi

The array-summation code below gives the correct answer when I use max_rows = 10, 100, 1000, or 10000, but when I use max_rows = 100000 or more I get wrong answers, and one of the processes even returns a negative partial sum.

program sum_vector
use mpi
implicit none 
integer,parameter::max_rows=100000
integer::myrank,master=0,ierr,status(mpi_status_size),num_procs
integer::i,rank,avg_rows_per_procs,sender
integer::num_rows_to_send,num_rows_to_receive,start_row,end_row,partial_sum,total_sum,st1,st2
integer,allocatable::vector(:),vector2(:)
allocate(vector(max_rows),stat=st1)
allocate(vector2(max_rows),stat=st2)
if(st1/=0 .or. st2/=0)then
    print*,'Cannot allocate'
    stop
end if 
call mpi_init(ierr)
call mpi_comm_rank(mpi_comm_world,myrank,ierr)
call mpi_comm_size(mpi_comm_world,num_procs,ierr)
if (myrank==0)then 
    do i=1,max_rows
        vector(i)=i
    end do 
    avg_rows_per_procs=max_rows/num_procs
    do rank=1,num_procs-1
        start_row=rank*avg_rows_per_procs+1
        end_row=start_row+avg_rows_per_procs-1
        if (rank==num_procs-1)end_row=max_rows
        num_rows_to_send=end_row-start_row+1
        call mpi_send(num_rows_to_send,1,mpi_int,rank,101,mpi_comm_world,ierr)
        call mpi_send(vector(start_row),num_rows_to_send,mpi_int,rank,102,mpi_comm_world,ierr)
    end do
    total_sum=0
    do i=1,avg_rows_per_procs
        total_sum=total_sum+vector(i)
    end do
    print*,'Partial sum=',total_sum,'from root process'

    do rank=1,num_procs-1
        call mpi_recv(partial_sum,1,mpi_int,mpi_any_source,103,mpi_comm_world,status,ierr)
        sender=status(mpi_source)
        print*,'Partial sum=',partial_sum,'from rank',sender
        total_sum=total_sum+partial_sum
    end do

    print*,'Total sum=',total_sum
else
    call mpi_recv(num_rows_to_receive,1,mpi_int,master,mpi_any_tag,mpi_comm_world,status,ierr)
    call mpi_recv(vector2,num_rows_to_receive,mpi_int,master,mpi_any_tag,mpi_comm_world,status,ierr)
    partial_sum=0
    do i=1,num_rows_to_receive
        partial_sum=partial_sum+vector2(i)
    end do
    call mpi_send(partial_sum,1,mpi_int,master,103,mpi_comm_world,ierr)
end if 
call mpi_finalize(ierr)    
stop
end program sum_vector

1 Answer:

Answer (score: 3)

Integer overflow appears to occur in total_sum and partial_sum for large max_rows, because the former becomes as large as ~max_rows**2. Changing the declaration to

use iso_fortran_env, only: int64
integer(int64) :: total_sum, partial_sum

and changing the MPI calls that send and receive partial_sum to

call mpi_recv(partial_sum,1,mpi_long_long_int,mpi_any_source,103,mpi_comm_world,status,ierr)

call mpi_send(partial_sum,1,mpi_long_long_int,master,103,mpi_comm_world,ierr)

will probably give the expected result. For example, the result obtained with max_rows = 100000 and 4 processes (using gfortran 4.7 and openmpi 1.6.5) is

Partial sum=            312512500 from root process
Partial sum=            937512500 from rank           1
Partial sum=           1562512500 from rank           2
Partial sum=           2187512500 from rank           3
Total sum=           5000050000

and the result for max_rows = 100000000 is

Partial sum=      312500012500000 from root process
Partial sum=      937500012500000 from rank           1
Partial sum=     1562500012500000 from rank           2
Partial sum=     2187500012500000 from rank           3
Total sum=     5000000050000000

This code works as expected as long as max_rows is less than ~2 * 10^9 (max_rows, the loop indices, and the vector elements are still default integers).
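
For reference, here is the question's program with exactly those changes applied, as one self-contained sketch. Two small details are my own choices rather than part of the original answer: the messages that carry default integers use mpi_integer (the Fortran-side handle) instead of mpi_int, and each worker allocates vector2 with the size it actually receives. The 64-bit send/receive still assumes that mpi_long_long_int matches integer(int64), as it does for the gfortran/OpenMPI combination above.

program sum_vector_int64
use mpi
use iso_fortran_env, only: int64
implicit none
integer, parameter :: max_rows = 100000000
integer, parameter :: master = 0
integer :: myrank, num_procs, ierr, status(mpi_status_size)
integer :: i, rank, avg_rows_per_procs, sender
integer :: num_rows_to_send, num_rows_to_receive, start_row, end_row
integer(int64) :: partial_sum, total_sum      ! widened to 64 bits to avoid overflow
integer, allocatable :: vector(:), vector2(:)

call mpi_init(ierr)
call mpi_comm_rank(mpi_comm_world, myrank, ierr)
call mpi_comm_size(mpi_comm_world, num_procs, ierr)

if (myrank == master) then
    allocate(vector(max_rows))
    do i = 1, max_rows
        vector(i) = i
    end do
    ! hand each worker rank its contiguous slice of the vector
    avg_rows_per_procs = max_rows / num_procs
    do rank = 1, num_procs - 1
        start_row = rank * avg_rows_per_procs + 1
        end_row = start_row + avg_rows_per_procs - 1
        if (rank == num_procs - 1) end_row = max_rows
        num_rows_to_send = end_row - start_row + 1
        call mpi_send(num_rows_to_send, 1, mpi_integer, rank, 101, mpi_comm_world, ierr)
        call mpi_send(vector(start_row), num_rows_to_send, mpi_integer, rank, 102, mpi_comm_world, ierr)
    end do
    ! the root sums its own slice, then collects the 64-bit partial sums
    total_sum = 0_int64
    do i = 1, avg_rows_per_procs
        total_sum = total_sum + vector(i)
    end do
    print *, 'Partial sum=', total_sum, 'from root process'
    do rank = 1, num_procs - 1
        call mpi_recv(partial_sum, 1, mpi_long_long_int, mpi_any_source, 103, mpi_comm_world, status, ierr)
        sender = status(mpi_source)
        print *, 'Partial sum=', partial_sum, 'from rank', sender
        total_sum = total_sum + partial_sum
    end do
    print *, 'Total sum=', total_sum
else
    ! workers receive their slice size, then the slice itself
    call mpi_recv(num_rows_to_receive, 1, mpi_integer, master, 101, mpi_comm_world, status, ierr)
    allocate(vector2(num_rows_to_receive))
    call mpi_recv(vector2, num_rows_to_receive, mpi_integer, master, 102, mpi_comm_world, status, ierr)
    partial_sum = 0_int64
    do i = 1, num_rows_to_receive
        partial_sum = partial_sum + vector2(i)
    end do
    call mpi_send(partial_sum, 1, mpi_long_long_int, master, 103, mpi_comm_world, ierr)
end if

call mpi_finalize(ierr)
end program sum_vector_int64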

Additional notes:

  • The exact answer is Total sum = max_rows * (max_rows + 1) / 2 (simply the sum from 1 to max_rows).

  • The largest value a default integer can hold is typically about 2 * 10^9, so once max_rows exceeds 10^5 the total is on the order of (10^5)^2 / 2, which is already larger than 2 * 10^9 and can overflow a default integer; the short standalone check after this list illustrates both notes.
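
Both notes can be checked with a few lines of standalone Fortran (no MPI involved); this is just a quick sketch, with huge(0) reporting the largest default integer and max_rows set to the larger test case above:

program check_sum_limit
use iso_fortran_env, only: int64
implicit none
integer(int64), parameter :: max_rows = 100000000_int64   ! try 100000 as well
integer(int64) :: exact
! closed-form total: 1 + 2 + ... + max_rows = max_rows*(max_rows+1)/2
exact = max_rows * (max_rows + 1_int64) / 2_int64
print *, 'Exact total sum       =', exact
! largest value a default integer can hold (about 2 * 10^9 on most compilers)
print *, 'Default integer limit =', huge(0)
print *, 'Overflows default integer?', exact > int(huge(0), int64)
end program check_sum_limit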

Edit: I have changed integer(8) to integer(int64) so that it is portable (see @casey's comment).