Question

I am experimenting with various IPC methods to do the following:

Master starts.
Master launches slave.
Master passes an array to slave.
Slave processes the array.
Slave sends the array back to master.

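I tried solving this with OpenMPI by having the parent process spawn a child process, which then does the processing described above. However, I also tried what I assumed would be the worst approach: having the master write the data to a file and having the slave read and rewrite that file. The results were surprising.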

Below are the two ways I implemented this. The first is the "file" approach, the second uses OpenMPI.

Master.f90

program master
implicit none

integer, dimension(10000) :: matrix
integer :: length, i, exitstatus, cmdstatus

! put integers in matrix and write them to a file
! (status='replace' so that a rerun does not fail because the file already exists)
open(1, file='matrixdata.dat', status='replace')

length = 10000

do i=1,length
    matrix(i) = i
    write(1,*) matrix(i)
end do

close(1)

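! run the slave and wait for it; cmdstat keeps a failed launch from aborting the master
exitstatus = -1
call execute_command_line("./slave.out", wait=.true., exitstat=exitstatus, cmdstat=cmdstatus)

if (cmdstatus .eq. 0 .and. exitstatus .eq. 0) then
    ! open and read the file rewritten by the slave program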
    open(1, file= 'matrixdata.dat', status='old')
    do i = 1, length
        read(1,*) matrix(i)
    end do
    close(1)
endif

end program master

Slave.f90

program slave
implicit none

    integer, dimension(10000) :: matrix
    integer :: length, i

    ! Open and read the file made by master into a matrix
    open(1, file='matrixdata.dat', status='old')
    length = 10000

    do i = 1, length
        read(1,*) matrix(i)
    end do
    close(1)

    ! Square all numbers and overwrite the file with the new data
    open(1, file='matrixdata.dat', status='replace')
    do i = 1, length
        matrix(i) = matrix(i)**2
        write(1,*) matrix(i)
    end do
    close(1)

end program slave

* OpenMPI *

Master.f90

program master
use mpi
implicit none

    integer :: ierr, num_procs, my_id, intercomm, i, siz, array(10000000), s_tag, s_dest

    CALL MPI_INIT(ierr)

    CALL MPI_COMM_RANK(MPI_COMM_WORLD, my_id, ierr)
    CALL MPI_COMM_SIZE(MPI_COMM_WORLD, num_procs, ierr)

    siz = 10000

    !print *, "S.Rank =", my_id
    !print *, "S.Size =", num_procs

    if (.not. (ierr .eq. 0)) then
        print*, "S.Unable to initialize bös!"
        stop
    endif

    do i=1,size(array)
        array(i) = 2
    enddo

    ! MPI_Comm_spawn is collective over MPI_COMM_WORLD, so every rank calls it (root = 0)
    call MPI_Comm_spawn("./slave.out", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0, &
        & MPI_COMM_WORLD, intercomm, MPI_ERRCODES_IGNORE, ierr)

    if (my_id .eq. 0) then


        s_dest = 0 !rank of destination (integer)
        s_tag =  1 !message tag (integer)
        call MPI_Send(array(1), siz, MPI_INTEGER, s_dest, s_tag, intercomm, ierr)

        call MPI_Recv(array(1), siz, MPI_INTEGER, s_dest, s_tag, intercomm, MPI_STATUS_IGNORE, ierr)

        !do i=1,10
        !   print *, "S.Array(",i,"): ", array(i)
        !enddo

    endif

    call MPI_Finalize(ierr)

end program master

Slave.f90

program slave
use mpi
implicit none

    ! type declaration statements
    integer :: ierr, parent, my_id, n_procs, i, siz, array(10000000), ctag, csource
    logical :: flag

    siz = 10000

    ! executable statements
    call MPI_Init(ierr)
    call MPI_Initialized(flag, ierr)
    call MPI_Comm_get_parent(parent, ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, my_id, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, n_procs, ierr)

    csource = 0 !rank of source (the master, rank 0 of the parent group)
    ctag = 1    !message tag

    call MPI_Recv(array(1), siz, MPI_INTEGER, csource, ctag, parent, MPI_STATUS_IGNORE, ierr)

    !do i=1,10
    !   print *, "C.Array(",i,"): ", array(i)
    !enddo

    ! square only the siz elements that were received; the rest of array is uninitialized
    do i=1,siz
        array(i) = array(i)**2
    enddo

    !do i=1,10
    !   print *, "C.Array(",i,"): ", array(i)
    !enddo

    call MPI_Send(array(1), siz, MPI_INTEGER, csource, ctag, parent, ierr)

    call MPI_Finalize(ierr)

end program slave

Now, interestingly, timing both programs with the time command, I measured 19.8 ms for the file version and 60 ms for the OpenMPI version. Why? Is there so much overhead in OpenMPI that reading/writing a file is faster when you transfer < 400 KiB?

I tried increasing the array to 10^5 integers. The file version ran in 114 ms and OpenMPI in 53 ms. With 10^6 integers: file 1103 ms, OpenMPI 77 ms.

Is the overhead really that large?

Answer 1

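Fundamentally, it makes no sense to use distributed processing for problem sizes that fit in cache (except in some trivially parallel cases). The typical use case is data transfers much larger than the last-level cache (LLC). Even your largest case (10^6) fits in a modern cache.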
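To put the sizes in the question in perspective, here is the raw payload the MPI version sends in each direction, assuming 4-byte integers (integer*4, which is also the default integer kind on common compilers such as gfortran):

```latex
\begin{aligned}
10^4 \times 4\ \text{B} &= 40\,000\ \text{B} \approx 39\ \text{KiB} \\
10^5 \times 4\ \text{B} &= 400\,000\ \text{B} \approx 391\ \text{KiB} \\
10^6 \times 4\ \text{B} &= 4\,000\,000\ \text{B} \approx 3.8\ \text{MiB}
\end{aligned}
```

So the "< 400 KiB" threshold in the question corresponds to the 10^5 case, and even the 10^6 case is only a few MiB. The file version stores the numbers as text, so its file is larger than these figures by a compiler-dependent factor.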

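First, for the write-to-disk approach, you have to understand the effect of the operating system's page cache. If your processes are on the same chip, the OS only sees "do a write" followed by "do a read". If nothing pollutes the page cache in between, it simply serves the data from RAM, not from disk. A better experiment is to flush the page cache between the write and the read (possible, at least on Linux, with a shell command). In effect, if you are getting the data from the page cache, you are doing shared-memory processing.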
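On Linux the usual way to flush the page cache is to run sync and then write 3 to /proc/sys/vm/drop_caches, which requires root. Below is a minimal sketch that stays in Fortran by going through execute_command_line, which the file master already uses. The module and subroutine names (page_cache, run_shell, drop_page_cache) are hypothetical; if the command cannot run (no root, read-only /proc, non-Linux system), ok is simply set to .false. instead of aborting.

```fortran
module page_cache
    implicit none
contains
    ! Runs a shell command, waits for it, and reports success in ok
    ! without aborting the program if the command cannot be executed.
    subroutine run_shell(cmd, ok)
        character(len=*), intent(in) :: cmd
        logical, intent(out) :: ok
        integer :: estat, cstat

        estat = -1   ! stays -1 if the command could not be launched at all
        cstat = -1
        call execute_command_line(cmd, wait=.true., exitstat=estat, cmdstat=cstat)
        ok = (cstat == 0 .and. estat == 0)
    end subroutine run_shell

    ! Linux only, needs root: write back dirty pages, then drop the clean
    ! page cache (plus dentries and inodes) so the next read goes to disk.
    subroutine drop_page_cache(ok)
        logical, intent(out) :: ok
        call run_shell("sync && echo 3 > /proc/sys/vm/drop_caches", ok)
    end subroutine drop_page_cache
end module page_cache
```

In the file version, the master would call drop_page_cache after close(1) and before launching the slave, and again after execute_command_line returns, so that both reads actually hit the disk.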

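Also, you are using time on the command line, so you are lumping the time spent initializing MPI and setting up the communication interface together with a few function calls. That is not a good benchmark, because the interface the disk I/O approach uses has already been initialized by the operating system. Moreover, for such small problem sizes, MPI initialization is significant compared with the runtime of the main body of the program. The right way is to do the timing inside the code.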
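As a sketch of timing inside the code, the module below times only the write and read-back of an array, using the standard system_clock intrinsic, so process start-up is excluded. It mirrors the list-directed I/O of the file version; the module name, subroutine name and scratch file roundtrip.dat are hypothetical. In the MPI version, the analogous measurement would wrap the MPI_Send/MPI_Recv pair with MPI_Wtime(), which returns wall-clock seconds as a double precision value.

```fortran
module roundtrip_timing
    use, intrinsic :: iso_fortran_env, only: int64, real64
    implicit none
contains
    ! Writes a to a text file with list-directed I/O, reads it back into b,
    ! deletes the file, and returns the elapsed wall-clock time in ms.
    ! Only this I/O is inside the timed region.
    subroutine time_file_roundtrip(a, b, ms)
        integer, intent(in)       :: a(:)
        integer, intent(out)      :: b(:)   ! needs at least size(a) elements
        real(real64), intent(out) :: ms
        integer(int64) :: t0, t1, rate
        integer :: u, i

        call system_clock(t0, rate)

        open(newunit=u, file='roundtrip.dat', status='replace', action='write')
        do i = 1, size(a)
            write(u, *) a(i)
        end do
        close(u)

        open(newunit=u, file='roundtrip.dat', status='old', action='read')
        do i = 1, size(a)
            read(u, *) b(i)
        end do
        close(u, status='delete')

        call system_clock(t1)
        ms = real(t1 - t0, real64) * 1000.0_real64 / real(rate, real64)
    end subroutine time_file_roundtrip
end module roundtrip_timing
```

Pass two distinct arrays, e.g. call time_file_roundtrip(matrix, matrix_back, ms); passing the same array as both a and b would alias an intent(in) and an intent(out) argument. newunit requires a Fortran 2008 compiler.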

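For both approaches you should expect linear scaling, offset by the overhead of the method. In fact, you should see several regimes as the data size exceeds the LLC and then the page cache. The best approach is to run repeatedly with ARRAY_SIZE = 2^n for n = 12, 13, ..., 24 and examine the resulting curve.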
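A minimal harness for that sweep might look like the sketch below. With 4-byte integers, n = 12 to 24 covers 16 KiB to 64 MiB per array, which spans typical cache sizes. The bench argument is a hypothetical hook: any function that takes an element count and returns elapsed milliseconds (for example, a file round trip or an MPI send/receive pair timed in code). Keeping the fastest of several repetitions is a design choice of this sketch to reduce noise, not something the answer prescribes.

```fortran
module scaling_sweep
    use, intrinsic :: iso_fortran_env, only: real64
    implicit none

    abstract interface
        ! A benchmark takes an element count and returns elapsed milliseconds.
        function bench_iface(n_elems) result(ms)
            import :: real64
            integer, intent(in) :: n_elems
            real(real64) :: ms
        end function bench_iface
    end interface

contains

    ! Runs bench for ARRAY_SIZE = 2**n, n = n_lo..n_hi, repeats each size
    ! reps times, and records the fastest run for each size.
    subroutine run_sweep(bench, n_lo, n_hi, reps, sizes, best_ms)
        procedure(bench_iface) :: bench
        integer, intent(in) :: n_lo, n_hi, reps
        integer, allocatable, intent(out) :: sizes(:)
        real(real64), allocatable, intent(out) :: best_ms(:)
        integer :: n, r, k

        allocate(sizes(n_hi - n_lo + 1), best_ms(n_hi - n_lo + 1))
        do n = n_lo, n_hi
            k = n - n_lo + 1
            sizes(k) = 2**n
            best_ms(k) = huge(1.0_real64)
            do r = 1, reps
                best_ms(k) = min(best_ms(k), bench(sizes(k)))
            end do
            print '(a,i3,a,i10,a,f10.3,a)', 'n =', n, '  size =', sizes(k), &
                '  best =', best_ms(k), ' ms'
        end do
    end subroutine run_sweep
end module scaling_sweep
```

Calling run_sweep(my_bench, 12, 24, 5, sizes, best_ms) and plotting best_ms against sizes on log-log axes should make the regime changes visible as kinks in the curve.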

OpenMPI IPC performance worse than reading/writing a file
