MPI - sending and receiving matrix columns

Posted: 2017-11-20 20:01:38

Tags: python numpy matrix mpi4py

I am trying to use Scatter to send the columns of a matrix to the other processes. The code below works for rows, so to send columns with minimal modification I use numpy's transpose function. However, this seems to have no effect unless I make a completely new copy of the matrix (which, as you can imagine, defeats the purpose).
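
As far as I can tell, the reason is that numpy's .T does not move any data: it returns a view that shares the original row-major buffer and only swaps the strides, while Scatter works on the underlying flat buffer. A minimal numpy-only sketch (no MPI involved) illustrating this:

    import numpy as np

    A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])

    # The transpose is a view: no data is moved, only the strides are swapped.
    print(np.shares_memory(A, A.T))          # True
    print(A.flags['C_CONTIGUOUS'])           # True
    print(A.T.flags['C_CONTIGUOUS'])         # False
    print(A.T.copy().flags['C_CONTIGUOUS'])  # True, copy() materialises the columns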

Below are 3 minimal examples that illustrate the problem (they must be run with 3 processes!).

  1. Scattering rows (works as expected):

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    
    A = np.zeros((3,3))
    if rank==0:
        A = np.matrix([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]])
    
    local_a = np.zeros(3)
    
    comm.Scatter(A, local_a, root=0)
    print "process", rank, "has", local_a
    

    gives the output:

    process 0 has [ 1.  2.  3.]
    process 1 has [ 4.  5.  6.]
    process 2 has [ 7.  8.  9.]
    
  2. Scattering columns (does not work, it still scatters rows...):

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    
    A = np.zeros((3,3))
    if rank==0:
        A = np.matrix([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]]).T
    
    local_a = np.zeros(3)
    
    comm.Scatter(A, local_a, root=0)
    print "process", rank, "has", local_a
    

    gives the output:

    process 0 has [ 1.  2.  3.]
    process 1 has [ 4.  5.  6.]
    process 2 has [ 7.  8.  9.]
    
  3. Scattering columns (works, but seems to defeat the purpose):

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    
    A = np.zeros((3,3))
    if rank==0:
        A = np.matrix([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]]).T.copy()
    
    local_a = np.zeros(3)
    
    comm.Scatter(A, local_a, root=0)
    print "process", rank, "has", local_a
    

    which finally gives the desired output:

    process 0 has [ 1.  4.  7.]
    process 2 has [ 3.  6.  9.]
    process 1 has [ 2.  5.  8.]
    
  4. Is there a simple way to send the columns without having to copy the whole matrix?

    For context, I am working through exercise 5 of the mpi4py tutorial. My full solution (which wastes memory as in point 3 above) is below, in case you are wondering:

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    
    A = np.zeros((3,3))
    v = np.zeros(3)
    result = np.zeros(3)
    if rank==0:
        A = np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]]).T.copy()
        v = np.array([0.1,0.01,0.001])
    
    # Scatter the columns of the matrix
    local_a = np.zeros(3)
    comm.Scatter(A, local_a, root=0)
    
    # Scatter the elements of the vector
    local_v = np.array([0.])
    comm.Scatter(v, local_v, root=0)
    
    print "process", rank, "has A_ij =", local_a, "and v_i", local_v
    
    # Multiplication
    local_result = local_a * local_v
    
    # Add together
    comm.Reduce(local_result, result, op=MPI.SUM)
    print "process", rank, "finds", result, "(", local_result, ")"
    
    if (rank==0):
        print "The resulting vector is"
        print "   ", result, "computed in parallel"
        print "and", np.dot(A.T,v), "computed serially."
    

    Here are the memory profiling tests requested by @Sajid:

    My solution 3 (gives the correct answer):

        0.027 MiB  A = np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]]).T.copy()
        0.066 MiB  comm.Scatter(A, local_a, root=0)
        Total = 0.093 MiB

    Another similar solution (gives the correct answer):

        0.004 MiB  A = np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]])
        0.090 MiB  comm.Scatter(A.T.copy(), local_a, root=0)
        Total = 0.094 MiB

    @Sajid's solution (gives the correct answer):

        0.039 MiB  A[:,:] = np.transpose(np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]]))
        0.062 MiB  comm.Scatter(A, local_a, root=0)
        Total = 0.101 MiB

    My solution 2 (gives the wrong answer):

        0.004 MiB  A = np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]])
        0.066 MiB  comm.Scatter(A, local_a, root=0)
        Total = 0.070 MiB

    (I have only copied the memory increments for the lines where the increment differs between code versions. Obviously it all comes from the root node.)

    It seems clear that any solution giving the correct answer has to copy the array in memory. This is not ideal, since all I want is to scatter the columns instead of the rows. One possible way to avoid the copy altogether is sketched below.
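
    A derived MPI datatype can describe a column of the row-major array directly: a vector type with stride n, resized so that consecutive columns start one element apart, then scattered with Scatterv. This is not one of the versions profiled above, so treat it as an untested sketch (Create_vector, Create_resized and Scatterv are standard mpi4py calls, but the exact buffer-spec details may need checking against your mpi4py version):

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        size = comm.Get_size()

        n = 3  # run with 3 processes, as above
        A = np.zeros((n, n))
        if rank == 0:
            A[:, :] = [[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]

        # One column of a row-major n x n array of doubles: n blocks of 1 double, stride n.
        col = MPI.DOUBLE.Create_vector(n, 1, n).Create_resized(0, MPI.DOUBLE.Get_size())
        col.Commit()

        local_a = np.zeros(n)
        # Send one column datatype to each rank; the displacements are in units of
        # the resized extent, i.e. one double, so column i starts at offset i.
        comm.Scatterv([A, [1] * size, list(range(size)), col], local_a, root=0)
        print("process %d has %s" % (rank, local_a))

        col.Free()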

1 Answer:

Answer 0 (score: 1):

It may be an issue with the data not being copied into A correctly. Try the following:

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

A = np.zeros((3,3))
if rank==0:
    A[:,:] = np.transpose(np.matrix([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]]))

local_a = np.zeros(3)

comm.Scatter(A, local_a, root=0)
print("process", rank, "has", local_a)

Of course, if you are using Python 2, change the print statements accordingly.
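
As far as I can tell, this works because the in-place assignment A[:,:] = ... copies the transposed values into A's own pre-allocated, C-contiguous buffer, which is exactly the memory Scatter reads, so each rank ends up with one column of the original matrix (still at the cost of one copy into A). A quick check with plain numpy:

    import numpy as np

    A = np.zeros((3, 3))
    A[:, :] = np.transpose(np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]))

    print(A.flags['C_CONTIGUOUS'])  # True: A keeps its own row-major buffer
    print(A.ravel())                # [1. 4. 7. 2. 5. 8. 3. 6. 9.], the original columns laid out row by row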