Question

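I am trying to assemble a very large sparse matrix in parallel using mpi4py. Each rank produces a sparse submatrix (in scipy's dok format) that has to be put in place in the very large matrix. So far I have succeeded when each rank produces a numpy array containing the indices and the values of the nonzero entries (mimicking the coo format). After the gather step, I can assemble the large matrix from the numpy arrays. The final matrix is to be written to disk as an mtx format file.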

What is the most efficient way to gather the sparse submatrices? Perhaps by passing them directly as arguments to gather()? But how?

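Below is a simplified example of what I do: it assembles a large diagonal matrix from diagonal submatrices. In the real case, the resulting large matrix is typically 500000x500000 and not diagonal.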

from mpi4py import MPI
from numpy import *
import time
import scipy.sparse as ss
import scipy.io as sio

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    tic = time.perf_counter()  # time.clock() was removed in Python 3.8

# each rank generates a sparse matrix with N entries on the diagonal
N = 10000
tmp = ss.eye(N, format = 'dok') * rank

# extract indices and values; converting to coo keeps them in matching order
coo = tmp.tocoo()
i, j, val = coo.row, coo.col, coo.data

# create the output array of each rank
out = zeros((size(val),3))

# fill the output numpy array, shifting the indices according to the rank
out[:,0] = val
out[:,1] = i + rank * N
out[:,2] = j + rank * N

# gather all the arrays representing the submatrices
full_array = comm.gather(out,root=0)

if rank == 0:

    # stack the per-rank (n, 3) arrays into one long three-column array
    f = vstack(full_array)

    # this is the final result (indices were stored as floats, so cast them back)
    final_result = ss.csr_matrix( ( f[:,0], (f[:,1].astype(int), f[:,2].astype(int)) ) )
    sio.mmwrite('final.mtx', final_result)
    toc = time.perf_counter()
    print('Matrix assembled and written in', toc-tic, 'seconds')

Answer 1

Following hpaulj's suggestion, using a list of three elements works very well. Here is a working example:

from mpi4py import MPI
from numpy import *
import scipy.sparse as ss
from timeit import default_timer as timer

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    tic = timer()      

# each rank generates a sparse matrix with N entries on the diagonal
N = 100000
block = ss.eye(N, format = 'coo')

# extract indices and values
out = [ block.data, block.row , block.col]
out[1] = out[1] + rank * N
out[2] = out[2] + rank * N

# gather all the arrays representing the submatrices
full_list = comm.gather(out,root=0)

if rank == 0:
    dat = concatenate([x[0] for x in full_list])
    row = concatenate([x[1] for x in full_list])
    col = concatenate([x[2] for x in full_list])
    final_result = ss.csr_matrix( ( dat, (row, col) ) )
    toc = timer()
    print('Matrix assembled in', toc-tic, 'seconds')
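
The root-side assembly can be checked without launching MPI at all. The sketch below is one way to do that, not part of the original answer: the helper names local_triplets and assemble are hypothetical, and a plain Python list stands in for what comm.gather(out, root=0) would return on rank 0.

```python
import numpy as np
import scipy.sparse as ss


def local_triplets(block, rank, n):
    """Return [data, row, col] for one rank's block, shifted along the global diagonal."""
    coo = block.tocoo()
    return [coo.data, coo.row + rank * n, coo.col + rank * n]


def assemble(gathered, shape):
    """Concatenate the per-rank triplets and build the global CSR matrix."""
    dat = np.concatenate([x[0] for x in gathered])
    row = np.concatenate([x[1] for x in gathered])
    col = np.concatenate([x[2] for x in gathered])
    return ss.csr_matrix((dat, (row, col)), shape=shape)


# Stand-in for comm.gather(out, root=0): one list entry per simulated rank.
n, n_ranks = 1000, 4
gathered = [local_triplets(ss.eye(n, format='coo'), r, n) for r in range(n_ranks)]
final_result = assemble(gathered, (n * n_ranks, n * n_ranks))
```

Passing the shape explicitly is a choice made here so that the result has the intended size even if the last rank contributes no entries in its final rows or columns.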

Assembly is definitely much faster with coo matrices than with dok.
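
A rough way to see where the difference comes from is to time both formats on the same block. This is only a sketch: the block size n is an arbitrary choice, and the timings depend on the machine and the scipy version, so none are quoted here. dok keeps its entries in a dict keyed by (row, col), which has to be walked to obtain index and value arrays, while coo already holds them as three flat arrays.

```python
from timeit import default_timer as timer

import scipy.sparse as ss

n = 20000  # arbitrary block size for the comparison

# dok: build the block, then convert to obtain data/row/col arrays
tic = timer()
dok = ss.eye(n, format='dok')
dok_coo = dok.tocoo()
dok_time = timer() - tic

# coo: data, row and col are available as arrays right away
tic = timer()
coo = ss.eye(n, format='coo')
coo_time = timer() - tic

print('dok:', dok_time, 'seconds; coo:', coo_time, 'seconds')
```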

Parallel assembly of a sparse matrix in python
