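I am trying to assemble a very large sparse matrix in parallel using mpi4py. Each rank produces a sparse submatrix (in scipy's dok format) that has to be placed inside the very large matrix. So far I have succeeded when each rank produces a numpy array containing the indices and values of the nonzero entries (mimicking the coo format). After the gather, I can assemble the large matrix from the numpy arrays. The final matrix is written to disk as a Matrix Market (.mtx) file.
What is the most efficient way to gather the sparse submatrices? Maybe by passing them directly as arguments to gather()? But how?
Below is a simplified example of what I did. It assembles a large diagonal matrix from diagonal submatrices. In the real case the resulting matrix is typically 500000x500000 and not diagonal.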
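Passing the sparse submatrices directly is possible: mpi4py's lowercase `gather()` pickles arbitrary Python objects, and scipy sparse matrices are picklable, so each rank can send its dok block together with its position. A minimal sketch, assuming each rank knows the row and column offset of its block in the global matrix; `assemble` and `main` are hypothetical helper names. Pickling adds overhead, so whether this beats sending plain arrays would have to be measured.

```python
import numpy as np
import scipy.sparse as ss


def assemble(pieces, shape):
    """Build a global CSR matrix from (row_offset, col_offset, sparse_block) tuples."""
    data, rows, cols = [], [], []
    for r0, c0, block in pieces:
        coo = block.tocoo()  # works for dok or any other scipy sparse format
        data.append(coo.data)
        rows.append(coo.row.astype(np.int64) + r0)  # shift to global row index
        cols.append(coo.col.astype(np.int64) + c0)  # shift to global column index
    return ss.csr_matrix(
        (np.concatenate(data), (np.concatenate(rows), np.concatenate(cols))),
        shape=shape,
    )


def main():
    try:
        from mpi4py import MPI
    except ImportError:  # lets assemble() be tried without an MPI installation
        print("mpi4py not available; run under mpiexec to try the gather")
        return

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    N = 10000
    # each rank builds its own dok block, as in the question
    block = ss.eye(N, format="dok") * (rank + 1)
    # lowercase gather() pickles the tuple, sparse matrix included
    pieces = comm.gather((rank * N, rank * N, block), root=0)
    if rank == 0:
        A = assemble(pieces, shape=(size * N, size * N))
        print(A.shape, A.nnz)


if __name__ == "__main__":
    main()
```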
from mpi4py import MPI
import numpy as np
import time
import scipy.sparse as ss
import scipy.io as sio

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    tic = time.perf_counter()  # time.clock() was removed in Python 3.8

# each rank generates a sparse matrix with N entries on the diagonal
N = 10000
tmp = ss.eye(N, format='dok') * rank

# extract indices and values; tocoo() keeps row, col and data aligned
coo = tmp.tocoo()

# create the output array of each rank: one row per entry, columns = (value, row, col)
out = np.zeros((coo.nnz, 3))
# fill the output numpy array, shifting the indices according to the rank
out[:, 0] = coo.data
out[:, 1] = coo.row + rank * N
out[:, 2] = coo.col + rank * N

# gather all the arrays representing the submatrices
full_array = comm.gather(out, root=0)

if rank == 0:
    # stack the per-rank arrays (works even if they have different lengths)
    f = np.concatenate(full_array)
    # this is the final result; indices were stored as floats, so cast them back
    row = f[:, 1].astype(np.int64)
    col = f[:, 2].astype(np.int64)
    final_result = ss.csr_matrix((f[:, 0], (row, col)), shape=(size * N, size * N))
    sio.mmwrite('final.mtx', final_result)
    toc = time.perf_counter()
    print('Matrix assembled and written in', toc - tic, 'seconds')
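In the real, non-diagonal case, blocks from different ranks may touch the same (i, j) entry, for example when the decomposition overlaps as in finite-element assembly (an assumption; the question does not say). Building the CSR matrix from gathered triplets sums such duplicates instead of overwriting them, as this small sketch with made-up triplets shows:

```python
import numpy as np
import scipy.sparse as ss

# made-up triplets from two ranks that both contribute to entry (1, 1)
data = np.array([1.0, 2.0, 4.0])
row = np.array([0, 1, 1])
col = np.array([0, 1, 1])

# the coo -> csr conversion sums duplicate (row, col) pairs
A = ss.csr_matrix((data, (row, col)), shape=(2, 2))
print(A.toarray())  # entry (1, 1) holds 2.0 + 4.0 = 6.0
```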
Answer:
Following hpaulj's suggestion, gathering a three-element list [data, row, col] works very well. Here is a working example:
from mpi4py import MPI
import numpy as np
import scipy.sparse as ss
from timeit import default_timer as timer

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    tic = timer()

# each rank generates a sparse matrix with N entries on the diagonal
N = 100000
block = ss.eye(N, format='coo')

# extract values and indices as a three-element list [data, row, col]
out = [block.data, block.row, block.col]
# shift the indices according to the rank
out[1] = out[1] + rank * N
out[2] = out[2] + rank * N

# gather all the lists representing the submatrices
full_list = comm.gather(out, root=0)

if rank == 0:
    dat = np.concatenate([x[0] for x in full_list])
    row = np.concatenate([x[1] for x in full_list])
    col = np.concatenate([x[2] for x in full_list])
    final_result = ss.csr_matrix((dat, (row, col)), shape=(size * N, size * N))
    toc = timer()
    print('Matrix assembled in', toc - tic, 'seconds')
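The lowercase `gather()` above pickles the arrays. For very large blocks, mpi4py also offers the uppercase, buffer-based `Gatherv`, which sends NumPy buffers without pickling. A sketch of the same assembly with it, assuming float64 values and int64 indices; `gather_array` and `displacements` are hypothetical helper names, and any speedup is not measured here:

```python
import numpy as np


def displacements(counts):
    """Offset at which each rank's chunk starts in the receive buffer."""
    counts = np.asarray(counts, dtype=np.int64)
    return np.concatenate(([0], np.cumsum(counts)[:-1]))


def gather_array(comm, local, mpi_type, root=0):
    """Gather variable-length 1-D arrays onto `root` with buffer-based Gatherv."""
    local = np.ascontiguousarray(local)
    counts = comm.gather(local.size, root=root)  # small pickled gather of the sizes
    if comm.Get_rank() == root:
        recvbuf = np.empty(sum(counts), dtype=local.dtype)
        recv = [recvbuf, counts, displacements(counts).tolist(), mpi_type]
    else:
        recvbuf, recv = None, None  # receive buffer is ignored on non-root ranks
    comm.Gatherv([local, mpi_type], recv, root=root)
    return recvbuf


def main():
    try:
        from mpi4py import MPI
    except ImportError:
        print("mpi4py not available; run under mpiexec to try gather_array()")
        return
    import scipy.sparse as ss

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    N = 100000
    block = ss.eye(N, format="coo")
    dat = gather_array(comm, block.data.astype(np.float64), MPI.DOUBLE)
    row = gather_array(comm, block.row.astype(np.int64) + rank * N, MPI.INT64_T)
    col = gather_array(comm, block.col.astype(np.int64) + rank * N, MPI.INT64_T)
    if rank == 0:
        A = ss.csr_matrix((dat, (row, col)), shape=(size * N, size * N))
        print(A.shape, A.nnz)


if __name__ == "__main__":
    main()
```

Run it with, for example, `mpiexec -n 4 python script.py`.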
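Using coo matrices instead of dok makes the assembly much faster: coo already stores the data, row and col arrays that go into the list, so no conversion from dok's dictionary keyed by (row, col) is needed.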
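A sketch that times the two routes to the [data, row, col] triplets used above: building the identity block directly as coo versus building it as dok and then calling `tocoo()`. It only reports timings on the local machine; no particular numbers are claimed. The function names are hypothetical.

```python
import scipy.sparse as ss
from timeit import default_timer as timer


def triplets_via_coo(n):
    blk = ss.eye(n, format="coo")  # data, row and col already exist as arrays
    return blk.data, blk.row, blk.col


def triplets_via_dok(n):
    blk = ss.eye(n, format="dok")  # stored as a dict keyed by (i, j)
    coo = blk.tocoo()  # walks the dict to build the arrays
    return coo.data, coo.row, coo.col


if __name__ == "__main__":
    n = 100000
    for build in (triplets_via_coo, triplets_via_dok):
        t0 = timer()
        build(n)
        print(build.__name__, f"{timer() - t0:.4f} s")
```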