I'm trying to convert the following MATLAB code to Python, and I can't find a solution that runs in any reasonable amount of time.
M = diag(sum(a)) - a;
where = vertcat(in, out);
M(where,:) = 0;
M(where,where) = 1;
Here, a is a sparse matrix and where is a vector (as are in and out). My Python solution is:
M = scipy.sparse.diags([degs], [0]) - A
where = numpy.hstack((inVs, outVs)).astype(int)
M = scipy.sparse.lil_matrix(M)
M[where, :] = 0 # This is the slowest line
M[where, where] = 1
M = scipy.sparse.csc_matrix(M)
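But since A is 334863x334863, this takes about three minutes. If anyone has suggestions on how to make this faster, please share them! For comparison, MATLAB does this same step imperceptibly fast.
Thanks!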
Answer 0 (score: 8)
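A slightly different take on alko/seberg's approach. I find for loops disturbing, so I spent the better part of this morning figuring out a way to get rid of that one. The following is not always faster than the other approach. It performs better the more rows are zeroed and the sparser the matrix is: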
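import numpy as np

def csr_zero_rows(csr, rows_to_zero):
    rows, cols = csr.shape
    # Row-level mask: True for rows to keep, False for rows to zero.
    mask = np.ones((rows,), dtype=bool)
    mask[rows_to_zero] = False
    nnz_per_row = np.diff(csr.indptr)
    # Expand the row mask to one flag per stored entry.
    mask = np.repeat(mask, nnz_per_row)
    nnz_per_row[rows_to_zero] = 0
    # Drop the entries of the zeroed rows and rebuild the row pointers.
    csr.data = csr.data[mask]
    csr.indices = csr.indices[mask]
    csr.indptr[1:] = np.cumsum(nnz_per_row)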
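To make the index bookkeeping concrete, here is a small sketch that runs the same technique on a 3x3 matrix. The function body restates the one above so the snippet is self-contained; the test matrix and the names `keep_row`, `keep_entry`, `dense`, `m`, and `result` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import scipy.sparse as sps

def csr_zero_rows(csr, rows_to_zero):
    """Drop the stored entries of the given rows, modifying csr in place."""
    keep_row = np.ones(csr.shape[0], dtype=bool)
    keep_row[rows_to_zero] = False
    nnz_per_row = np.diff(csr.indptr)
    # One keep/drop flag per stored entry, in CSR storage order.
    keep_entry = np.repeat(keep_row, nnz_per_row)
    nnz_per_row[rows_to_zero] = 0
    csr.data = csr.data[keep_entry]
    csr.indices = csr.indices[keep_entry]
    csr.indptr[1:] = np.cumsum(nnz_per_row)

# Illustrative matrix: rows 0 and 2 will be zeroed.
dense = np.array([[1., 0., 2.],
                  [0., 3., 0.],
                  [4., 5., 6.]])
m = sps.csr_matrix(dense)      # indptr starts out as [0, 2, 3, 6]
csr_zero_rows(m, [0, 2])
result = m.toarray()           # only the entry 3.0 at (1, 1) is left
print(m.indptr, m.indices, m.data)
```

After the call, the zeroed rows occupy no storage at all: their start and end pointers in `indptr` coincide, which is why no separate `eliminate_zeros()` pass is needed.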
Test driving both approaches:
import scipy.sparse as sps

rows, cols = 334863, 334863
a = sps.rand(rows, cols, density=0.00001, format='csr')
b = a.copy()
rows_to_zero = np.random.choice(np.arange(rows), size=10000, replace=False)
In [117]: a
Out[117]:
<334863x334863 sparse matrix of type '<type 'numpy.float64'>'
with 1121332 stored elements in Compressed Sparse Row format>
In [118]: %timeit -n1 -r1 csr_rows_set_nz_to_val(a, rows_to_zero)
1 loops, best of 1: 75.8 ms per loop
In [119]: %timeit -n1 -r1 csr_zero_rows(b, rows_to_zero)
1 loops, best of 1: 32.5 ms per loop
And of course:
np.allclose(a.data, b.data)
Out[122]: True
np.allclose(a.indices, b.indices)
Out[123]: True
np.allclose(a.indptr, b.indptr)
Out[124]: True
Answer 1 (score: 7)
The solution I use for a similar task is attributed to @seberg and does not convert to lil format:
import scipy.sparse
import numpy
import time

def csr_row_set_nz_to_val(csr, row, value=0):
    """Set all nonzero elements (elements currently in the sparsity pattern)
    to the given value. Useful to set to 0 mostly.
    """
    if not isinstance(csr, scipy.sparse.csr_matrix):
        raise ValueError('Matrix given must be of CSR format.')
    csr.data[csr.indptr[row]:csr.indptr[row+1]] = value

def csr_rows_set_nz_to_val(csr, rows, value=0):
    for row in rows:
        # Forward value so that non-zero fills are applied as well.
        csr_row_set_nz_to_val(csr, row, value)
    if value == 0:
        # Remove the explicit zeros just written from the storage.
        csr.eliminate_zeros()
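The docstring's caveat about "elements currently in the sparsity pattern" is easy to miss, so here is a minimal sketch of what it means in practice. The helper is a compact restatement of the one above (without the type check), and the 3x3 matrix and variable names are illustrative assumptions.

```python
import numpy as np
import scipy.sparse

def csr_rows_set_nz_to_val(csr, rows, value=0):
    """Compact restatement of the helper above, for a self-contained demo."""
    for row in rows:
        # Row `row` owns the slice indptr[row]:indptr[row + 1] of csr.data.
        csr.data[csr.indptr[row]:csr.indptr[row + 1]] = value
    if value == 0:
        csr.eliminate_zeros()

dense = np.array([[1., 0., 2.],
                  [0., 3., 0.],
                  [4., 5., 6.]])

# A non-zero value only overwrites entries that are already stored,
# so position (0, 1), which was never stored, stays 0.
m = scipy.sparse.csr_matrix(dense)
csr_rows_set_nz_to_val(m, [0], value=7)
overwritten = m.toarray()

# value=0 first writes explicit zeros, then eliminate_zeros() drops them.
z = scipy.sparse.csr_matrix(dense)
csr_rows_set_nz_to_val(z, [0, 2])
zeroed = z.toarray()
```

In other words, the helper never changes which positions are stored, apart from the final cleanup when the fill value is 0.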
Wrap your evaluation with timing:
def evaluate(size):
    degs = [1] * size
    # range and // behave the same way under Python 2 and Python 3 here.
    inVs = list(range(1, size, size // 25))
    outVs = list(range(5, size, size // 25))
    where = numpy.hstack((inVs, outVs)).astype(int)
    start_time = time.time()
    A = scipy.sparse.csc_matrix((size, size))
    M = scipy.sparse.diags([degs], [0]) - A
    csr_rows_set_nz_to_val(M, where)
    return time.time() - start_time
And test its performance:
>>> print('elapsed %.5f seconds' % evaluate(334863))
elapsed 0.53054 seconds
Answer 2 (score: 0)
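If you don't like digging around in the internals of the sparse matrix, you can also use sparse matrix multiplication with a diagonal matrix: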
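import numpy
from scipy import sparse

def zero_rows(M, rows_to_zero):
    # 1 on the diagonal for rows to keep, 0 for rows to zero.
    ixs = numpy.ones(M.shape[0], int)
    ixs[rows_to_zero] = 0
    D = sparse.diags(ixs)
    # Left-multiplying by D scales row i of M by ixs[i].
    res = D * M
    return res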
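The question also sets M[where, where] = 1 after zeroing the rows, and the same diagonal-matrix idea can cover that step too. The following is only a sketch under stated assumptions: `zero_rows_set_diag` is a hypothetical helper name, the 3x3 matrix is made up for illustration, `degs` is assumed to hold the column sums of A, and the ones are written at the paired positions (where[i], where[i]) as the question's Python code does.

```python
import numpy as np
import scipy.sparse as sps

def zero_rows_set_diag(M, where):
    """Hypothetical helper: zero the rows in `where`, then put 1 on their diagonal."""
    keep = np.ones(M.shape[0])
    keep[where] = 0.0
    # Left-multiplying by diag(keep) scales row i by keep[i],
    # so the selected rows become all zero.
    zeroed = sps.diags([keep], [0]) @ M
    # 1 - keep is 1 exactly on the selected rows; adding it as a diagonal
    # matrix writes the ones at (where[i], where[i]).
    result = (zeroed + sps.diags([1.0 - keep], [0])).tocsc()
    result.eliminate_zeros()
    return result

# Illustrative adjacency matrix (an assumption, not the poster's data).
A = sps.csr_matrix(np.array([[0., 1., 1.],
                             [1., 0., 0.],
                             [1., 0., 0.]]))
degs = np.asarray(A.sum(axis=0)).ravel()   # assumed: column sums of A
M = sps.diags([degs], [0]) - A
out = zero_rows_set_diag(M, np.array([1]))
print(out.toarray())
```

This avoids both the lil conversion and any direct manipulation of indptr, at the cost of one sparse product and one sparse addition. Whether it is faster than the in-place approaches on the 334863x334863 case has not been measured here.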