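I have a large `csr_matrix` (1M × 1K), and I want to add rows together to get a new `csr_matrix` with the same number of columns but fewer rows. My problem is essentially the same as Sum over rows in scipy.sparse.csr_matrix. The only issue is that the accepted solution there is slow for my use case. Let me illustrate what I have: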
```python
map_fn = np.random.randint(0, 10000, 1000000)
```
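`map_fn` tells me how the input rows (1M) map to my output rows (10K): the i-th input row is added to output row `map_fn[i]`. I tried both approaches mentioned in the question above, i.e., forming a sparse selector matrix and multiplying by it, and summing sparse row subsets directly. Although the sparse-matrix approach performs much better than the sparse-sum approach, I still find it slow for what it does. Here is the code comparing the two approaches: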
```python
import scipy.sparse
import numpy as np
import time

print("Setting up input")
s = 10000          # number of output rows
n = 1000000        # number of input rows
d = 1000           # number of columns
density = 1.0/500
X = scipy.sparse.rand(n, d, density=density, format="csr")
map_fn = np.random.randint(0, s, n)  # input row i -> output row map_fn[i]

# Approach 1: build a sparse selector matrix S and compute S.dot(X)
start_time = time.time()
col = np.arange(n)   # np.arange instead of scipy.arange (SciPy's NumPy aliases are deprecated)
val = np.ones(n)
S = scipy.sparse.csr_matrix((val, (map_fn, col)), shape=(s, n))
print("Approach 1 Creation time : ", time.time() - start_time)
SX = S.dot(X)
print("Approach 1 Total time : ", time.time() - start_time)

# Approach 2: for each output row, sum the matching input rows
start_time = time.time()
SX = np.zeros((s, X.shape[1]))
for i in range(SX.shape[0]):
    SX[i, :] = X[np.where(map_fn == i)[0], :].sum(axis=0)
print("Approach 2 Total time : ", time.time() - start_time)
```
This gives the following timings:

```
Approach 1 Creation time :  0.187678098679
Approach 1 Total time :  0.286989927292
Approach 2 Total time :  10.208632946
```
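So my question is: is there a better way to do this? Forming the sparse matrix feels like overkill, since it takes more than half of the total time. Are there better alternatives? Any suggestions are much appreciated. Thanks.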
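To make the mapping concrete, here is a tiny sketch on toy data (the sizes and values are illustrative and not from the post): output row `k` is the sum of all input rows `i` with `map_fn[i] == k`, which is exactly what multiplying by a selector matrix `S` with `S[map_fn[i], i] = 1` computes.

```python
import numpy as np
import scipy.sparse

# Toy input (illustrative values): 4 input rows, 3 columns
X = scipy.sparse.csr_matrix(np.array([[1., 0., 2.],
                                      [0., 3., 0.],
                                      [4., 0., 0.],
                                      [0., 5., 6.]]))
# Input row i is added to output row map_fn[i]; 2 output rows here
map_fn = np.array([1, 0, 1, 0])
n, s = X.shape[0], map_fn.max() + 1

# Selector matrix: column i holds a single 1, in row map_fn[i]
S = scipy.sparse.csr_matrix((np.ones(n), (map_fn, np.arange(n))), shape=(s, n))
SX = S.dot(X)  # output row k = sum of input rows i with map_fn[i] == k

# Reference: explicit loop over output rows
ref = np.zeros((s, X.shape[1]))
for k in range(s):
    ref[k] = X[np.flatnonzero(map_fn == k)].sum(axis=0).A1

assert np.allclose(SX.toarray(), ref)
# SX.toarray() == [[0, 8, 6], [5, 0, 2]]
```

Here output row 0 collects input rows 1 and 3, and output row 1 collects input rows 0 and 2.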
Answer 0 (score: 4)
**Initial approach**

Adapting the sparse solution from this post:
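```python
import numpy as np

def sparse_matrix_mult_sparseX_mod1(X, rows):
    nrows = rows.max()+1
    ncols = X.shape[1]
    nelem = nrows * ncols
    # Row/column indices of the nonzeros of X
    a,b = X.nonzero()
    # Column-major linear index of the output cell (rows[a], b)
    ids = rows[a] + b*nrows
    # Sum all nonzero values that land in the same output cell
    sums = np.bincount(ids, X[a,b].A1, minlength=nelem)
    # Undo the column-major layout to get a dense (nrows, ncols) array
    out = sums.reshape(ncols,-1).T
    return out
```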
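The idea behind `sparse_matrix_mult_sparseX_mod1` is to encode each (output row, column) pair as one linear index so that `np.bincount` with weights sums every value landing in the same output cell. Below is a minimal variant of the same idea, sketched here and not part of the original answer: the helper name `rowsum_bincount` is hypothetical, it reads the stored entries from `X.tocoo()` instead of the fancy-indexing lookup `X[a,b]`, and it uses a row-major index so no transpose is needed. Whether it is faster on your data is an assumption you would need to time.

```python
import numpy as np
import scipy.sparse

def rowsum_bincount(X, rows):
    """Hypothetical helper: add input row i of sparse X into output row rows[i].

    Returns a dense (rows.max() + 1, X.shape[1]) array.
    """
    nrows = rows.max() + 1
    ncols = X.shape[1]
    C = X.tocoo()  # COO exposes .row, .col, .data for all stored entries
    # Row-major linear index of the output cell each stored value goes to
    ids = rows[C.row] * ncols + C.col
    sums = np.bincount(ids, weights=C.data, minlength=nrows * ncols)
    return sums.reshape(nrows, ncols)

# Toy usage (illustrative values)
X_toy = scipy.sparse.csr_matrix(np.array([[1., 0., 2.],
                                          [0., 3., 0.],
                                          [4., 0., 0.],
                                          [0., 5., 6.]]))
out = rowsum_bincount(X_toy, np.array([1, 0, 1, 0]))
# out == [[0, 8, 6], [5, 0, 2]]
```

Explicitly stored zeros in `X` would also be counted, but adding zero does not change the sums.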
**Benchmarking**
Original approach #1:
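```python
def app1(X, map_fn):
    # Derive sizes from the inputs instead of relying on globals n and s;
    # s = map_fn.max()+1 matches the output shape of the bincount solution
    n = X.shape[0]
    s = map_fn.max()+1
    col = np.arange(n)
    val = np.ones(n)
    S = scipy.sparse.csr_matrix( (val, (map_fn, col)), shape = (s,n))
    SX = S.dot(X)
    return SX
```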
Timings and verification:
```
In [209]: # Inputs setup
     ...: s=10000
     ...: n=1000000
     ...: d=1000
     ...: density=1.0/500
     ...:
     ...: X=scipy.sparse.rand(n,d,density=density,format="csr")
     ...: map_fn=np.random.randint(0, s, n)
     ...:

In [210]: out1 = app1(X, map_fn)
     ...: out2 = sparse_matrix_mult_sparseX_mod1(X, map_fn)
     ...: print np.allclose(out1.toarray(), out2)
     ...:
True

In [211]: %timeit app1(X, map_fn)
1 loop, best of 3: 517 ms per loop

In [212]: %timeit sparse_matrix_mult_sparseX_mod1(X, map_fn)
10 loops, best of 3: 147 ms per loop
```
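To be fair, `app1` returns a sparse matrix while `sparse_matrix_mult_sparseX_mod1` returns a dense array, so we should also include the dense conversion in `app1`'s timing: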
```
In [214]: %timeit app1(X, map_fn).toarray()
1 loop, best of 3: 584 ms per loop
```
**Porting to Numba**
We can port the bincount (binned summation) step to Numba, which could be beneficial for denser input matrices. One way to do this:
```python
import numpy as np
from numba import njit

@njit
def bincount_mod2(out, rows, r, C, V):
    # Scatter-add each nonzero V[i] (input row r[i], column C[i])
    # into its output row rows[r[i]]
    N = len(V)
    for i in range(N):
        out[rows[r[i]], C[i]] += V[i]
    return out

def sparse_matrix_mult_sparseX_mod2(X, rows):
    nrows = rows.max()+1
    ncols = X.shape[1]
    r,C = X.nonzero()
    V = X[r,C].A1
    out = np.zeros((nrows, ncols))
    return bincount_mod2(out, rows, r, C, V)
```
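The Numba loop is a scatter-add: each nonzero is added into `out[rows[r[i]], C[i]]`, and repeated targets accumulate. For reference, NumPy can express the same unbuffered scatter-add with `np.add.at`. The sketch below is not from the original answer and the name `rowsum_add_at` is hypothetical; it needs no Numba, but `np.add.at` has historically been slow, so treat it as a correctness reference rather than a faster alternative.

```python
import numpy as np
import scipy.sparse

def rowsum_add_at(X, rows):
    """Hypothetical pure-NumPy counterpart of bincount_mod2's scatter-add."""
    nrows = rows.max() + 1
    C = X.tocoo()
    out = np.zeros((nrows, X.shape[1]))
    # Unbuffered in-place add: duplicate (row, col) targets accumulate,
    # whereas out[idx] += vals would keep only one contribution per target
    np.add.at(out, (rows[C.row], C.col), C.data)
    return out

# Toy usage (illustrative values)
X_toy = scipy.sparse.csr_matrix(np.array([[1., 0., 2.],
                                          [0., 3., 0.],
                                          [4., 0., 0.],
                                          [0., 5., 6.]]))
out = rowsum_add_at(X_toy, np.array([1, 0, 1, 0]))
# out == [[0, 8, 6], [5, 0, 2]]
```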
Timings:
```
In [373]: # Inputs setup
     ...: s=10000
     ...: n=1000000
     ...: d=1000
     ...: density=1.0/100 # Denser now!
     ...:
     ...: X=scipy.sparse.rand(n,d,density=density,format="csr")
     ...: map_fn=np.random.randint(0, s, n)
     ...:

In [374]: %timeit app1(X, map_fn)
1 loop, best of 3: 787 ms per loop

In [375]: %timeit sparse_matrix_mult_sparseX_mod1(X, map_fn)
1 loop, best of 3: 906 ms per loop

In [376]: %timeit sparse_matrix_mult_sparseX_mod2(X, map_fn)
1 loop, best of 3: 705 ms per loop
```
Including the dense conversion for `app1`:
```
In [379]: %timeit app1(X, map_fn).toarray()
1 loop, best of 3: 910 ms per loop
```