Computing a dense dot sparse product

Time: 2018-07-04 13:45:26

Tags: python numpy scipy sparse-matrix

I have a sparse matrix S

<14940x14940 sparse matrix of type '<class 'numpy.float64'>'
    with 39840 stored elements in COOrdinate format>

and a dense vector d whose shape is

Out[7]: (1, 14940)

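and I want to compute d dot S, with an expected result shape of 1 x 14940. Given the dimensions, this should work. However, np.dot does not understand the types of its arguments, so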

np.dot(d, S)

results in a crash. Next,

d.dot(S)

strangely results in

Out[4]: 
array([[<14940x14940 sparse matrix of type '<class 'numpy.float64'>'
    with 39840 stored elements in COOrdinate format>,
        <14940x14940 sparse matrix of type '<class 'numpy.float64'>'
    with 39840 stored elements in COOrdinate format>,
        <14940x14940 sparse matrix of type '<class 'numpy.float64'>'
    with 39840 stored elements in COOrdinate format>,
        ...,
        <14940x14940 sparse matrix of type '<class 'numpy.float64'>'
    with 39840 stored elements in COOrdinate format>,
        <14940x14940 sparse matrix of type '<class 'numpy.float64'>'
    with 39840 stored elements in COOrdinate format>,
        <14940x14940 sparse matrix of type '<class 'numpy.float64'>'
    with 39840 stored elements in COOrdinate format>]], dtype=object)

My last attempt was to use scipy.sparse.linalg.LinearOperator.dot, but it apparently expects both arguments to be sparse:

scipy.sparse.linalg.LinearOperator.dot(d, S)
Traceback (most recent call last):
  File "/anaconda3/envs/myenv3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-3363ca62dfea>", line 1, in <module>
    scipy.sparse.linalg.LinearOperator.dot(d, S)
  File "/anaconda3/envs/myenv3/lib/python3.6/site-packages/scipy/sparse/linalg/interface.py", line 362, in dot
    return self.matvec(x)
AttributeError: 'numpy.ndarray' object has no attribute 'matvec'

How can I compute the dot product?

1 answer:

Answer 0 (score: 2)

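Use d*S, which applies the sparse matrix's interpretation of *. First, here is the problem reproduced on a small example: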

In [198]: S = sparse.random(10,10,.2)
In [199]: d = np.arange(10)[None,:]
In [200]: np.dot(d,S)
Out[200]: 
array([[<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 20 stored elements in COOrdinate format>,
    ...
    with 20 stored elements in COOrdinate format>]], dtype=object)

This result is produced because np.dot naively tries to convert S to a dense array:

In [201]: np.array(S)
Out[201]: 
array(<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 20 stored elements in COOrdinate format>, dtype=object)
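
To make the difference concrete, here is a minimal sketch comparing np.array(S) with S.toarray(). The 2x2 matrix and the names wrapped and dense are made up for illustration; they are not part of the session above.

```python
import numpy as np
from scipy import sparse

# Tiny hand-built COO matrix; the values are arbitrary and for illustration only.
S = sparse.coo_matrix(([1.0, 2.0], ([0, 1], [1, 0])), shape=(2, 2))

# NumPy does not know how to unpack a sparse matrix, so it stores the
# whole object as the single element of a 0-d object array.
wrapped = np.array(S)

# SciPy's own conversion produces an ordinary 2-D float array.
dense = S.toarray()

print(wrapped.shape, wrapped.dtype)
print(dense.shape, dense.dtype)
```

The object wrapper is what np.dot ends up working with, which would explain why its output is an array of sparse matrices rather than numbers.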

The correct way to make S dense is with its own .toarray method, or the short .A attribute:

In [202]: np.dot(d, S.A)
Out[202]: 
array([[ 0.14692294,  0.        ,  6.11562384, 10.33950994,  4.96106786,
         3.45833981, 10.40602568,  7.14361287,  9.92141019,  0.        ]])

The * operator does the same thing. For sparse matrices, * is matrix multiplication.

In [203]: d*S
Out[203]: 
array([[ 0.14692294,  0.        ,  6.11562384, 10.33950994,  4.96106786,
         3.45833981, 10.40602568,  7.14361287,  9.92141019,  0.        ]])
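
As a sanity check, the sparse product can be compared against the dense reference. This is a sketch on a hand-built 3x3 matrix (the values and variable names are assumptions chosen so the result is easy to verify by hand). It uses @ rather than * because, as far as I know, * only means matrix multiplication for the sparse matrix classes such as coo_matrix, not for the newer sparse array classes.

```python
import numpy as np
from scipy import sparse

# Hand-built COO matrix (illustrative values). Its dense form is
# [[1, 0, 0],
#  [0, 0, 2],
#  [0, 3, 4]]
rows = np.array([0, 1, 2, 2])
cols = np.array([0, 2, 1, 2])
vals = np.array([1.0, 2.0, 3.0, 4.0])
S = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3))

d = np.array([[1.0, 10.0, 100.0]])   # dense row vector, shape (1, 3)

dense_ref = np.dot(d, S.toarray())   # reference: densify first, then multiply
via_matmul = d @ S                   # sparse-aware product, no densifying of S

print(via_matmul)
print(np.allclose(via_matmul, dense_ref))
```

For a matrix the size of the one in the question, the sparse-aware product avoids allocating a 14940 x 14940 dense array.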

We can also make d sparse:

In [204]: D = sparse.csr_matrix(d)
In [205]: D*S
Out[205]: 
<1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 8 stored elements in Compressed Sparse Row format>
In [206]: _.A
Out[206]: 
array([[ 0.14692294,  0.        ,  6.11562384, 10.33950994,  4.96106786,
         3.45833981, 10.40602568,  7.14361287,  9.92141019,  0.        ]])

(This sparse-sparse matrix product is actually somewhat slower.)
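
One way to check the timing claim on your own machine is a rough timeit comparison. This is only a sketch: the matrix size, the number of nonzeros, and the repeat count are arbitrary assumptions, and the relative speed may differ with other shapes, densities, or SciPy versions.

```python
import timeit

import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Random sparse test matrix; size and nonzero count are arbitrary choices.
n, nnz = 2000, 8000
rows = rng.integers(0, n, size=nnz)
cols = rng.integers(0, n, size=nnz)
vals = rng.random(nnz)
S = sparse.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

d = rng.random((1, n))        # dense row vector
D = sparse.csr_matrix(d)      # the same vector stored as a sparse matrix

# Time the dense-by-sparse and sparse-by-sparse products.
t_dense = timeit.timeit(lambda: d @ S, number=200)
t_sparse = timeit.timeit(lambda: D @ S, number=200)
print(f"dense @ sparse:  {t_dense:.4f} s")
print(f"sparse @ sparse: {t_sparse:.4f} s")

# Both routes should give the same numbers.
same = np.allclose(d @ S, (D @ S).toarray())
print(same)
```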

With both operands sparse, np.dot works:

In [208]: np.dot(D,S).A
Out[208]: 
array([[ 0.14692294,  0.        ,  6.11562384, 10.33950994,  4.96106786,
         3.45833981, 10.40602568,  7.14361287,  9.92141019,  0.        ]])

The newer matmul operator also works (but note the (1,n) shape):

In [209]: d@S
Out[209]: 
array([[ 0.14692294,  0.        ,  6.11562384, 10.33950994,  4.96106786,
         3.45833981, 10.40602568,  7.14361287,  9.92141019,  0.        ]])
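
On the shape remark: as I understand it, the result of @ follows the shape of d, so a 2-D (1,n) input gives a (1,n) result while a flat (n,) input gives a flat (n,) result. A small sketch with made-up values (the names d_row and d_flat are illustrative):

```python
import numpy as np
from scipy import sparse

# Hand-built COO matrix (illustrative values). Its dense form is
# [[1, 0, 0],
#  [0, 0, 2],
#  [0, 3, 0]]
S = sparse.coo_matrix(([1.0, 2.0, 3.0], ([0, 1, 2], [0, 2, 1])), shape=(3, 3))

d_row = np.array([[1.0, 10.0, 100.0]])   # 2-D row vector, shape (1, 3)
d_flat = d_row.ravel()                   # 1-D vector, shape (3,)

row_result = d_row @ S     # stays 2-D
flat_result = d_flat @ S   # stays 1-D

print(row_result.shape, flat_result.shape)
```

If downstream code expects a particular number of dimensions, reshaping d before the product is simpler than fixing the result afterwards.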