Question

假设我有一个稀疏矩阵：

>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

我想将第0列和第2列归零。这是我想要得到的：

array([[0, 0, 0],
       [0, 0, 0],
       [0, 5, 0]])

以下是我尝试过的内容：

sp_mat = csr_matrix((data, indices, indptr), shape=(3, 3))
zero_cols = np.array([0, 2])
sp_mat[:, zero_cols] = 0

但是，我收到警告：

SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.

由于我拥有的sp_mat很大，因此转换为lil_matrix非常慢。

什么是有效的方法？

Answer 1

In [87]: >>> indptr = np.array([0, 2, 3, 6])
    ...: >>> indices = np.array([0, 2, 2, 0, 1, 2])
    ...: >>> data = np.array([1, 2, 3, 4, 5, 6])
    ...: M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
In [88]: M
Out[88]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 6 stored elements in Compressed Sparse Row format>

看看csr分配会发生什么：

In [89]: M[:, [0, 2]] = 0
/usr/local/lib/python3.6/dist-packages/scipy/sparse/compressed.py:746: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  SparseEfficiencyWarning)
In [90]: M
Out[90]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 7 stored elements in Compressed Sparse Row format>
In [91]: M.data
Out[91]: array([0, 0, 0, 0, 0, 5, 0])
In [92]: M.indices
Out[92]: array([0, 2, 0, 2, 0, 1, 2], dtype=int32)

它不仅会发出警告，而且实际上会增加“稀疏”项的数量，尽管现在大多数值都为0。这些只有在我们清理时才会删除：

In [93]: M.eliminate_zeros()
In [94]: M
Out[94]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 1 stored elements in Compressed Sparse Row format>

在索引分配中，csr不能区分设置0和其他值。都是一样的。

我应该注意，效率警告主要是为了防止用户重复使用它（例如在迭代中）。对于一次性动作，这过于警报。

对于索引分配，lil格式更有效（或者至少不警告效率）。但是转换为该格式或从该格式转换都非常耗时。

另一种选择是直接查找并设置新的0，后跟eliminate_zeros）。

另一种方法是使用矩阵乘法。我认为右列中带有0的对角稀疏将解决问题。

In [103]: M
Out[103]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 6 stored elements in Compressed Sparse Row format>
In [104]: D = sparse.diags([0,1,0], dtype=M.dtype)
In [105]: D
Out[105]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 3 stored elements (1 diagonals) in DIAgonal format>
In [106]: D.A
Out[106]: 
array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]])
In [107]: M1 = M*D
In [108]: M1
Out[108]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 1 stored elements in Compressed Sparse Row format>
In [110]: M1.A
Out[110]: 
array([[0, 0, 0],
       [0, 0, 0],
       [0, 5, 0]], dtype=int64)

如果就地乘以矩阵，则不会收到效率警告。它只是更改现有非零项的值，因此不更改矩阵的稀疏度（至少直到消除零为止）：

In [111]: M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
In [112]: M[:,[0,2]] *= 0
In [113]: M
Out[113]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 6 stored elements in Compressed Sparse Row format>
In [114]: M.eliminate_zeros()
In [115]: M
Out[115]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 1 stored elements in Compressed Sparse Row format>

将csr_matrix中的几列归零

1 个答案: