Delete rows from a scipy matrix

Posted: 2017-10-23 01:56:26

Tags: python numpy scipy

I have a scipy sparse matrix data and an integer n that identifies the row of data I want to delete. To delete this row, I tried:

data = sparse.csr_matrix(np.delete(np.array(data),n, axis=0))

However, this raised the following error:

Traceback (most recent call last):
  File "...", line 260, in <module>
    X_labeled = sparse.csr_matrix(np.delete(np.array(X_labeled),n, axis=0))
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 79, in __init__
    self._set_self(self.__class__(coo_matrix(arg1, dtype=dtype)))
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/coo.py", line 177, in __init__
    self.row, self.col = M.nonzero()
SystemError: <built-in method nonzero of numpy.ndarray object at 0x113c883f0> returned a result with an error set

When I run:

data = np.delete(data.toarray(),n, axis=0)

I get this error:

Traceback (most recent call last):
  File "...", line 261, in <module>
    X_labeled = np.delete(X_labeled.toarray(),n, axis=0)
  File "/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4839, in delete
    "size %i" % (obj, axis, N))
IndexError: index 86 is out of bounds for axis 0 with size 4

When I run:

print(type(data))
print(data.shape)
print(data.toarray().shape)

I get:

<class 'scipy.sparse.csr.csr_matrix'>
(4, 2740)
(4, 2740)
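The second traceback reports index 86 against an axis of size 4, which suggests n is larger than the number of rows in data. A minimal guard, sketched below with a hypothetical helper named check_row_index (not part of NumPy or SciPy), makes that mistake fail with a clearer message before any deletion is attempted. The matrix here is a stand-in with the same shape as in the question.

```python
import numpy as np
from scipy import sparse


def check_row_index(mat, n):
    """Hypothetical helper: validate n as a row index of mat.

    Returns the non-negative form of n, or raises IndexError.
    """
    n_rows = mat.shape[0]
    # Accept negative indices the same way NumPy does (-1 is the last row)
    if not -n_rows <= n < n_rows:
        raise IndexError(f"row {n} is out of range for a matrix with {n_rows} rows")
    return n % n_rows  # normalize to a non-negative index


# Stand-in matrix with the shape reported in the question
data = sparse.csr_matrix(np.ones((4, 2740)))
print(check_row_index(data, 3))   # 3
print(check_row_index(data, -1))  # 3
```

Calling check_row_index(data, 86) on this matrix raises IndexError, which points at the value of n rather than at the deletion code.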

1 answer:

Answer 0 (score: 2)

The correct way to convert a sparse matrix to a dense array is toarray, not np.array(...):

In [408]: M = sparse.csr_matrix(np.eye(3))
In [409]: M
Out[409]: 
<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in Compressed Sparse Row format>
In [410]: np.array(M)
Out[410]: 
array(<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in Compressed Sparse Row format>, dtype=object)

That is a single-element array of object dtype that simply holds the sparse matrix, unchanged.

In [411]: M.toarray()
Out[411]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

delete works on this proper array:

In [414]: data = sparse.csr_matrix(np.delete(M.toarray(),1, axis=0))
In [415]: data
Out[415]: 
<2x3 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>
In [416]: data.A
Out[416]: 
array([[ 1.,  0.,  0.],
       [ 0.,  0.,  1.]])
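The densify, delete, re-sparsify route above can be wrapped in a small function. This is a minimal sketch; the name delete_row_via_dense is hypothetical. It allocates the full dense array, so it assumes the matrix fits comfortably in memory when dense.

```python
import numpy as np
from scipy import sparse


def delete_row_via_dense(mat, n):
    """Hypothetical helper: delete row n by going through a dense array."""
    dense = mat.toarray()                # a real 2-D ndarray, unlike np.array(mat)
    dense = np.delete(dense, n, axis=0)  # drop row n
    return sparse.csr_matrix(dense)      # convert back to sparse


M = sparse.csr_matrix(np.eye(3))
out = delete_row_via_dense(M, 1)
print(out.shape)  # (2, 3)
```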

Indexing does the same thing:

In [417]: M[[0,2],:]
Out[417]: 
<2x3 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>
In [418]: _.A
Out[418]: 
array([[ 1.,  0.,  0.],
       [ 0.,  0.,  1.]])
In [420]: M[np.array([True,False,True]),:].A
Out[420]: 
array([[ 1.,  0.,  0.],
       [ 0.,  0.,  1.]])
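The indexing route stays sparse throughout: select every row except n, as M[[0,2],:] does above. A minimal sketch with a hypothetical helper name, assuming n is a non-negative row index:

```python
import numpy as np
from scipy import sparse


def delete_row_via_index(mat, n):
    """Hypothetical helper: delete row n by selecting all the other rows."""
    keep = [i for i in range(mat.shape[0]) if i != n]  # row indices to keep
    return mat.tocsr()[keep, :]  # CSR supports row selection by index list


M = sparse.csr_matrix(np.eye(3))
out = delete_row_via_index(M, 1)
print(out.shape)  # (2, 3)
```

The tocsr() call is there because not every sparse format supports this kind of indexing; for a matrix that is already CSR it costs essentially nothing.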

I'd guess the indexing route is faster, but we'd have to time both on realistically sized arrays to be sure.
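A timing test along those lines could look like the sketch below. The matrix size (1000 x 2740) and roughly 1% density are assumptions chosen for illustration; substitute your real matrix. The script checks that both routes agree before timing them and does not assume which one wins.

```python
import timeit

import numpy as np
from scipy import sparse

# Assumed test matrix: 1000 x 2740 with roughly 1% nonzeros
rng = np.random.default_rng(0)
dense = rng.random((1000, 2740))
dense[dense < 0.99] = 0.0
mat = sparse.csr_matrix(dense)
n = 86


def via_dense():
    # toarray -> np.delete -> back to CSR
    return sparse.csr_matrix(np.delete(mat.toarray(), n, axis=0))


def via_index():
    keep = np.arange(mat.shape[0]) != n  # boolean mask: False only at row n
    return mat[keep, :]


# Both routes should produce the same matrix before we compare speed
assert np.array_equal(via_dense().toarray(), via_index().toarray())

for fn in (via_dense, via_index):
    t = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {t / 20 * 1e3:.2f} ms per call")
```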

Internally, delete is fairly complex, but for some inputs it does something similar: it builds a boolean array that is False at the rows to be deleted.

Making the boolean mask:

In [421]: mask=np.ones((3,),bool)
In [422]: mask[1]=False
In [423]: M[mask,:].A
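The same mask idea extends to deleting several rows at once. A minimal sketch with a hypothetical helper delete_rows, which accepts a single index or a list of indices:

```python
import numpy as np
from scipy import sparse


def delete_rows(mat, rows):
    """Hypothetical helper: delete the given row indices using a boolean mask."""
    mat = mat.tocsr()                         # CSR supports fast row selection
    mask = np.ones(mat.shape[0], dtype=bool)  # True means keep the row
    mask[rows] = False                        # mark the rows to delete
    return mat[mask, :]


M = sparse.csr_matrix(np.eye(3))
print(delete_rows(M, 1).shape)       # (2, 3)
print(delete_rows(M, [0, 2]).shape)  # (1, 3)
```

Because the mask is built with ordinary NumPy indexing, an out-of-range row index raises IndexError and negative indices count from the end, just as with np.delete.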