I have a scipy sparse matrix data and an integer n that corresponds to a row of data that I want to delete. To delete this row, I tried this:
data = sparse.csr_matrix(np.delete(np.array(data),n, axis=0))
However, this produces the following error:
Traceback (most recent call last):
File "...", line 260, in <module>
X_labeled = sparse.csr_matrix(np.delete(np.array(X_labeled),n, axis=0))
File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 79, in __init__
self._set_self(self.__class__(coo_matrix(arg1, dtype=dtype)))
File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/coo.py", line 177, in __init__
self.row, self.col = M.nonzero()
SystemError: <built-in method nonzero of numpy.ndarray object at 0x113c883f0> returned a result with an error set
When I run:
data = np.delete(data.toarray(),n, axis=0)
I get this error:
Traceback (most recent call last):
File "...", line 261, in <module>
X_labeled = np.delete(X_labeled.toarray(),n, axis=0)
File "/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4839, in delete
"size %i" % (obj, axis, N))
IndexError: index 86 is out of bounds for axis 0 with size 4
When I run:
print(type(data))
print(data.shape)
print(data.toarray().shape)
I get:
<class 'scipy.sparse.csr.csr_matrix'>
(4, 2740)
(4, 2740)
Answer 0 (score: 2)
The correct way to convert a sparse matrix to a dense one is toarray, not np.array(...):
In [408]: M = sparse.csr_matrix(np.eye(3))
In [409]: M
Out[409]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>
In [410]: np.array(M)
Out[410]:
array(<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>, dtype=object)
This is a single-element object-dtype array that simply wraps the sparse matrix, which itself is left unchanged.
In [411]: M.toarray()
Out[411]:
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
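The difference between the two conversions can also be checked programmatically. The following is a small sketch, not part of the original session; the variable names wrapped and dense are made up for illustration.

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.eye(3))

# np.array() does not densify: it wraps the sparse matrix in an object array.
wrapped = np.array(M)
print(wrapped.dtype, wrapped.shape)

# toarray() returns a real 2-D numeric array with the matrix's shape.
dense = M.toarray()
print(dense.dtype, dense.shape)
```

Since wrapped holds no numeric rows, np.delete along axis 0 has nothing meaningful to operate on.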
delete works on the array returned by toarray:
In [414]: data = sparse.csr_matrix(np.delete(M.toarray(),1, axis=0))
In [415]: data
Out[415]:
<2x3 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
In [416]: data.A
Out[416]:
array([[ 1., 0., 0.],
[ 0., 0., 1.]])
Indexing does the same thing:
In [417]: M[[0,2],:]
Out[417]:
<2x3 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
In [418]: _.A
Out[418]:
array([[ 1., 0., 0.],
[ 0., 0., 1.]])
In [420]: M[np.array([True,False,True]),:].A
Out[420]:
array([[ 1., 0., 0.],
[ 0., 0., 1.]])
My guess is that the indexing route is faster, but we would have to time both on realistically sized arrays to be sure.
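A rough way to run that comparison is sketched here. The matrix size and density are arbitrary assumptions, the 2740 columns and row index 86 are borrowed from the question, and the names X, via_dense and via_mask are hypothetical. Results will vary with the matrix and the machine, so no timings are claimed.

```python
import timeit

import numpy as np
from scipy import sparse

# Assumed test matrix: 500 x 2740 with about 1% stored values.
X = sparse.random(500, 2740, density=0.01, format="csr")
n = 86  # row to drop


def via_dense():
    # Densify, delete the row, then convert back to CSR.
    return sparse.csr_matrix(np.delete(X.toarray(), n, axis=0))


def via_mask():
    # Stay sparse: keep every row except n using a boolean mask.
    mask = np.ones(X.shape[0], dtype=bool)
    mask[n] = False
    return X[mask, :]


# Both routes should produce the same matrix.
same = np.array_equal(via_dense().toarray(), via_mask().toarray())

t_dense = timeit.timeit(via_dense, number=20)
t_mask = timeit.timeit(via_mask, number=20)
print(f"same result: {same}")
print(f"dense round trip: {t_dense:.4f} s, boolean mask: {t_mask:.4f} s")
```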
Internally, delete is fairly complex, but for some inputs it does something similar: it builds a boolean array with False for the rows to be deleted.
Making a boolean mask:
In [421]: mask=np.ones((3,),bool)
In [422]: mask[1]=False
In [423]: M[mask,:].A
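The mask approach can be wrapped in a small function. This is only a sketch: delete_row is a hypothetical helper, not a scipy API. It also checks n up front, because the IndexError in the question (index 86 with an axis of size 4) shows that n was not a valid row index for the 4-row matrix.

```python
import numpy as np
from scipy import sparse


def delete_row(mat, n):
    """Return a CSR copy of sparse matrix `mat` without row `n` (hypothetical helper)."""
    n_rows = mat.shape[0]
    if not 0 <= n < n_rows:
        raise IndexError(f"row {n} is out of bounds for a matrix with {n_rows} rows")
    # True for rows to keep, False for the row to drop.
    mask = np.ones(n_rows, dtype=bool)
    mask[n] = False
    return mat.tocsr()[mask, :]


M = sparse.csr_matrix(np.eye(3))
smaller = delete_row(M, 1)
print(smaller.toarray())
```

Calling delete_row(M, 86) on this 3-row matrix raises IndexError instead of failing deeper inside numpy.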