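I have a scipy.sparse.csr.csr_matrix of shape (8723, 1741277).
How can I efficiently split it into n chunks by row?
Preferably the chunks would have roughly equal numbers of rows.
I say "roughly" because it depends on whether (number of rows)/(number of chunks) leaves a remainder.
I thought numpy.split would make this easy, as it does for arrays, but it does not seem to work on sparse matrices.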
Specifically, if I choose a number of chunks n that does not divide 8723 evenly, I get this error:
ValueError: array split does not result in an equal division
If I choose a number of chunks n that does divide 8723 evenly, I get this error:
AxisError: axis1: axis 0 is out of bounds for array of dimension 0
The reason I want to split the sparse matrix into chunks is that I want to convert it to a (dense) array, but it is too large to convert all at once.
Answer 0 (score: 0)
In [6]: from scipy import sparse
In [7]: M = sparse.random(12,3,.1,'csr')
In [8]: np.split?
In [9]: np.split(M,3)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
55 try:
---> 56 return getattr(obj, method)(*args, **kwds)
57
/usr/local/lib/python3.6/dist-packages/scipy/sparse/base.py in __getattr__(self, attr)
687 else:
--> 688 raise AttributeError(attr + " not found")
689
AttributeError: swapaxes not found
During handling of the above exception, another exception occurred:
AxisError Traceback (most recent call last)
<ipython-input-9-11a4dcdd89af> in <module>
----> 1 np.split(M,3)
/usr/local/lib/python3.6/dist-packages/numpy/lib/shape_base.py in split(ary, indices_or_sections, axis)
848 raise ValueError(
849 'array split does not result in an equal division')
--> 850 res = array_split(ary, indices_or_sections, axis)
851 return res
852
/usr/local/lib/python3.6/dist-packages/numpy/lib/shape_base.py in array_split(ary, indices_or_sections, axis)
760
761 sub_arys = []
--> 762 sary = _nx.swapaxes(ary, axis, 0)
763 for i in range(Nsections):
764 st = div_points[i]
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in swapaxes(a, axis1, axis2)
583
584 """
--> 585 return _wrapfunc(a, 'swapaxes', axis1, axis2)
586
587
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
64 # a downstream library like 'pandas'.
65 except (AttributeError, TypeError):
---> 66 return _wrapit(obj, method, *args, **kwds)
67
68
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in _wrapit(obj, method, *args, **kwds)
44 except AttributeError:
45 wrap = None
---> 46 result = getattr(asarray(obj), method)(*args, **kwds)
47 if wrap:
48 if not isinstance(result, mu.ndarray):
AxisError: axis1: axis 0 is out of bounds for array of dimension 0
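If you apply np.array to M, you get a 0d object array; it is just a naive wrapper around the sparse object. That is why np.split ends up reporting that axis 0 is out of bounds.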
In [10]: np.array(M)
Out[10]:
array(<12x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>, dtype=object)
In [11]: _.shape
Out[11]: ()
Splitting the dense equivalent works (M.A is shorthand for M.toarray()):
In [12]: np.split(M.A,3)
Out[12]:
[array([[0. , 0.61858517, 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ]]), array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]), array([[0. , 0.89573059, 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0.02334738],
[0. , 0. , 0. ]])]
And a direct sparse split, using row slices:
In [13]: [M[i:j,:] for i,j in zip([0,4,8],[4,8,12])]
Out[13]:
[<4x3 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>,
<4x3 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in Compressed Sparse Row format>,
<4x3 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>]
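The slicing above hard-codes three equal chunks. As a rough sketch of how it might be generalized to n chunks whose row counts differ by at most one, the helper names row_chunk_bounds and iter_row_chunks below are made up for illustration (they are not part of scipy or numpy), and the 8723 x 50 test matrix is only a stand-in for the much wider matrix in the question:

```python
import numpy as np
from scipy import sparse

def row_chunk_bounds(n_rows, n_chunks):
    """Hypothetical helper: n_chunks + 1 row boundaries, sizes differ by at most 1."""
    base, extra = divmod(n_rows, n_chunks)
    # The first `extra` chunks get one additional row each.
    sizes = [base + 1] * extra + [base] * (n_chunks - extra)
    return np.concatenate(([0], np.cumsum(sizes))).astype(int).tolist()

def iter_row_chunks(M, n_chunks):
    """Hypothetical helper: yield row slices of a sparse matrix one at a time."""
    bounds = row_chunk_bounds(M.shape[0], n_chunks)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        yield M[start:stop, :]  # each slice is a new sparse matrix (a copy)

# Small stand-in matrix with the same number of rows as in the question.
M = sparse.random(8723, 50, density=0.01, format='csr', random_state=0)

chunks = list(iter_row_chunks(M, 10))
sizes = [c.shape[0] for c in chunks]

# Convert one chunk at a time, so only one dense block is alive at once.
total_rows = 0
for chunk in iter_row_chunks(M, 10):
    dense_block = chunk.toarray()
    total_rows += dense_block.shape[0]
```

Because iter_row_chunks is a generator, each dense block can be processed and discarded before the next one is created, which is the point of chunking when the full dense array does not fit in memory.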
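Slicing a sparse matrix like this is less efficient than slicing a dense one. Dense slices are views; sparse slices must be copies. The only exception is the lil format, which has a getrowview method. There are plenty of functions for constructing a sparse matrix from pieces, but there has not been much need for functions that take one apart.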
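As an illustration of the "constructing from pieces" direction, here is a minimal sketch that slices a small matrix into row chunks and reassembles them with sparse.vstack. The matrix size and the slice boundaries mirror the session above; the variable names are arbitrary.

```python
import numpy as np
from scipy import sparse

M = sparse.random(12, 3, density=0.1, format='csr', random_state=0)

# Take the matrix apart by row slicing (each piece is a copy).
pieces = [M[i:j, :] for i, j in zip([0, 4, 8], [4, 8, 12])]

# Put it back together; vstack stacks sparse matrices row-wise.
rebuilt = sparse.vstack(pieces, format='csr')

# Compare via dense arrays, which is fine for a matrix this small.
same = np.array_equal(rebuilt.toarray(), M.toarray())
```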
sklearn may have some splitting functionality. It has a number of sparse utility functions that serve its own use of sparse matrices.