I have a list X_train (> 20000 elements), where each element is a sparse scipy csr_matrix created by HashingVectorizer.transform().
My HashingVectorizer.transform() call transforms the input file line by line, and each result is appended to the list X_train.
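For comparison, HashingVectorizer.transform() accepts an iterable of documents, so a single call can produce one 2-D CSR matrix with a row per line instead of a Python list of 1-row matrices. A minimal sketch, assuming the file's lines are available as strings (the sample lines and n_features=2**10 are hypothetical choices for illustration):

```python
from scipy import sparse
from sklearn.feature_extraction.text import HashingVectorizer

# Hypothetical stand-in for the lines of the input file.
lines = [
    "the quick brown fox",
    "jumps over the lazy dog",
    "hashing vectorizers are stateless",
]

# HashingVectorizer is stateless, so transform() can be called without fit().
vectorizer = HashingVectorizer(n_features=2**10)

# One call over the whole iterable yields a single sparse matrix of
# shape (number of lines, n_features), rather than a list of rows.
X_train = vectorizer.transform(lines)

print(X_train.shape)
```

Since transform() only needs an iterable of strings, an open text file object (which iterates over its lines) should also work in place of the list, though this sketch does not exercise that path.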
I'm trying to train an SGDClassifier on X_train, but I get the error:
ValueError: setting an array element with a sequence.
How can I train the SGDClassifier without resorting to CPU- or memory-intensive operations?
Answer:
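Here is a list of sparse matrices, and ways of turning it into an array or a sparse matrix (or failing to). The session assumes from scipy import sparse and import numpy as np: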
In [916]: alist=[sparse.random(1,10,.2, format='csr') for _ in range(3)]
In [917]: alist
Out[917]:
[<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>,
<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>,
<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>]
Make a proper 2d sparse matrix with sparse.vstack:
In [918]: sparse.vstack(alist)
Out[918]:
<3x10 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>
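Applied to the question's setup, the stacked matrix can be passed straight to SGDClassifier, which accepts scipy sparse input, so nothing is ever densified. A minimal sketch, assuming a list of 1-row CSR matrices like the asker's X_train; the random rows, the sizes, and the binary labels y are hypothetical stand-ins:

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-in for X_train: a list of 1 x n_features CSR rows,
# like the one built by appending HashingVectorizer output line by line.
n_rows, n_features = 100, 50
rows = [
    sparse.random(1, n_features, density=0.2, format="csr", random_state=rng)
    for _ in range(n_rows)
]
# Hypothetical binary labels, one per row.
y = rng.integers(0, 2, size=n_rows)

# Stack the rows into one n_rows x n_features CSR matrix.
# Only the stored (nonzero) entries are copied; no dense array is created.
X = sparse.vstack(rows, format="csr")

clf = SGDClassifier(random_state=0)
clf.fit(X, y)  # sparse input is accepted directly

print(X.shape, clf.predict(X[:5]))
```

The stacking step copies only the stored entries, so its cost grows with the total number of nonzeros rather than with n_rows * n_features, which is what keeps it far cheaper than building a dense array.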
An object array of matrices (bad):
In [919]: np.array(alist)
Out[919]:
array([ <1x10 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>,
<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>,
<1x10 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>], dtype=object)
Trying to make a float array raises your error, because each element is a sparse matrix, not a number:
In [920]: np.array(alist, float)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-920-52d4689fa7b3> in <module>()
----> 1 np.array(alist, float)
ValueError: setting an array element with a sequence.