Training an SGDClassifier with a list of csr_matrix

Time: 2017-07-22 01:57:48

Tags: python numpy machine-learning scipy

I have a list X_train (> 20000 elements), where each element is a sparse scipy csr_matrix created by HashingVectorizer.transform().

My code calls HashingVectorizer.transform() on the input file line by line and appends each result to the list X_train.

I am trying to train an SGDClassifier with X_train, but I get this error:

ValueError: setting an array element with a sequence

How can I train the SGDClassifier without CPU- or memory-intensive operations?

1 Answer:

Answer 0: (score: 0)

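Here is a list of sparse matrices, and several ways of converting it (or failing to convert it) to an array or a sparse matrix. The session assumes `import numpy as np` and `from scipy import sparse`: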

In [916]: alist=[sparse.random(1,10,.2, format='csr') for _ in range(3)]
In [917]: alist
Out[917]: 
[<1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>,
 <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>,
 <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>]

Make a proper (2d) sparse matrix, one row per list element:

In [918]: sparse.vstack(alist)
Out[918]: 
<3x10 sparse matrix of type '<class 'numpy.float64'>'
    with 6 stored elements in Compressed Sparse Row format>
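Applied to the question, the stacked matrix can be passed straight to the classifier. The following is only a minimal sketch under assumptions: the sample `lines`, the `labels`, and the `n_features` value are made up for illustration and are not from the question. `sparse.vstack` works on the sparse data directly, so the rows are never converted to dense arrays.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical stand-in for the lines of the input file and their labels.
lines = ["spam spam offer", "meeting at noon", "cheap offer now", "lunch at noon"]
labels = np.array([1, 0, 1, 0])

# n_features is an illustrative choice, not a value from the question.
vectorizer = HashingVectorizer(n_features=2**10)

# Mimic the question: transform line by line, collecting 1-row csr matrices.
X_list = [vectorizer.transform([line]) for line in lines]

# Stack the 1 x n_features rows into a single (n_samples x n_features) csr matrix.
X = sparse.vstack(X_list, format="csr")

# SGDClassifier accepts a scipy sparse matrix as input.
clf = SGDClassifier(random_state=0)
clf.fit(X, labels)
```

If all lines are available at once, calling `vectorizer.transform(lines)` a single time should produce the same kind of 2d csr matrix without building the list first.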

An object array of matrices (bad): numpy just stores each sparse matrix as an opaque Python object:

In [919]: np.array(alist)
Out[919]: 
array([ <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>,
       <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>,
       <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>], dtype=object)

Trying to make a float array from the list reproduces your error:

In [920]: np.array(alist, float)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-920-52d4689fa7b3> in <module>()
----> 1 np.array(alist, float)

ValueError: setting an array element with a sequence.
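
If stacking all rows at once is a memory concern, one option is to stack and train in small batches with `SGDClassifier.partial_fit`. This is a rough sketch rather than a tuned solution: the random data, the `batch_size`, and the two-class labels are assumptions made for illustration only.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import SGDClassifier

# Hypothetical data: 20 one-row csr matrices, like the list in the question.
X_list = [sparse.random(1, 10, density=0.2, format="csr", random_state=i)
          for i in range(20)]
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=20)

clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])   # partial_fit needs the full set of labels up front
batch_size = 5               # illustrative value

for start in range(0, len(X_list), batch_size):
    # Stack only the current slice, so just one small batch matrix is built at a time.
    X_batch = sparse.vstack(X_list[start:start + batch_size], format="csr")
    y_batch = y[start:start + batch_size]
    clf.partial_fit(X_batch, y_batch, classes=classes)
```

Each call to `partial_fit` performs one pass over its batch, so several passes over the data may be needed to match what a single `fit` call does.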