Question

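This is a very basic concept: I have multiple dependents for training. My data is all text, and I have three separate fields. Every example I can find has the text data set up like this: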

data = ['text1','text2',...]

Mine looks like:

data = [['text1','text2','text3'],[...],...]

But when I try to fit the data, I get the following traceback:

ValueError                                Traceback (most recent call last)
<ipython-input-25-e3356a0f62f8> in <module>()
----> 1 classifier.fit(X,y)

/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/svm/base.pyc in fit(self, X, y, sample_weight)
    140                              "by not using the ``sparse`` parameter")
    141 
--> 142         X = atleast2d_or_csr(X, dtype=np.float64, order='C')
    143 
    144         if self.impl in ['c_svc', 'nu_svc']:

/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/utils/validation.pyc in atleast2d_or_csr(X, dtype, order, copy)
    114     """
    115     return _atleast2d_or_sparse(X, dtype, order, copy, sparse.csr_matrix,
--> 116                                 "tocsr")
    117 
    118 

/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/utils/validation.pyc in _atleast2d_or_sparse(X, dtype, order, copy, sparse_class, convmethod)
     94         _assert_all_finite(X.data)
     95     else:
---> 96         X = array2d(X, dtype=dtype, order=order, copy=copy)
     97         _assert_all_finite(X)
     98     return X

/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/utils/validation.pyc in array2d(X, dtype, order, copy)
     78         raise TypeError('A sparse matrix was passed, but dense data '
     79                         'is required. Use X.toarray() to convert to dense.')
---> 80     X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
     81     _assert_all_finite(X_2d)
     82     if X is X_2d and copy:

/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    318 
    319     """
--> 320     return array(a, dtype, copy=False, order=order)
    321 
    322 def asanyarray(a, dtype=None, order=None):

ValueError: setting an array element with a sequence.

Is there a specific way I have to approach this? Thanks!

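Note:

All of the text data I am using has been vectorized with HashingVectorizer.

I call clf.fit(X,y), where X is a list of lists, each containing 3 vectorized texts, and y is the list of classes that the elements of X belong to.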

Answer 1

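X must be a 2-D array (or a list of lists, if you prefer). Each list in this list of lists must be a list of numeric values, and all of these lists must have the same length. Like so: [[1,2,3,5], [3,4,5,6], [6,7,8,9], ...].

If you have several text entries per object that you want to vectorize, you need to merge the resulting vectorized texts into a single list, for example by concatenating them, if that makes sense in your context. In the end, each object must be represented by a single list in which all entries are numeric. All objects must also be represented by lists of equal length, where corresponding elements in all lists represent the same feature (e.g. the frequency of the same token in the text).

Let me know if what I said makes sense.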
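As a rough sketch of the concatenation idea, assuming each sample has exactly three text fields: vectorize each field separately and stack the three resulting matrices side by side, so that every sample becomes one numeric row. The toy `data` and `y` values, the `n_features` setting, and the choice of `SVC(kernel="linear")` are illustrative assumptions, not taken from the question.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.svm import SVC

# Hypothetical toy data: each sample has three separate text fields.
data = [
    ["red apple", "sweet fruit", "grows on trees"],
    ["green pear", "juicy fruit", "grows on trees"],
    ["fast car", "four wheels", "runs on fuel"],
    ["big truck", "six wheels", "runs on diesel"],
]
y = [0, 0, 1, 1]

# HashingVectorizer is stateless, so one instance can transform every field.
# n_features is kept small here only to make the example easy to inspect.
vectorizer = HashingVectorizer(n_features=2 ** 8)

# Regroup the data by field: three tuples, each holding one field for all samples.
fields = list(zip(*data))

# Vectorize each field on its own, giving three (n_samples, n_features) matrices.
blocks = [vectorizer.transform(field) for field in fields]

# Concatenate the blocks column-wise: one row per sample, all entries numeric.
X = hstack(blocks).tocsr()

classifier = SVC(kernel="linear")
classifier.fit(X, y)
predictions = classifier.predict(X)
```

Here `X` has shape `(4, 768)`: one row per sample and 256 hashed features for each of the three fields. Keeping the fields in separate column blocks means the same token in different fields is treated as a different feature; if that distinction does not matter for your data, joining the three strings into one before vectorizing is a simpler alternative.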

Training on multi-dimensional data with scikit-learn
