ValueError: setting an array element with a sequence. Scikit-learn

Date: 2015-06-23 12:37:07

Tags: python ubuntu scikit-learn gensim

I'm trying to run exactly the same code twice: once on my MacBook Pro and once on an Ubuntu machine on AWS.

My code looks just like this (it uses MultinomialNB() from scikit-learn):

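from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB()
clf.fit(vectorized_data, labels)  # vectorized_data: feature vectors, labels: 0/1 class labels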
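As a point of comparison, here is a minimal sketch of the input shape MultinomialNB.fit accepts, using a small made-up count matrix rather than the question's data: X is a 2-D array of non-negative numbers with one row per sample, and y holds one label per row.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy term-count matrix (hypothetical data): 4 documents x 3 features.
X = np.array([
    [2, 0, 1],
    [0, 3, 0],
    [1, 0, 2],
    [0, 2, 1],
], dtype=float)
# One binary label per row, like the 0/1 labels described in the question.
y = np.array([0, 1, 0, 1])

clf = MultinomialNB()
clf.fit(X, y)

print(X.shape)  # (4, 3): a rectangular 2-D array
print(clf.predict(np.array([[3, 0, 1]], dtype=float)))
```

The important property is that X is rectangular: every row has the same number of features.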

Training goes smoothly on my MacBook, but on the Ubuntu machine I get:

<ipython-input-5-c52751e2119e> in <module>()
----> 1 m.train_models()

/home/ubuntu/topic_modeling/classification.pyc in train_models(self, minimal)
133                 continue
134             bm = BinaryModel(label)
--> 135         bm.train_models(self.vectorizer, self.data)
136             self.models.append(bm)
137             logger.info("Successfully trained model for the %s tag", label)

/home/ubuntu/topic_modeling/classification.pyc in train_models(self, vectorizer, data)
 92             # TODO some more complex grid search should be here
 93             clf = MultinomialNB()
---> 94         clf.fit(vectorized_data, labels)
 95             self.models.append(clf)
 96

/home/ubuntu/.virtualenvs/topics/local/lib/python2.7/site-packages/sklearn/naive_bayes.pyc in fit(self, X, y, sample_weight)
472             Returns self.
473         """
--> 474     X, y = check_X_y(X, y, 'csr')
475         _, n_features = X.shape
476

/home/ubuntu/.virtualenvs/topics/local/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric)
442     X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
443                     ensure_2d, allow_nd, ensure_min_samples,
--> 444                 ensure_min_features)
445     if multi_output:
446         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

/home/ubuntu/.virtualenvs/topics/local/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features)
342             else:
343                 dtype = None
--> 344     array = np.array(array, dtype=dtype, order=order, copy=copy)
345         # make sure we actually converted to numeric:
346         if dtype_numeric and array.dtype.kind == "O":

ValueError: setting an array element with a sequence.
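The last frame shows the failure happens inside np.array(...), i.e. while scikit-learn converts X into a NumPy array. A minimal sketch, with made-up data, of one common way to trigger this ValueError: NumPy cannot build a rectangular float array when the rows have different lengths. Whether ragged rows are the cause on the Ubuntu machine is an assumption; the traceback alone does not prove it, and the exact error text varies between NumPy versions.

```python
import numpy as np

# Two "document vectors" of different lengths (hypothetical data).
ragged_rows = [np.array([0.1, 0.0, 0.3]), np.array([0.2, 0.5])]

try:
    np.array(ragged_rows, dtype=float)
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
else:
    error_message = None

# Rows of equal length convert cleanly into a 2-D float array.
rectangular_rows = [np.array([0.1, 0.0, 0.3]), np.array([0.2, 0.5, 0.0])]
matrix = np.array(rectangular_rows, dtype=float)
print(matrix.shape)  # (2, 3)
```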

I want to run the training on the Ubuntu machine so that I can run it inside a screen session.

When I run pip freeze on both machines, the output looks exactly the same. Does anyone know what could be wrong?
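pip freeze lists the packages installed in the environment, but it does not report the Python interpreter version or the operating system. A small sketch that prints these alongside the versions of the relevant packages, meant to be run on both machines and compared; the package list is an assumption based on the tags, and the function name is hypothetical. It avoids f-strings so it also runs under Python 2.7, which the traceback shows is in use.

```python
import importlib
import platform
import sys


def environment_report(packages=("numpy", "scipy", "sklearn", "gensim")):
    """Return a dict describing the interpreter, platform and package versions."""
    report = {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
        except Exception as exc:  # a broken install can fail with other errors
            report[name] = "import failed: {0}".format(exc)
        else:
            # Most scientific packages expose __version__; fall back if absent.
            report[name] = str(getattr(module, "__version__", "unknown"))
    return report


for key, value in sorted(environment_report().items()):
    print("{0}: {1}".format(key, value))
```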

EDIT

labels is just a list of 0s and 1s, e.g. [0, 1, 0, 0, 0, 1]
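Because scikit-learn needs exactly one label per row and every row must have the same length, a quick pure-Python check like the following could confirm both before calling fit. summarize_inputs is a hypothetical helper, not part of scikit-learn or gensim, and the rows below are made up.

```python
def summarize_inputs(rows, labels):
    """Summarize the shapes of a list of feature rows and its labels (hypothetical helper)."""
    row_lengths = sorted(set(len(row) for row in rows))
    return {
        "n_rows": len(rows),
        "n_labels": len(labels),
        # More than one distinct length means the rows are ragged.
        "row_lengths": row_lengths,
        # Expected to be [0, 1] for binary labels.
        "label_values": sorted(set(labels)),
        "consistent": len(rows) == len(labels) and len(row_lengths) == 1,
    }


# Example with made-up rows whose lengths differ.
rows = [[0.1, 0.0, 0.3], [0.2, 0.5], [0.0, 0.4, 0.1]]
labels = [0, 1, 0]
print(summarize_inputs(rows, labels))
```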

vectorized_data is obtained with the gensim framework. The text is first tokenized and then converted to a bag of words (BoW):

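import gensim
from gensim import models

bow_text = self.dictionary.doc2bow(tokenized_text)  # list of (token_id, count) pairs
self.tfidf = models.TfidfModel(dictionary=self.dictionary)
gensim.matutils.sparse2full(self.tfidf[bow_text], self.tfidf.num_nnz)  # dense vector of the given length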
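gensim.matutils.sparse2full(doc, length) turns one sparse document into a dense vector of the given length, so every document should be densified with the same length, usually the vocabulary size (for example len(self.dictionary)). Note that in gensim, num_nnz counts non-zero entries over the whole corpus rather than distinct terms. The sketch below reproduces that densification step in plain NumPy for a list of sparse documents, to show that a shared width yields a rectangular matrix that fit can accept. densify and the toy data are hypothetical; this is not gensim's implementation.

```python
import numpy as np


def densify(sparse_docs, num_terms):
    """Stack sparse (term_id, weight) documents into a dense (n_docs, num_terms) array.

    Hypothetical helper following the same idea as sparse2full, applied to
    every document: each row gets the same width, num_terms.
    """
    matrix = np.zeros((len(sparse_docs), num_terms), dtype=float)
    for row, doc in enumerate(sparse_docs):
        for term_id, weight in doc:
            matrix[row, term_id] = weight
    return matrix


# Made-up TF-IDF output: lists of (term_id, weight) pairs of varying length.
docs = [[(0, 0.7), (2, 0.7)], [(1, 1.0)], [(0, 0.5), (1, 0.5), (3, 0.7)]]
X = densify(docs, num_terms=4)
print(X.shape)  # (3, 4)
```

Even though the sparse documents contain different numbers of pairs, every dense row has the same width, so NumPy can stack them into one 2-D array.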

0 answers