I am trying to run exactly the same code in two places: once on my MacBook Pro and once on an Ubuntu machine on AWS.
My code looks like this (it uses MultinomialNB() from scikit-learn):
clf = MultinomialNB()
clf.fit(vectorized_data, labels)
On my MacBook the model trains without problems, but on the Ubuntu machine I get:
<ipython-input-5-c52751e2119e> in <module>()
----> 1 m.train_models()
/home/ubuntu/topic_modeling/classification.pyc in train_models(self, minimal)
133 continue
134 bm = BinaryModel(label)
--> 135 bm.train_models(self.vectorizer, self.data)
136 self.models.append(bm)
137 logger.info("Successfully trained model for the %s tag", label)
/home/ubuntu/topic_modeling/classification.pyc in train_models(self, vectorizer, data)
92 # TODO some more complex grid search should be here
93 clf = MultinomialNB()
---> 94 clf.fit(vectorized_data, labels)
95 self.models.append(clf)
96
/home/ubuntu/.virtualenvs/topics/local/lib/python2.7/site-packages/sklearn/naive_bayes.pyc in fit(self, X, y, sample_weight)
472 Returns self.
473 """
--> 474 X, y = check_X_y(X, y, 'csr')
475 _, n_features = X.shape
476
/home/ubuntu/.virtualenvs/topics/local/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric)
442 X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
443 ensure_2d, allow_nd, ensure_min_samples,
--> 444 ensure_min_features)
445 if multi_output:
446 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
/home/ubuntu/.virtualenvs/topics/local/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features)
342 else:
343 dtype = None
--> 344 array = np.array(array, dtype=dtype, order=order, copy=copy)
345 # make sure we actually converted to numeric:
346 if dtype_numeric and array.dtype.kind == "O":
ValueError: setting an array element with a sequence.
I want to run the training on the Ubuntu machine so that I can run it inside screen.
When I run pip freeze on both machines, the output looks exactly the same.
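Since pip freeze reports what is installed in the environment that pip belongs to, which is not necessarily what the running interpreter imports, here is a minimal sketch for comparing the versions from inside the same Python session on each machine. The helper name installed_versions is hypothetical, and the package list is just my guess at the relevant ones.

```python
import importlib


def installed_versions(names):
    """Map each module name to its __version__ as seen by this interpreter.

    A module that cannot be imported maps to None; a module without a
    __version__ attribute maps to the string "unknown".
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Assumed list of packages involved in the failing call.
    report = installed_versions(["numpy", "scipy", "sklearn", "gensim"])
    for name in sorted(report):
        print("%s %s" % (name, report[name]))
```

Running this in the IPython session that raises the error, and again on the MacBook, would show whether the two interpreters really load the same versions.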
Does anyone know what might be wrong?
Edit
labels is just a list of 0s and 1s, e.g. [0, 1, 0, 0, 0, 1]
vectorized_data is obtained using the gensim framework. The text is first tokenized and then converted to a bag-of-words (BoW) representation:
bow_text = self.dictionary.doc2bow(tokenized_text)
self.tfidf = models.TfidfModel(dictionary=self.dictionary)
gensim.matutils.sparse2full(self.tfidf[bow_text], self.tfidf.num_nnz)
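One common way to get "setting an array element with a sequence" from NumPy is to ask it for a numeric array built from rows of unequal length. I do not know whether that is what happens here, but it is cheap to check. Below is a minimal sketch, assuming vectorized_data is a sequence of per-document dense vectors; the helper name row_length_report is hypothetical.

```python
from collections import Counter


def row_length_report(rows):
    """Count how many rows have each length.

    Rows that have no len() (plain numbers, for example) are counted
    under the key "scalar".
    """
    lengths = Counter()
    for row in rows:
        try:
            lengths[len(row)] += 1
        except TypeError:
            lengths["scalar"] += 1
    return dict(lengths)


if __name__ == "__main__":
    # Toy data standing in for vectorized_data: the second row is shorter.
    toy_rows = [[0.1, 0.0, 0.4], [0.2, 0.3], [0.0, 0.0, 0.9]]
    print(row_length_report(toy_rows))
```

If the report for the real vectorized_data contains more than one key, the rows cannot be stacked into a single 2-D numeric array, which would be consistent with the failure inside check_array.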