ML: MultinomialNB error: ValueError: setting an array element with a sequence

Time: 2018-04-28 01:40:52

Tags: pandas tensorflow machine-learning scikit-learn multinomial

I am new to machine learning and I am experimenting with a public wine dataset. I ended up with an error that I cannot find a solution for.

Here is the model I am trying to use:

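import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

X = data_all[['country', 'description', 'price', 'province', 'variety']]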
y = data_all['points']

# Vectorizing Description column (text analysis)
vectorizerDesc = CountVectorizer()
descriptions = X['description']
vectorizerDesc.fit(descriptions)
vectorizedDesc = vectorizerDesc.transform(X['description'])
X['description'] = vectorizedDesc

# Categorizing other string columns
X = pd.get_dummies(X, columns=['country', 'province', 'variety'])

# Generating train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

# Multinomial Naive Bayes
nb = MultinomialNB()
nb.fit(X_train, y_train)

This is what X looks like before the call to train_test_split:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 83945 entries, 25 to 150929
Columns: 837 entries, description to variety_Zweigelt
dtypes: float64(1), object(1), uint8(835)

The last line (nb.fit) gives me an error:

ValueError                                Traceback (most recent call last)
<ipython-input-197-9d40e4624ff6> in <module>()
      3 # Multinomial Naive Bayes is a specialised version of Naive Bayes designed more for text documents
      4 nb = MultinomialNB()
----> 5 nb.fit(X_train, y_train)

/opt/conda/lib/python3.6/site-packages/sklearn/naive_bayes.py in fit(self, X, y, sample_weight)
    577             Returns self.
    578         """
--> 579         X, y = check_X_y(X, y, 'csr')
    580         _, n_features = X.shape
    581 

/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    571     X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
    572                     ensure_2d, allow_nd, ensure_min_samples,
--> 573                     ensure_min_features, warn_on_dtype, estimator)
    574     if multi_output:
    575         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    446         # make sure we actually converted to numeric:
    447         if dtype_numeric and array.dtype.kind == "O":
--> 448             array = array.astype(np.float64)
    449         if not allow_nd and array.ndim >= 3:
    450             raise ValueError("Found array with dim %d. %s expected <= 2."

ValueError: setting an array element with a sequence.
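
One plausible reading of this traceback, given the `object(1)` entry in the dtypes summary above: the `description` column holds non-scalar objects, so the `array.astype(np.float64)` step in `check_array` cannot turn each cell into a single number. The sketch below reproduces that failure with NumPy alone. The toy `cells` array is a hypothetical stand-in for the column, not the actual data.

```python
import numpy as np

# Hypothetical stand-in for an object-dtype column whose cells
# hold sequences rather than single numbers.
cells = np.empty(2, dtype=object)
cells[0] = [1, 0, 2]
cells[1] = [0, 3, 1]

try:
    # The same kind of conversion that check_array attempts on object data.
    cells.astype(np.float64)
    conversion_error = None
except ValueError as exc:
    conversion_error = exc

print(type(conversion_error).__name__, conversion_error)
```

If this reading is right, the fix would be to avoid storing the vectorizer output inside a DataFrame column in the first place.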

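Do you have any idea how I can combine my vectorized text analysis with the other features (such as country, etc.) in the MultinomialNB algorithm?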

Thanks in advance :)
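
One possible way to combine the two kinds of features, sketched under assumptions: keep the `CountVectorizer` output as its own sparse matrix and stack it column-wise next to the dummy-encoded columns with `scipy.sparse.hstack`. The tiny `data_all` frame and the names `desc_counts`, `other`, and `other_sparse` are hypothetical and only illustrate the shape of the approach. It has not been run against the real wine dataset.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy rows standing in for the wine dataset.
data_all = pd.DataFrame({
    "country": ["US", "France", "US", "Italy"],
    "description": [
        "ripe cherry and oak",
        "crisp citrus and mineral",
        "oak and vanilla",
        "bright cherry and herbs",
    ],
    "price": [20.0, 35.0, 15.0, 28.0],
    "points": [88, 91, 85, 90],
})

# Keep the word counts as a separate sparse matrix
# instead of assigning them to a DataFrame column.
vectorizer_desc = CountVectorizer()
desc_counts = vectorizer_desc.fit_transform(data_all["description"])

# One-hot encode the categorical column and keep the numeric one.
other = pd.get_dummies(data_all[["country", "price"]], columns=["country"])
other_sparse = csr_matrix(other.to_numpy(dtype=float))

# Stack both blocks side by side into one sparse feature matrix.
X = hstack([desc_counts, other_sparse]).tocsr()
y = data_all["points"]

nb = MultinomialNB()
nb.fit(X, y)
predictions = nb.predict(X)
print(X.shape, predictions)
```

The stacked matrix can be passed to train_test_split in the same way as the DataFrame in the original code. Two caveats: MultinomialNB expects non-negative feature values, and any missing prices would need to be filled or dropped before fitting.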

0 answers:

No answers yet