ValueError: Expected 2D array, got 1D array instead when fitting a model

Time: 2019-11-06 15:58:42

Tags: python machine-learning scikit-learn

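I am trying to build a model on Yellowbrick's load_hobbies dataset. After splitting the data into training and test sets, I wrote the code below to fit the model. However, I get this error: ValueError: Expected 2D array, got 1D array instead. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample. I don't understand why this happens. Can anyone help? Here is the code: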

from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split as tts
from yellowbrick.datasets import load_hobbies


#corpus = load_hobbies()
#X = TfidfVectorizer().fit_transform(corpus.data)
#y = LabelEncoder().fit_transform(corpus.target)
#
#X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2)
#
#model = MultinomialNB().fit(X_train, y_train)
#model.score(X_test, y_test)

corpus = load_hobbies()
X = corpus.data
y = corpus.target
#
X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2)
#
model = GaussianNB()
model.fit(X_train, y_train)  # raises the ValueError quoted above




1 Answer:

Answer 0 (score: 0)

Although you have imported TfidfVectorizer, it does not look like you have actually used it.

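X = corpus.data returns a list in which each document's content is a single string. A list of strings is one-dimensional, which is why scikit-learn complains about a 1D array. You need to use TfidfVectorizer to convert this collection of raw documents into a 2D matrix of features.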
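To make the shape difference concrete, here is a minimal sketch. The three-document `docs` list is a hypothetical stand-in for `corpus.data` (it is not the real hobbies corpus), and the variable names are only for illustration.

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for corpus.data: a plain list of raw document strings.
docs = [
    "the team won the game last night",
    "a new recipe for slow cooked chicken",
    "the studio released a new film trailer",
]

# A list of strings becomes a one-dimensional array: one entry per document.
raw_ndim = np.asarray(docs).ndim

# TfidfVectorizer turns the documents into a (n_documents, n_terms) matrix.
X = TfidfVectorizer().fit_transform(docs)

print(raw_ndim)     # number of dimensions of the raw input
print(X.shape)      # (number of documents, vocabulary size)
print(issparse(X))  # whether the result is a SciPy sparse matrix
```

The vectorized `X` has one row per document and one column per vocabulary term, which is the 2D layout that scikit-learn estimators expect.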

You also need to convert this sparse matrix into a dense array with X.toarray(), because GaussianNB does not accept sparse input.
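The sparse-versus-dense point can be checked on a tiny example. This is only a sketch: the 4x3 matrix and the two labels are made-up toy data, and the `TypeError` caught here is what the scikit-learn versions I am familiar with raise for sparse input, so treat the exact exception type as an assumption.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: 4 samples, 3 features, stored as a sparse matrix.
X_sparse = csr_matrix(np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.2, 0.8],
]))
y = np.array(["sports", "sports", "cooking", "cooking"])

# GaussianNB validates its input as a dense array, so sparse input is rejected.
try:
    GaussianNB().fit(X_sparse, y)
    sparse_rejected = False
except TypeError:
    sparse_rejected = True

# Converting to a dense NumPy array lets the same model fit.
X_dense = X_sparse.toarray()
model = GaussianNB().fit(X_dense, y)

print(sparse_rejected)
print(model.predict(X_dense))
```

Keep in mind that toarray() materializes every zero, so for a large vocabulary the dense matrix can take far more memory than the sparse one.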

After doing this, you should be able to fit the model correctly and visualize the results with Yellowbrick.

For example:

import numpy as np

from yellowbrick.datasets import load_hobbies
from yellowbrick.classifier import ClassificationReport

from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split as tts

# Load the data and create document vectors
corpus = load_hobbies()
tfidf = TfidfVectorizer()

X = tfidf.fit_transform(corpus.data)
y = corpus.target

# Turn sparse matrix into dense matrix
X = X.toarray()

# Split data into training and testing
X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, random_state=42)

# Instantiate the classification model and visualizer
model = GaussianNB()
visualizer = ClassificationReport(model, support=True)

visualizer.fit(X_train, y_train)        # Fit the visualizer and the model
visualizer.score(X_test, y_test)        # Evaluate the model on the test data
visualizer.show()                       # Finalize and show the figure

[Figure: Yellowbrick classification report for the hobbies corpus]
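As a possible alternative, the commented-out lines in the question use MultinomialNB, which accepts the sparse TF-IDF matrix directly, so the toarray() step can be skipped. A minimal sketch under that assumption follows; `docs` and `labels` are hypothetical stand-ins for `corpus.data` and `corpus.target`, not the real dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for corpus.data and corpus.target.
docs = [
    "the team won the game last night",
    "the coach praised the team after the game",
    "a new recipe for slow cooked chicken",
    "bake the chicken with herbs and butter",
]
labels = ["sports", "sports", "cooking", "cooking"]

X = TfidfVectorizer().fit_transform(docs)  # 2D sparse matrix, left sparse
y = LabelEncoder().fit_transform(labels)   # string labels -> integer codes

model = MultinomialNB().fit(X, y)          # fits on sparse input directly
train_accuracy = model.score(X, y)         # accuracy on the toy training data

print(X.shape)
print(train_accuracy)
```

Whether GaussianNB or MultinomialNB is the better fit depends on the task; this sketch only shows that the sparse matrix can be used as-is with the latter.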