Question

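I am running a classification model on text data. I used CountVectorizer to create the features for the model. After training, I tried to predict a new instance, but I keep getting a dimension mismatch error. I understand this is because the new instance does not have all the features the training data has, but I am still not sure how to fix it. Here is my code: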
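A minimal sketch of how the mismatch usually arises, using a made-up toy corpus (the sentences and variable names below are hypothetical stand-ins for data['text'], not the asker's data). A CountVectorizer that is fitted again on the new text learns a different vocabulary and so produces a different number of columns. Calling transform on the vectorizer that was fitted on the training text always yields one column per training-vocabulary term and silently ignores words it has never seen.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy training corpus (stands in for data['text'])
train_texts = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "john likes his dog",
]
new_text = ["my name is john doe"]

# Vectorizer fitted once, on the training text only
cv = CountVectorizer()
x_train = cv.fit_transform(train_texts)

# Wrong: a second vectorizer fitted on the new text learns its own vocabulary
refit_sample = CountVectorizer().fit_transform(new_text)

# Right: reuse the already fitted vectorizer; unseen words are simply dropped
reused_sample = cv.transform(new_text)

print(x_train.shape)        # (3, 10): one column per training-vocabulary term
print(refit_sample.shape)   # (1, 5): columns come from the new sentence only
print(reused_sample.shape)  # (1, 10): same column count as x_train
```

Only "john" from the new sentence is in the training vocabulary, so the reused-vectorizer row contains a single non-zero count.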

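from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder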

# data holds the raw documents ('text') and their labels ('class')
x = data['text']
y = data['class']

# Transform data
cv_transformer = CountVectorizer()
Encoder = LabelEncoder()

x = cv_transformer.fit_transform(x)
y = Encoder.fit_transform(y)

# Split data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, 
                                                    random_state=1) 


Naive = MultinomialNB()
Naive.fit(x_train,y_train)
# predict the labels on validation dataset
predictions_NB = Naive.predict(x_test)
# Use accuracy_score function to get the accuracy
print("Naive Bayes Accuracy Score -> ", accuracy_score(y_test, predictions_NB) * 100)

# Testing a new instance

sample = ['my name is john doe']
sample = cv_transformer.transform(sample)

# Returns the label-encoded class; Encoder.inverse_transform() maps it back
Naive.predict(sample)
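One common way to rule out this kind of error is to wrap the vectorizer and the classifier in a single scikit-learn pipeline, so the vocabulary learned during fit is exactly the one applied during predict. The sketch below assumes a tiny hypothetical two-class dataset in place of data; make_pipeline, CountVectorizer, MultinomialNB and LabelEncoder.inverse_transform are standard scikit-learn calls, but the texts, labels and variable names are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for data['text'] and data['class']
texts = [
    "cheap pills buy now",
    "win money now",
    "meeting at noon tomorrow",
    "lunch with john tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# The pipeline fits the vectorizer and the classifier together, so predict()
# always runs raw text through the same fitted vocabulary.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, y)

# Raw strings go straight in; there is no separate transform step to get wrong.
pred = model.predict(["my name is john doe"])
print(encoder.inverse_transform(pred))
```

Because the pipeline is a single estimator, it can also be passed unchanged to helpers such as cross_val_score, which refit the vectorizer on each training fold rather than on the whole dataset.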

The last line raises the error. Any ideas on how to make the dimensions match?

The error message is as follows:

~\Anaconda3\lib\site-packages\scipy\sparse\base.py in __mul__(self, other)

    if other.shape[0] != self.shape[1]:
        raise ValueError('dimension mismatch')

    result = self._mul_multivector(np.asarray(other))

ValueError: dimension mismatch
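For context, MultinomialNB computes its class scores by multiplying the document-term matrix by the transposed per-class feature log-probabilities (a matrix of shape n_classes x n_features), so a sample must have exactly as many columns as the training matrix. The traceback above appears to be SciPy enforcing that rule; more recent scikit-learn releases typically validate the feature count first and raise a more descriptive ValueError instead. The sketch below reproduces only the shape rule with hypothetical sizes (10 training features, 2 classes) and a stand-in array for feature_log_prob_; it does not call scikit-learn.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical sizes: a model trained on 10 vocabulary terms and 2 classes
n_train_features, n_classes = 10, 2
# Stand-in for MultinomialNB.feature_log_prob_ (shape: n_classes x n_features)
feature_log_prob = np.log(np.full((n_classes, n_train_features), 1.0 / n_train_features))

# Sample vectorized with the training vocabulary: 10 columns, product works
good_sample = csr_matrix(np.ones((1, n_train_features)))
scores = good_sample @ feature_log_prob.T
print(scores.shape)  # (1, 2): one score per class, before adding class priors

# Sample vectorized with a separately fitted 5-term vocabulary: shapes clash
bad_sample = csr_matrix(np.ones((1, 5)))
raised = False
try:
    bad_sample @ feature_log_prob.T
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```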

