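I am running a classification model on text data. I used CountVectorizer to create the features for the model. After training, I tried to predict a new instance, but I keep getting a dimension mismatch error. I understand this is because the new instance does not have all the features that the training data has, but I am still not sure how to fix it. Here is my code: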
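from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder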
x = data['text']
y = data['class']
# Transform data
cv_transformer = CountVectorizer()
Encoder = LabelEncoder()
x = cv_transformer.fit_transform(x)
y = Encoder.fit_transform(y)
# Split data
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, random_state=1)
Naive = MultinomialNB()
Naive.fit(x_train,y_train)
# predict the labels on validation dataset
predictions_NB = Naive.predict(x_test)
# Use accuracy_score function to get the accuracy
print("Naive Bayes Accuracy Score -> ", accuracy_score(y_test, predictions_NB) * 100)
# Testing a new instance
sample = ['my name is john doe']
sample = cv_transformer.transform(sample)
Naive.predict(sample)
The last line raises the error. Any ideas on how to fix the dimensions?
The error message is as follows:
~\Anaconda3\lib\site-packages\scipy\sparse\base.py in __mul__(self, other)
            if other.shape[0] != self.shape[1]:
                raise ValueError('dimension mismatch')
            result = self._mul_multivector(np.asarray(other))
ValueError: dimension mismatch
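
To illustrate the suspected cause, here is a toy sketch in plain Python. It is not scikit-learn's implementation: `fit_vocabulary` and `to_counts` are hypothetical helpers and the training texts are made up. It only shows that the width of a count vector is set by the vocabulary learned at fit time, so a sample encoded with a different vocabulary ends up with a different number of columns.

```python
# Toy bag-of-words encoder (an illustration only, not scikit-learn's code).
# fit_vocabulary and to_counts are hypothetical helper names.

def fit_vocabulary(texts):
    """Map each distinct lowercase token to a column index."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def to_counts(text, vocabulary):
    """Encode one text as a count vector; tokens outside the vocabulary are ignored."""
    row = [0] * len(vocabulary)
    for tok in text.lower().split():
        if tok in vocabulary:
            row[vocabulary[tok]] += 1
    return row


train_texts = ["my name is alice", "the weather is nice today"]  # made-up data
sample = "my name is john doe"

# Encoding the sample with the training vocabulary keeps the training width.
train_vocab = fit_vocabulary(train_texts)
same_width = to_counts(sample, train_vocab)

# Learning a fresh vocabulary from the sample alone changes the width.
refit_vocab = fit_vocabulary([sample])
other_width = to_counts(sample, refit_vocab)

print(len(train_vocab), len(same_width), len(other_width))
```

Only the first encoding has the column count a model trained on `train_vocab` would expect. On the real objects, a quick check along the same lines would be to compare `sample.shape[1]` with `x_train.shape[1]` just before calling `Naive.predict(sample)`.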