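I created a text classifier model by following this tutorial: https://stackabuse.com/text-classification-with-python-and-scikit-learn/ Then I tried to load the saved model and use it on my own data (not the tutorial's dataset). However, prediction fails with an error saying that the number of features does not match. Here is my code: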
import pickle
with open('text_classifier', 'rb') as training_model:
    model = pickle.load(training_model)
f = open(r".\descriptions_dataset\image\0.txt", "r")
test = f.read()
print(test)
f.close()
import re
from nltk.stem import WordNetLemmatizer
stemmer = WordNetLemmatizer()
document = re.sub(r'\W', ' ', str(test))
document = re.sub(r'\s+[a-zA-Z]\s+', ' ', document)
document = re.sub(r'\^[a-zA-Z]\s+', ' ', document)
document = re.sub(r'\s+', ' ', document, flags=re.I)
document = re.sub(r'^b\s+', '', document)
document = document.lower()
document = document.split()
document = [stemmer.lemmatize(word) for word in document]
print(document)
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(max_features=1500, min_df=1, max_df=0.7, stop_words=stopwords.words('english'))
X = vectorizer.fit_transform(document).toarray()
from sklearn.feature_extraction.text import TfidfTransformer
tfidfconverter = TfidfTransformer()
X = tfidfconverter.fit_transform(X).toarray()
y_pred = model.predict(X)
ValueError                                Traceback (most recent call last)
----> 1 y_pred = model.predict(X)

~\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py in predict(self, X)
    541             The predicted classes.
    542         """
--> 543         proba = self.predict_proba(X)
    544
    545         if self.n_outputs_ == 1:

~\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py in predict_proba(self, X)
    581         check_is_fitted(self, 'estimators_')
    582         # Check data
--> 583         X = self._validate_X_predict(X)
    584
    585         # Assign chunk of trees to jobs

~\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py in _validate_X_predict(self, X)
    360                                  "call `fit` before exploiting the model.")
    361
--> 362         return self.estimators_[0]._validate_X_predict(X, check_input=True)
    363
    364     @property

~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in _validate_X_predict(self, X, check_input)
    386                              "match the input. Model n_features is %s and "
    387                              "input n_features is %s "
--> 388                              % (self.n_features_, n_features))
    389
    390         return X

ValueError: Number of features of the model must match the input. Model n_features is 1500 and input n_features is 86
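For anyone trying to reproduce this, here is a minimal sketch of what seems to be going on, under some assumptions. The error message says the model was trained on 1500 columns but received 86, which would be consistent with the new CountVectorizer being fitted on the single test document and building a much smaller vocabulary. The toy corpus, labels, and variable names below are hypothetical and stand in for the tutorial's data. TfidfVectorizer is used in place of CountVectorizer followed by TfidfTransformer only to keep the example short.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy training data standing in for the tutorial's dataset.
train_docs = [
    "great movie with a wonderful cast",
    "terrible plot and awful acting",
    "wonderful story and great acting",
    "awful movie with a terrible cast",
]
train_labels = [1, 0, 1, 0]

# Fit the vectorizer once, on the training corpus only.
vectorizer = TfidfVectorizer(max_features=1500)
X_train = vectorizer.fit_transform(train_docs)

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X_train, train_labels)

new_doc = "a great story"

# Fitting a fresh vectorizer on the new text builds a different,
# smaller vocabulary, so the column count no longer matches the model.
refit_width = TfidfVectorizer().fit_transform([new_doc]).shape[1]

# Calling transform() on the already fitted vectorizer keeps the
# same columns the model saw during training.
X_new = vectorizer.transform([new_doc])
reused_width = X_new.shape[1]

prediction = model.predict(X_new)
print(refit_width, reused_width, X_train.shape[1])
```

If this is indeed the cause, the fitted vectorizer (and the TF-IDF transformer, if used separately) would need to be saved with pickle next to the model at training time and loaded again before prediction, with only transform() called on new text, never fit_transform().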