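I'm fairly new to text mining and Python's sklearn, so I hope you can help me with this problem.
After applying a vectorizer and a feature selector from Python's sklearn, I train an AdaBoost classifier that boosts several base classifiers. I then compute metrics from the predictions made by the AdaBoost classifier, which gives me a nice overview of precision, recall, F1 and accuracy. However, once I dump the model with sklearn's joblib and load it again to check whether it can make predictions for my binary text classification problem, I get the following error: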
ValueError: X has different number of features than during model fitting
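I loop over lists of tuples containing several vectorizers, classifiers and feature selectors, and then append the metrics together with the vectorizer, classifier and feature selector to separate lists. Part of my code is shown below: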
for (vectorizer, vec_Name), (classifier, classifier_Name), (fselector, fselector_Name) in list(itertools.product(vectorizers, binary_classifiers, featureSelectors)):
    trainText = vectorizer.fit_transform(trainText_b)
    testText = vectorizer.transform(testText_b)
    trainText = fselector.fit_transform(trainText, trainClass_b)
    testText = fselector.transform(testText)
    classifier = AdaBoostClassifier(base_estimator=classifier, n_estimators=50, algorithm='SAMME')
    cls = classifier.fit(trainText, trainClass_b)
    prediction = cls.predict(testText)
    precision = precision_score(testClass_b, prediction, average='binary')
    recall = recall_score(testClass_b, prediction, average='binary')
    F1 = f1_score(testClass_b, prediction, average='binary')
    accuracyScore = accuracy_score(testClass_b, prediction)
    results_b.append((vec_Name, classifier_Name, fselector_Name, precision, recall, F1, accuracyScore))
    best_class_b.append((vectorizer, classifier, fselector, accuracyScore))
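As mentioned, the scoring metrics are still computed, but the saved model cannot be used to predict a class. Does this mean that combining feature selection with AdaBoost classification is not feasible?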
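For reference, here is a minimal sketch of the kind of round trip I am trying to get working. It is not my actual code: the toy texts, the `TfidfVectorizer`, the `SelectKBest(chi2, k=5)` selector, the default AdaBoost base estimator and the file name are all assumptions made for illustration. It bundles the fitted vectorizer, selector and classifier in one `Pipeline`, so the loaded object receives raw text and applies the same fitted transformations before predicting.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline

# Toy data, invented purely for illustration.
train_texts = [
    "cheap pills buy now",
    "win money fast now",
    "buy cheap watches",
    "free money offer now",
    "meeting at noon tomorrow",
    "project report attached",
    "lunch with the team",
    "see the attached agenda",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorizer, feature selector and classifier are fitted and stored together,
# so the number of features seen at predict time matches the fit.
pipeline = Pipeline([
    ("vectorizer", TfidfVectorizer()),
    ("selector", SelectKBest(chi2, k=5)),  # k=5 is an arbitrary choice
    ("classifier", AdaBoostClassifier(n_estimators=10, random_state=0)),
])
pipeline.fit(train_texts, train_labels)

# Dump the whole pipeline (hypothetical file name) and load it again.
model_path = os.path.join(tempfile.mkdtemp(), "text_model.joblib")
joblib.dump(pipeline, model_path)
loaded = joblib.load(model_path)

# The loaded pipeline takes raw text, not an already vectorized matrix.
new_texts = ["buy cheap pills", "agenda for the meeting"]
loaded_prediction = loaded.predict(new_texts)
```

In a sketch like this, only the classifier being dumped, or the vectorizer and selector being refitted after the classifier was trained, would be the kind of situation where the feature counts could stop matching.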
Thanks in advance.