Question

I am using scikit-learn and pickle (to save a trained model).

First, I run the following code:

import pickle
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(solver='lbfgs', multi_class='auto')
logreg.fit(X_train, y_train)

with open('text_classifier', 'wb') as picklefile:
    pickle.dump(logreg, picklefile)

Later, when I want to use this model again, I run the following (to check that it still works):

with open('text_classifier', 'rb') as training_model:
    model = pickle.load(training_model)

print('Accuracy of Logistic regression classifier on test set: {:.2f}\n'
      .format(model.score(X_test, y_test)))

However, this raises the following error:

ValueError: X has 74 features per sample; expecting 77

Can someone explain why this happens?

Answer 1

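Are you sure the columns are the same in the training and test sets? Evidently your X_test has 74 columns, while the model expects 77. Did you sample the data before training? Incorrect sampling could be the cause.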
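One way to confirm the mismatch before calling score is to compare the two column counts directly. The following is a minimal sketch, not part of the original question: the helper names expected_feature_count and check_feature_count are hypothetical, and it assumes the model is a fitted scikit-learn linear model and that X exposes a .shape attribute (NumPy array, pandas DataFrame, or SciPy sparse matrix).

```python
def expected_feature_count(model):
    """Return the number of features the fitted model was trained on."""
    # Fitted estimators in scikit-learn 0.24 and later record this attribute.
    n = getattr(model, "n_features_in_", None)
    if n is None:
        # Fallback for older versions: linear models such as
        # LogisticRegression keep one coefficient column per feature.
        n = model.coef_.shape[1]
    return int(n)


def check_feature_count(model, X):
    """Raise a descriptive error if X has a different column count."""
    expected = expected_feature_count(model)
    actual = X.shape[1]
    if actual != expected:
        raise ValueError(
            "X has {} columns but the model was trained on {}".format(
                actual, expected
            )
        )
    return actual
```

Calling check_feature_count(model, X_test) right after loading the pickle would show whether X_test was built differently from X_train, which is what the error message suggests.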

If your data set is complete, this should work fine:

with open('text_classifier.sav', 'wb') as f:
    pickle.dump(logreg, f)
with open('text_classifier.sav', 'rb') as f:
    model = pickle.load(f)
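To illustrate that the pickle round trip itself does not change the number of features, here is a small sketch on synthetic data. The data set from make_classification, its size, and the max_iter value are assumptions chosen for the demonstration; they are not the asker's data or settings.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real data: 200 samples with 77 features.
X, y = make_classification(n_samples=200, n_features=77, random_state=0)

logreg = LogisticRegression(solver="lbfgs", max_iter=1000)
logreg.fit(X, y)

# Round-trip the fitted model through pickle, in memory.
restored = pickle.loads(pickle.dumps(logreg))

# The coefficient matrix keeps one column per training feature.
print(logreg.coef_.shape, restored.coef_.shape)

# Passing only 74 of the 77 columns reproduces the kind of error
# reported in the question.
try:
    restored.predict(X[:, :74])
except ValueError as exc:
    print("ValueError:", exc)
```

If the restored model reports the same shape as the original, the difference between 74 and 77 has to come from how X_test was prepared, not from saving and loading.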

This may help: the scikit-learn documentation on model persistence (pickle and joblib): https://scikit-learn.org/stable/modules/model_persistence.html
