Question

I have a classifier that I'm evaluating with cross-validation, and I'm getting good results. Basically, all I'm doing is:

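from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, cross_val_predict
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

clf = RandomForestClassifier(class_weight="balanced")
scores = cross_val_score(clf, data, target, cv=8)
predict_RF = cross_val_predict(clf, data, target, cv=8)

joblib.dump(clf, 'churnModel.pkl')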

What I want to do is take the model fitted during cross-validation and export it with joblib. However, when I try to load it in a separate project, I get:

sklearn.exceptions.NotFittedError: Estimator not fitted, call `fit` before exploiting the model.

So I'm guessing cross-validation doesn't actually keep my clf fitted? How can I keep the model that cross-validation fits?

Answer 1

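juanpa.arrivillaga is right. I'm afraid you have to do it manually, but scikit-learn makes that easy. cross_val_score fits a fresh copy of your estimator on each fold and does not return those trained models; the clf you pass in is never fitted itself, which is why the pickled file raises NotFittedError. The code below keeps the trained model from every fold in a list (clf_models):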

from sklearn.model_selection import StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.base import clone

kf = StratifiedKFold(n_splits=8)
clf = RandomForestClassifier(class_weight="balanced")
clf_models = []

# X_data and y_data (your data and target) must be NumPy arrays indexed the same way
# StratifiedKFold needs the labels to build class-balanced folds
for train_index, test_index in kf.split(X_data, y_data):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X_data[train_index], X_data[test_index]
    y_train, y_test = y_data[train_index], y_data[test_index]
    # clone() gives an unfitted copy with the same hyperparameters
    tmp_clf = clone(clf)
    tmp_clf.fit(X_train, y_train)

    print("Got a score of {}".format(tmp_clf.score(X_test, y_test)))
    clf_models.append(tmp_clf)

Edit: as juanpa.arrivillaga suggested, StratifiedKFold is the better choice here. This is for demonstration only.
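If you only need the per-fold models, scikit-learn 0.20 and later can return them directly: `cross_validate` accepts `return_estimator=True`. If you want a single model to export, a common approach is to fit `clf` on all of the data after cross-validation and dump that. A minimal sketch, assuming synthetic data from `make_classification` in place of your `data`/`target`, with `random_state` added only for reproducibility:

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the question's data/target (assumption for illustration)
data, target = make_classification(n_samples=400, n_features=10, random_state=0)

clf = RandomForestClassifier(class_weight="balanced", random_state=0)

# return_estimator=True keeps the model fitted on each of the 8 training folds
cv_results = cross_validate(clf, data, target, cv=8, return_estimator=True)
fold_models = cv_results["estimator"]
print("mean CV accuracy:", cv_results["test_score"].mean())

# clf itself is still unfitted here; fit it on all the data to get one model to ship
clf.fit(data, target)
joblib.dump(clf, "churnModel.pkl")

# In the other project: load the fitted model and predict
loaded = joblib.load("churnModel.pkl")
predictions = loaded.predict(data[:5])
```

The cross-validation scores then serve as your estimate of how the full-data model will perform; the per-fold models are mainly useful for inspection or ensembling.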

Saving an sklearn classifier with cross_val