Saving an sklearn classifier fitted with cross_val

Time: 2016-12-21 16:56:35

Tags: python machine-learning scikit-learn

I have a classifier that I am evaluating with cross_val, and I am getting good results. Basically all I am doing is:

clf = RandomForestClassifier(class_weight="balanced")
scores = cross_val_score(clf, data, target, cv=8)
predict_RF = cross_val_predict(clf, data, target, cv=8)

from sklearn.externals import joblib
joblib.dump(clf, 'churnModel.pkl')

What I want to do is take the model fitted by cross_val and export it with joblib. However, when I try to load it in a separate project, I get:

sklearn.exceptions.NotFittedError: Estimator not fitted, call `fit` before exploiting the model.

So I am guessing that cross_val does not actually save the fit on my clf? How do I keep the fitted model that cross_val generates?

1 answer:

Answer 0 (score: 1)

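juanpa.arrivillaga is right. I'm afraid you have to do it manually, but scikit-learn makes that pretty easy. cross_val_score fits clones of your estimator internally and does not return them to you, so clf itself is never fitted. With the code below you end up with a list of the trained models (i.e. clf_models), one per fold.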

from sklearn.model_selection import StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from copy import deepcopy

kf = StratifiedKFold(n_splits=8)
clf = RandomForestClassifier(class_weight="balanced")
clf_models = []

# X_data and y_data should be NumPy arrays with matching row order
# StratifiedKFold needs the labels to stratify, so pass y_data to split()
for train_index, test_index in kf.split(X_data, y_data):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X_data[train_index], X_data[test_index]
    y_train, y_test = y_data[train_index], y_data[test_index]
    tmp_clf = deepcopy(clf)  # fresh, unfitted copy for this fold
    tmp_clf.fit(X_train, y_train)

    print("Got a score of {}".format(tmp_clf.score(X_test, y_test)))
    clf_models.append(tmp_clf)  # keep the fitted model for this fold
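
To get back to the original goal of exporting a fitted model, here is a minimal sketch of the dump and load round trip. It assumes you refit one final classifier on all of the data once cross-validation has told you the setup works. The synthetic data from make_classification, the name final_clf and the temporary file location are placeholders and not part of the question. It also assumes a recent scikit-learn, where joblib is imported as its own package rather than from sklearn.externals.

```python
import os
import tempfile

import joblib  # standalone package in recent scikit-learn installs
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real data/target (assumption for illustration)
X_data, y_data = make_classification(n_samples=200, n_features=10, random_state=0)

# Fit on the full data set; this is the object that actually gets persisted
final_clf = RandomForestClassifier(class_weight="balanced", random_state=0)
final_clf.fit(X_data, y_data)

# Write the fitted model to disk (temporary directory used here as a placeholder)
model_path = os.path.join(tempfile.mkdtemp(), "churnModel.pkl")
joblib.dump(final_clf, model_path)

# In the other project: load the file and predict without refitting
loaded_clf = joblib.load(model_path)
loaded_predictions = loaded_clf.predict(X_data)
```

Any single fitted model from a per-fold list could be dumped the same way, but each of those has only seen part of the data.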

Edit: following juanpa.arrivillaga's suggestion, StratifiedKFold is the better choice. This is for demonstration only.
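
As a possible alternative to a manual loop, newer scikit-learn releases (0.20 and later, which postdate this question) provide cross_validate with a return_estimator flag that hands the per-fold fitted models back. The sketch below assumes such a version is installed, and the synthetic data is a stand-in for the real data and target.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the real data/target (assumption for illustration)
X_data, y_data = make_classification(n_samples=200, n_features=10, random_state=0)

clf = RandomForestClassifier(class_weight="balanced", random_state=0)

# return_estimator=True keeps the fitted clone from every fold
cv_results = cross_validate(
    clf,
    X_data,
    y_data,
    cv=StratifiedKFold(n_splits=8),
    return_estimator=True,
)

clf_models = cv_results["estimator"]    # list of fitted classifiers, one per fold
fold_scores = cv_results["test_score"]  # held-out score for each fold
```

Note that clf itself is still unfitted afterwards, since only the clones are trained.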