I have a classifier that I am running through cross_val and getting good results. Basically all I'm doing is:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.externals import joblib

clf = RandomForestClassifier(class_weight="balanced")
scores = cross_val_score(clf, data, target, cv=8)
predict_RF = cross_val_predict(clf, data, target, cv=8)
joblib.dump(clf, 'churnModel.pkl')
Basically what I want to do is take the model that cross_val fits and export it with joblib. However, when I try to load it in a separate project I get:
sklearn.exceptions.NotFittedError: Estimator not fitted, call `fit` before exploiting the model.
So I'm guessing cross_val isn't actually saving the fit on my clf? How do I persist the fitted model(s) that cross_val generates?
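(For reference, a minimal sketch of the workaround one would expect here, assuming data and target are the full training arrays from the snippet above: cross_val_score fits internal clones of the estimator, so the clf you pass in is never fitted itself; fitting it explicitly before dumping avoids the NotFittedError.)

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.externals import joblib  # plain `import joblib` on newer scikit-learn versions

clf = RandomForestClassifier(class_weight="balanced")
scores = cross_val_score(clf, data, target, cv=8)  # evaluation only: fits clones of clf, not clf
clf.fit(data, target)                              # fit the estimator you actually want to export
joblib.dump(clf, 'churnModel.pkl')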
Answer 0 (score: 1)
juanpa.arrivillaga is right. I'm afraid you have to do it manually, but scikit-learn makes it quite easy. cross_val_score creates trained models that are never returned to you. With the code below you will have the trained models in a list (i.e. clf_models):
from sklearn.model_selection import StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from copy import deepcopy

kf = StratifiedKFold(n_splits=8)
clf = RandomForestClassifier(class_weight="balanced")
clf_models = []

# keep in mind your X_data and y_data should be numpy arrays indexed the same way here
kf.get_n_splits(X_data)
for train_index, test_index in kf.split(X_data, y_data):  # StratifiedKFold needs y to stratify
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X_data[train_index], X_data[test_index]
    y_train, y_test = y_data[train_index], y_data[test_index]
    tmp_clf = deepcopy(clf)  # copy the unfitted estimator so each fold gets its own model
    tmp_clf.fit(X_train, y_train)
    print("Got a score of {}".format(tmp_clf.score(X_test, y_test)))
    clf_models.append(tmp_clf)
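To actually persist those fold models with joblib (tying back to the question), a short sketch; the file names here are just illustrative, and X_new stands in for whatever data you want to score later:

from sklearn.externals import joblib  # plain `import joblib` on newer scikit-learn versions

# dump each fold's fitted model to its own file
for i, model in enumerate(clf_models):
    joblib.dump(model, 'churnModel_fold{}.pkl'.format(i))

# later, in the separate project
loaded = joblib.load('churnModel_fold0.pkl')
predictions = loaded.predict(X_new)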
Edit, per juanpa.arrivillaga's suggestion: StratifiedKFold is the better option. This is just for demonstration purposes.
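A side note: if your scikit-learn version is recent enough (the return_estimator argument to cross_validate was added around 0.20), the library can hand the fitted per-fold estimators back to you directly, so the manual loop is not strictly required. A sketch, again assuming data and target as in the question:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

clf = RandomForestClassifier(class_weight="balanced")
cv_results = cross_validate(clf, data, target, cv=8, return_estimator=True)
fold_models = cv_results['estimator']  # the 8 fitted RandomForestClassifier instances
print(cv_results['test_score'])        # per-fold scores, comparable to cross_val_score

Any of these fold models can then be dumped with joblib just like above.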