Why do the scores from parameter search and cross_val_score differ?

Date: 2016-05-20 10:58:03

Tags: python scikit-learn

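I am using RandomizedSearchCV to tune the hyperparameters of a random forest. Once I have a good parameter set, I evaluate the model with cross_validation.cross_val_score.

I noticed that the score reported by RandomizedSearchCV differs slightly from the score returned by cross_validation.cross_val_score. The cross_val_score result is always better than the RandomizedSearchCV result.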

import numpy as np
from scipy.stats import randint as sp_randint
from sklearn import cross_validation
from sklearn.grid_search import RandomizedSearchCV
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from operator import itemgetter

def report(grid_scores, n_top=3, reverse=True):
    top_scores = sorted(grid_scores, key=itemgetter(1), reverse=reverse)[:n_top]
    for i, score in enumerate(top_scores):
        print("Model with rank: {0}".format(i + 1))
        print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
              score.mean_validation_score,
              np.std(score.cv_validation_scores)))
        print("Parameters: {0}".format(score.parameters))
        print("")

# get x and y
digits = load_digits()
X, y = digits.data, digits.target

# find good parameter set
param_dist = {'n_estimators': sp_randint(500, 2000)}
n_iter = 100
clf = RandomForestClassifier()
random_search_clf = RandomizedSearchCV(clf, param_distributions=param_dist, n_iter=n_iter, scoring="f1", n_jobs=2)
random_search_clf.fit(X, y)
report(random_search_clf.grid_scores_, 1, reverse=True) # print the score of top estimator

# evaluate with cross_validation
_param = random_search_clf.best_estimator_.get_params()
clf = RandomForestClassifier(**_param)
# NOTE: kappa_scorer and fit_params are not defined in this snippet
scores = cross_validation.cross_val_score(clf, X, y, cv=3, scoring=kappa_scorer, n_jobs=2, fit_params=fit_params)
print(scores) # score by cross_val_score

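My question is why this happens and which score should be trusted. In the code above, why does the score printed by the report function differ from the scores returned by cross_val_score?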
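One way to narrow this down might be a like-for-like comparison. In the snippet above the search is scored with "f1" while cross_val_score uses kappa_scorer, and the forest has no fixed random_state, so the two numbers are not measuring the same thing. The sketch below is not from the original post: it assumes a recent scikit-learn where these tools live in sklearn.model_selection, and it picks "f1_macro", a shuffled StratifiedKFold, a small n_estimators range and random_state=0 purely for illustration. With the same metric, the same folds and the same seed, the two scores would be expected to agree.

```python
from scipy.stats import randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (
    RandomizedSearchCV,
    StratifiedKFold,
    cross_val_score,
)

X, y = load_digits(return_X_y=True)

# Assumption: one shared splitter and one shared metric for both steps.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scoring = "f1_macro"

# Small search space and few iterations so the sketch runs quickly.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(20, 60)},
    n_iter=3,
    scoring=scoring,
    cv=cv,
    random_state=0,
)
search.fit(X, y)

# Re-evaluate the winning parameters with the same folds, metric and seed.
best = RandomForestClassifier(random_state=0, **search.best_params_)
scores = cross_val_score(best, X, y, cv=cv, scoring=scoring)

print("search best_score_:   {:.4f}".format(search.best_score_))
print("cross_val_score mean: {:.4f}".format(scores.mean()))
```

If the two numbers match here but not in the original code, the gap most likely comes from the differing scoring argument or from the unseeded forest rather than from the search itself.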

0 answers:

No answers