What does 'mean_test_score' mean in cv_results_?

Date: 2017-07-06 11:25:36

Tags: python scikit-learn grid-search

Hello, I am using GridSearchCV and I print the results using scikit-learn's .cv_results_ attribute.

My problem is that when I manually compute the mean of all the per-fold test scores, I get a different number than the one reported in 'mean_test_score'. Why does it differ from a plain np.mean()?

I attach the code and its result here:

n_estimators = [100]
max_depth = [3]
learning_rate = [0.1]

param_grid = dict(max_depth=max_depth, n_estimators=n_estimators, learning_rate=learning_rate)

gkf = GroupKFold(n_splits=7)

grid_search = GridSearchCV(model, param_grid, scoring=score_auc, cv=gkf)
grid_result = grid_search.fit(X, Y, groups=patients)

grid_result.cv_results_
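
For reference, a self-contained sketch that reproduces a call of this shape is below. The GradientBoostingClassifier, the make_scorer-based AUC scorer, and the synthetic data are all assumptions, since the post does not show how model, score_auc, X, Y, and patients were defined:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.model_selection import GridSearchCV, GroupKFold

# Hypothetical stand-ins for the names not shown in the question
model = GradientBoostingClassifier()  # accepts n_estimators, max_depth, learning_rate
score_auc = make_scorer(roc_auc_score, needs_threshold=True)  # an AUC scorer, as the name suggests

rng = np.random.RandomState(0)
X = rng.rand(700, 5)                    # synthetic features
Y = rng.randint(0, 2, size=700)         # synthetic binary labels
patients = rng.randint(0, 7, size=700)  # one group label (patient id) per sample

param_grid = dict(max_depth=[3], n_estimators=[100], learning_rate=[0.1])
gkf = GroupKFold(n_splits=7)

grid_search = GridSearchCV(model, param_grid, scoring=score_auc, cv=gkf)
grid_result = grid_search.fit(X, Y, groups=patients)
print(grid_result.cv_results_['mean_test_score'])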

The result of this call is:

{'mean_fit_time': array([ 8.92773601]),
 'mean_score_time': array([ 0.04288721]),
 'mean_test_score': array([ 0.83490629]),
 'mean_train_score': array([ 0.95167036]),
 'param_learning_rate': masked_array(data = [0.1],
              mask = [False],
        fill_value = ?),
 'param_max_depth': masked_array(data = [3],
              mask = [False],
        fill_value = ?),
 'param_n_estimators': masked_array(data = [100],
              mask = [False],
        fill_value = ?),
 'params': ({'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100},),
 'rank_test_score': array([1]),
 'split0_test_score': array([ 0.74821666]),
 'split0_train_score': array([ 0.97564995]),
 'split1_test_score': array([ 0.80089016]),
 'split1_train_score': array([ 0.95361201]),
 'split2_test_score': array([ 0.92876979]),
 'split2_train_score': array([ 0.93935856]),
 'split3_test_score': array([ 0.95540287]),
 'split3_train_score': array([ 0.94718634]),
 'split4_test_score': array([ 0.89083901]),
 'split4_train_score': array([ 0.94787374]),
 'split5_test_score': array([ 0.90926355]),
 'split5_train_score': array([ 0.94829775]),
 'split6_test_score': array([ 0.82520379]),
 'split6_train_score': array([ 0.94971417]),
 'std_fit_time': array([ 1.79167576]),
 'std_score_time': array([ 0.02970254]),
 'std_test_score': array([ 0.0809713]),
 'std_train_score': array([ 0.0105566])}

As you can see, taking np.mean of all the test_score values gives approximately 0.8655122606479532, while 'mean_test_score' is 0.83490629.
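
The manual calculation, using the seven splitN_test_score values from the output above:

import numpy as np

test_scores = [0.74821666, 0.80089016, 0.92876979, 0.95540287,
               0.89083901, 0.90926355, 0.82520379]
print(np.mean(test_scores))  # 0.8655122..., not 0.83490629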

Thanks for your help, Leonardo.

3 Answers:

Answer 0 (score: 3)

If you look at the original code of GridSearchCV in its GitHub repository, you'll see that it doesn't use np.mean(); it uses np.average() with weights. Hence the difference. Here is the relevant code:

n_splits = 3  # number of CV folds (7 in your case)
test_sample_counts = np.array(test_sample_counts[:n_splits],
                              dtype=np.int)
# With iid=True (the default), each fold is weighted by its test-set size
weights = test_sample_counts if self.iid else None
means = np.average(test_scores, axis=1, weights=weights)
stds = np.sqrt(np.average((test_scores - means[:, np.newaxis]) ** 2,
                          axis=1, weights=weights))

cv_results = dict()
for split_i in range(n_splits):
    cv_results["split%d_test_score" % split_i] = test_scores[:, split_i]
cv_results["mean_test_score"] = means
cv_results["std_test_score"] = stds

If you want to know more about the difference between them, check out Difference between np.mean() and np.average().
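
A quick sketch of the difference: np.mean() gives every fold the same weight, while np.average() with weights lets each fold count in proportion to its size:

import numpy as np

scores = np.array([0.2, 0.8])
sizes = np.array([90, 10])  # two folds of very different size

print(np.mean(scores))                    # 0.5  -> each fold counts equally
print(np.average(scores, weights=sizes))  # 0.26 -> each sample counts equally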

Answer 1 (score: 2)

I'll post this as a new answer since it needs quite a bit of code:

The test and train scores of the folds are (taken from the results you posted in your question):

test_scores = [0.74821666, 0.80089016, 0.92876979, 0.95540287, 0.89083901, 0.90926355, 0.82520379]
train_scores = [0.97564995, 0.95361201, 0.93935856, 0.94718634, 0.94787374, 0.94829775, 0.94971417]

The numbers of training and test samples in those folds are (taken from the output of print([(len(train), len(test)) for train, test in gkf.split(X, groups=patients)])):

train_len = [41835, 56229, 56581, 58759, 60893, 60919, 62056]
test_len = [24377, 9983, 9631, 7453, 5319, 5293, 4156]

The test and train means, weighted by the number of samples in each fold, are then:

train_avg = np.average(train_scores, weights=train_len)
-> 0.95064898361714389
test_avg = np.average(test_scores, weights=test_len)
-> 0.83490628649308296

So this is exactly the value sklearn reports. It is also the correct mean accuracy for your classification; a plain mean over the folds is not, because it depends on the somewhat arbitrary sizes of the splits/folds you chose.

So in conclusion, both explanations are indeed identical and correct.

Answer 2 (score: 1)

I think the reason for the different means is that the weighting factors in the two mean calculations differ.

The mean_test_score that sklearn returns is a mean calculated over all samples, where each sample has equal weight.

If you compute the mean by averaging the per-fold means (splits), you only get the same result when all folds have the same size. If they don't, each sample in a larger fold has less influence on the mean of the fold means than a sample in a smaller fold, and vice versa.

A small numeric example:

mean([2,3,5,8,9]) = 5.4 # mean over all samples ('mean_test_score')

mean([2,3,5]) = 3.333 # mean of fold 1
mean([8,9]) = 8.5 # mean of fold 2

mean([3.333, 8.5]) = 5.92 # mean of the fold means

5.4 != 5.92
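
The same example as runnable code, which also shows that weighting the fold means by fold size recovers the mean over all samples:

import numpy as np

samples = [2, 3, 5, 8, 9]
fold1, fold2 = [2, 3, 5], [8, 9]

fold_means = [np.mean(fold1), np.mean(fold2)]  # [3.333..., 8.5]
print(np.mean(samples))     # 5.4, mean over all samples
print(np.mean(fold_means))  # 5.9166..., naive mean of fold means
print(np.average(fold_means,
                 weights=[len(fold1), len(fold2)]))  # 5.4, weighted mean matches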