What value does the grid search score return?

Time: 2019-05-26 05:43:41

Tags: python-3.x scikit-learn grid-search

I ran GridSearchCV on my prediction model with scoring set to accuracy. X and Y are the test set.

from sklearn.model_selection import GridSearchCV 
from sklearn.metrics import classification_report
from sklearn.svm import SVC

tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [10, 100, 1000]}]


print("Tuning hyperparameters for accuracy")

clf_gs = GridSearchCV(SVC(), tuned_parameters, cv=5,
                      scoring='accuracy')
clf_gs.fit(X, Y)

print(clf_gs.best_params_)

print("Grid scores on development set:")

means = clf_gs.cv_results_['mean_test_score']
stds = clf_gs.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, clf_gs.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r"
          % (mean, std * 2, params))

print("The scores are computed on the full evaluation set.")

y_true, y_pred = Y, clf_gs.predict(X)
print(classification_report(y_true, y_pred))

My grid scores are:

Tuning hyperparameters for accuracy

{'C': 1000, 'gamma': 0.001, 'kernel': 'rbf'}

Grid scores on development set:

0.994 (+/-0.000) for {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
0.986 (+/-0.000) for {'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'}
0.995 (+/-0.001) for {'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}
0.988 (+/-0.000) for {'C': 100, 'gamma': 0.0001, 'kernel': 'rbf'}
0.995 (+/-0.001) for {'C': 1000, 'gamma': 0.001, 'kernel': 'rbf'}
0.994 (+/-0.001) for {'C': 1000, 'gamma': 0.0001, 'kernel': 'rbf'}


The scores are computed on the full evaluation set.
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     67343
           3       0.88      0.78      0.83       995

    accuracy                           1.00     68338
   macro avg       0.94      0.89      0.91     68338
weighted avg       1.00      1.00      1.00     68338

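The documentation for sklearn.model_selection.GridSearchCV.score says it returns the score on the given data. Is that the accuracy score of the prediction model, or some other score specific to GridSearchCV? I'm confused because when I used SVC with its default parameter values my accuracy was below 90%, so I did not expect such a large improvement.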

1 answer:

Answer 0: (score: 0)

Yes. According to this line of code:

clf_gs = GridSearchCV(SVC(), tuned_parameters, cv=5,
                      scoring='accuracy')

your scoring metric is accuracy, so the score is the accuracy of the best model found by the search.
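As a rough check of that claim, the sketch below compares GridSearchCV.score with accuracy_score computed by hand on the refit best estimator. The data come from make_classification and the names X_demo, y_demo, and search are illustrative assumptions, not the asker's dataset or variables.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; not the asker's dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     random_state=0)

param_grid = {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [10, 100]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)

# score() applies the `scoring` metric to the best estimator, which was
# refit on all the data passed to fit() (refit=True is the default).
score_from_search = search.score(X_demo, y_demo)
score_by_hand = accuracy_score(y_demo,
                               search.best_estimator_.predict(X_demo))
print(score_from_search, score_by_hand)
```

The two printed numbers should match, which suggests that score is simply the chosen metric evaluated on whatever data you pass in.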

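The difference between the CV and eval scores comes from which data is used. Each CV score is the mean over 5 folds: the model is trained on four fifths of the data passed to fit and scored on the remaining fifth, so every fold uses only a subset of your training data. The eval score instead uses the best model refit on all of the data passed to fit. It is an independent estimate only when it is computed on test data that was not part of the training set; in the code in the question, predict is called on the same X that was passed to fit, so that report is measured on data the model has already seen.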
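The three numbers involved can be put side by side with a small sketch. It assumes synthetic data from make_classification and a hypothetical 75/25 split made with train_test_split; none of these names come from the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; not the asker's dataset).
X_all, y_all = make_classification(n_samples=400, n_features=10,
                                   random_state=0)

# Hold out a test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=0)

param_grid = {"gamma": [1e-3, 1e-4], "C": [10, 100, 1000]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

cv_score = search.best_score_                 # mean accuracy over the 5 folds
train_score = search.score(X_train, y_train)  # data the model was fit on
test_score = search.score(X_test, y_test)     # data held out from fit()

print("mean CV accuracy:   %0.3f" % cv_score)
print("training accuracy:  %0.3f" % train_score)
print("held-out accuracy:  %0.3f" % test_score)
```

Only the last number is measured on data that played no part in fitting or in choosing the hyperparameters.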

You can search for 'k-fold cross-validation washington' to read more about the underlying algorithm.
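For a concrete picture of k-fold cross-validation, here is a minimal sketch that scores one parameter setting by hand and compares the result with cross_val_score. It assumes synthetic data and uses StratifiedKFold, which is what scikit-learn applies to classifiers when cv is given as an integer; the variable names are illustrative.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; not the asker's dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     random_state=0)

model = SVC(kernel="rbf", gamma=1e-3, C=100)
folds = StratifiedKFold(n_splits=5)

fold_scores = []
for train_idx, val_idx in folds.split(X_demo, y_demo):
    # Fit a fresh copy on four folds, then score it on the held-out fold.
    fold_model = clone(model).fit(X_demo[train_idx], y_demo[train_idx])
    predictions = fold_model.predict(X_demo[val_idx])
    fold_scores.append(accuracy_score(y_demo[val_idx], predictions))

manual_mean = float(np.mean(fold_scores))
library_mean = float(cross_val_score(model, X_demo, y_demo, cv=folds,
                                     scoring="accuracy").mean())
print(manual_mean, library_mean)
```

The mean over the five held-out folds corresponds to what GridSearchCV reports as mean_test_score for one parameter combination.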