Python sklearn: scoring

Date: 2016-07-24 16:10:42

Tags: python scikit-learn scoring grid-search

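I want to use GridSearchCV to find the parameters that give the best F1 score for label "1". Somehow it seems to optimize for some other metric instead, and I don't understand why. Here is my code: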

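from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import GridSearchCV  # scikit-learn < 0.18: from sklearn.grid_search import GridSearchCV
# score each candidate by the F1 of class 1 only
f1_scorer = make_scorer(f1_score, pos_label=1)
# each random_state yields a different single-tree "forest"
param_random = {'random_state': range(0,10)}
clf = GridSearchCV(RandomForestClassifier(n_estimators=1, max_features=1), param_random, scoring=f1_scorer)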

Output:

Best parameters:  {'random_state': 8}
predict time: 0.0 s
accuracy: 0.840909090909
             precision    recall  f1-score   support

        0.0       0.88      0.95      0.91        38
        1.0       0.33      0.17      0.22         6

avg / total       0.80      0.84      0.82        44
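The scorer and the printed table measure different things, so it helps to map one onto the other. Below is a small sketch using hypothetical toy labels (not the question's data). It shows that the "1" row's f1-score is the value `make_scorer(f1_score, pos_label=1)` computes, and that the "avg / total" row is the support-weighted average over both classes. It assumes a scikit-learn release with `output_dict=True` (0.20 or later), where that row is labelled "weighted avg".

```python
from sklearn.metrics import classification_report, f1_score

# Hypothetical toy labels, not the question's data.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# output_dict=True (scikit-learn >= 0.20) returns the report as nested dicts
# keyed by the string form of each label.
report = classification_report(y_true, y_pred, output_dict=True)

# Row "1": the F1 that make_scorer(f1_score, pos_label=1) computes.
f1_pos = f1_score(y_true, y_pred, pos_label=1)
# "avg / total" row (labelled "weighted avg" in newer releases):
# per-class F1 averaged with the class supports as weights.
f1_weighted = f1_score(y_true, y_pred, average="weighted")

print(report["1"]["f1-score"], f1_pos)
print(report["weighted avg"]["f1-score"], f1_weighted)
```

Applied to the first table, (0.91 × 38 + 0.22 × 6) / 44 ≈ 0.82, which is the f1-score in its avg / total row. The grid search only ever optimizes the class-1 value, computed on cross-validation folds.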

Second attempt, changing only the 'random_state' range:

f1_scorer = make_scorer(f1_score, pos_label=1)
param_random = {'random_state': range(0,100)}
clf = GridSearchCV(RandomForestClassifier(n_estimators=1, max_features=1), param_random, scoring=f1_scorer)

Output:

Best parameters:  {'random_state': 91}
predict time: 0.0 s
accuracy: 0.886363636364
             precision    recall  f1-score   support

        0.0       0.88      1.00      0.94        38
        1.0       1.00      0.17      0.29         6

avg / total       0.90      0.89      0.85        44
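This run widens the grid from range(0,10) to range(0,100). The sketch below shows what that change does and does not guarantee. It uses synthetic data from `make_classification` as a stand-in, since the original data isn't shown (an assumption). With the same deterministic CV folds, every seed from the small grid is scored identically in the large grid, so `best_score_` (the mean cross-validated F1) can only stay the same or rise. Nothing is guaranteed about a score computed on a separate test set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the question's training data (an assumption).
X, y = make_classification(n_samples=176, n_features=8, n_informative=4,
                           weights=[0.86, 0.14], random_state=0)

f1_scorer = make_scorer(f1_score, pos_label=1)
# Fixed, unshuffled folds so every search is scored on identical splits.
cv = StratifiedKFold(n_splits=5)

best_scores = {}
for upper in (10, 100):
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=1, max_features=1),
        {'random_state': list(range(upper))},
        scoring=f1_scorer, cv=cv)
    search.fit(X, y)
    # (chosen seed, its mean cross-validated class-1 F1)
    best_scores[upper] = (search.best_params_['random_state'], search.best_score_)

print(best_scores)
```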

Third attempt:

f1_scorer = make_scorer(f1_score, pos_label=1)
param_random = {'random_state': range(0,1000)}
clf = GridSearchCV(RandomForestClassifier(n_estimators=1, max_features=1), param_random, scoring=f1_scorer)

Output:

Best parameters:  {'random_state': 273}
predict time: 0.001 s
accuracy: 0.840909090909
             precision    recall  f1-score   support

        0.0       0.90      0.92      0.91        38
        1.0       0.40      0.33      0.36         6

avg / total       0.83      0.84      0.83        44

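At first I thought it was optimizing for label '0', but that is not the case either. I don't understand what I am doing wrong. The results look reasonable, but I know there is at least one better score within this range.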
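GridSearchCV picks `best_params_` by the highest mean cross-validated score on the data passed to `fit` (`best_score_`) and then refits that candidate. It never looks at a separate test set. The post doesn't show how the printed numbers were produced, but they appear to come from a held-out set of 44 samples (an assumption). If so, the seed with the best CV F1 need not be the one with the best test F1, especially with only 6 positive test samples. The sketch below scores every candidate both ways on synthetic stand-in data so the two rankings can be compared.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption); a 20% test split gives 44 samples.
X, y = make_classification(n_samples=220, n_features=8, n_informative=4,
                           weights=[0.86, 0.14], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

search = GridSearchCV(RandomForestClassifier(n_estimators=1, max_features=1),
                      {'random_state': list(range(20))},
                      scoring=make_scorer(f1_score, pos_label=1), cv=5)
search.fit(X_train, y_train)

# What the search ranks by: mean class-1 F1 over the CV folds.
cv_f1 = search.cv_results_['mean_test_score']
test_f1 = []
for params in search.cv_results_['params']:
    # Refit each candidate on the full training set, score it on the test set.
    model = RandomForestClassifier(n_estimators=1, max_features=1, **params)
    model.fit(X_train, y_train)
    test_f1.append(f1_score(y_test, model.predict(X_test), pos_label=1))
test_f1 = np.array(test_f1)

print("chosen by CV:", search.best_params_, "test F1 =", test_f1[search.best_index_])
print("best on test:", search.cv_results_['params'][int(np.argmax(test_f1))],
      "test F1 =", test_f1.max())
```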

How do I know something is wrong? Because I was able to find a better one manually:

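from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import GridSearchCV  # scikit-learn < 0.18: from sklearn.grid_search import GridSearchCV
f1_scorer = make_scorer(f1_score, pos_label=1)
# a grid containing only random_state=2
param_random = {'random_state': range(2,3)}
clf = GridSearchCV(RandomForestClassifier(n_estimators=1, max_features=1), param_random, scoring=f1_scorer)

Output:
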
Best parameters:  {'random_state': 2}
predict time: 0.0 s
accuracy: 0.886363636364
             precision    recall  f1-score   support

        0.0       0.90      0.97      0.94        38
        1.0       0.67      0.33      0.44         6

avg / total       0.87      0.89      0.87        44
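To check whether random_state=2 really beats the seed the search chose, compare both on the criterion GridSearchCV uses. The sketch below runs on synthetic stand-in data, and the helper `cv_f1` is hypothetical (not from the post). It computes the mean cross-validated class-1 F1 with `cross_val_score` for seeds 2 and 273, then confirms those numbers match GridSearchCV's `mean_test_score` on the same folds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the training data (an assumption).
X_train, y_train = make_classification(n_samples=176, n_features=8, n_informative=4,
                                       weights=[0.86, 0.14], random_state=0)
f1_scorer = make_scorer(f1_score, pos_label=1)
cv = StratifiedKFold(n_splits=5)  # same deterministic folds for both checks

def cv_f1(seed):
    """Hypothetical helper: mean cross-validated class-1 F1 for one random_state."""
    model = RandomForestClassifier(n_estimators=1, max_features=1, random_state=seed)
    return cross_val_score(model, X_train, y_train, scoring=f1_scorer, cv=cv).mean()

seeds = [2, 273]
manual = {seed: cv_f1(seed) for seed in seeds}

search = GridSearchCV(RandomForestClassifier(n_estimators=1, max_features=1),
                      {'random_state': seeds}, scoring=f1_scorer, cv=cv)
search.fit(X_train, y_train)

print(manual)
print(dict(zip(seeds, search.cv_results_['mean_test_score'])))
```

Running `cv_f1` on the original training data would show whether seed 2 has a higher cross-validated F1 than the seed the search reported, or only a higher F1 on the 44 test samples.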

0 answers:

No answers