I want to use GridSearchCV to find the best F1 score for label "1", but somehow it seems to optimize a different metric, and I don't understand why. Here is my code:
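from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import GridSearchCV  # import was missing; sklearn.grid_search before version 0.18
# Score each candidate by the F1 of the positive label 1
f1_scorer = make_scorer(f1_score, pos_label=1)
# The only "hyperparameter" searched is the random seed
param_random = {'random_state': range(0,10)}
clf = GridSearchCV(RandomForestClassifier(n_estimators=1, max_features=1), param_random, scoring=f1_scorer)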
Output:
Best parameters: {'random_state': 8}
predict time: 0.0 s
accuracy: 0.840909090909
             precision    recall  f1-score   support
        0.0       0.88      0.95      0.91        38
        1.0       0.33      0.17      0.22         6
avg / total       0.80      0.84      0.82        44
Second attempt, changing only the 'random_state' range:
f1_scorer = make_scorer(f1_score, pos_label=1)
param_random = {'random_state': range(0,100)}
clf = GridSearchCV(RandomForestClassifier(n_estimators=1, max_features=1), param_random, scoring=f1_scorer)
Output:
Best parameters: {'random_state': 91}
predict time: 0.0 s
accuracy: 0.886363636364
             precision    recall  f1-score   support
        0.0       0.88      1.00      0.94        38
        1.0       1.00      0.17      0.29         6
avg / total       0.90      0.89      0.85        44
Third attempt:
f1_scorer = make_scorer(f1_score, pos_label=1)
param_random = {'random_state': range(0,1000)}
clf = GridSearchCV(RandomForestClassifier(n_estimators=1, max_features=1), param_random, scoring=f1_scorer)
Output:
Best parameters: {'random_state': 273}
predict time: 0.001 s
accuracy: 0.840909090909
             precision    recall  f1-score   support
        0.0       0.90      0.92      0.91        38
        1.0       0.40      0.33      0.36         6
avg / total       0.83      0.84      0.83        44
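So at first I thought it was optimizing for label '0', but that is not the case. I don't understand what I am doing wrong. The results look reasonable, but I know there is at least one better score within this range.
I know something is off because I was able to find a better one manually: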
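from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import GridSearchCV  # import was missing; sklearn.grid_search before version 0.18
f1_scorer = make_scorer(f1_score, pos_label=1)
# Restrict the search to the single seed found by hand
param_random = {'random_state': range(2,3)}
clf = GridSearchCV(RandomForestClassifier(n_estimators=1, max_features=1), param_random, scoring=f1_scorer)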
Best parameters: {'random_state': 2}
predict time: 0.0 s
accuracy: 0.886363636364
             precision    recall  f1-score   support
        0.0       0.90      0.97      0.94        38
        1.0       0.67      0.33      0.44         6
avg / total       0.87      0.89      0.87        44
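
One way to check what is actually being optimized: GridSearchCV ranks candidates by their mean cross-validated score on the data passed to fit, so the winning seed does not have to be the one with the best F1 on a separate test set. The sketch below prints both numbers side by side for each seed. It is only an illustration under assumptions: the data is a synthetic stand-in from make_classification (my real data is not shown here), the train/test split and cv=3 are arbitrary choices, and it assumes the reports above were computed on a held-out set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split

# ASSUMPTION: synthetic, imbalanced stand-in for the real (unshown) data.
X, y = make_classification(n_samples=220, n_features=8,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

f1_scorer = make_scorer(f1_score, pos_label=1)
seeds = list(range(0, 10))
param_random = {'random_state': seeds}

clf = GridSearchCV(RandomForestClassifier(n_estimators=1, max_features=1),
                   param_random, scoring=f1_scorer, cv=3)
clf.fit(X_train, y_train)

# Mean cross-validated F1 per seed: this is the quantity GridSearchCV maximizes.
cv_f1 = {params['random_state']: score
         for params, score in zip(clf.cv_results_['params'],
                                  clf.cv_results_['mean_test_score'])}

# F1 on the held-out test set per seed: the grid search never sees this.
test_f1 = {}
for seed in seeds:
    model = RandomForestClassifier(n_estimators=1, max_features=1,
                                   random_state=seed).fit(X_train, y_train)
    test_f1[seed] = f1_score(y_test, model.predict(X_test), pos_label=1)

print("seed  mean CV f1  test f1")
for seed in seeds:
    print(f"{seed:4d}  {cv_f1[seed]:10.3f}  {test_f1[seed]:7.3f}")
print("Best parameters:", clf.best_params_, "best CV f1:", clf.best_score_)
```

If the seed with the highest "mean CV f1" differs from the seed with the highest "test f1", the search is still optimizing F1 for label 1, just on the cross-validation folds rather than on the test set.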