Question

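I have an imbalanced binary dataset in which most samples carry label 1 (a ratio of about 6 to 1).

I am running GridSearchCV with a LinearSVC model using class_weight='balanced' to optimize the 'C' parameter. Because label 1 is the majority, I think I need a scoring function like 'metrics.average_precision_score', with one difference: it should compute the score with respect to the 0 labels rather than the 1 labels.

Am I right about that?
Is there a way to do this?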

Answer 1

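I eventually found the answer in the scikit-learn scoring function documentation.

It is possible to compute the score with respect to the negative label by redefining it as the "positive label" (for scoring only). For example: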

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import precision_score, make_scorer
from sklearn.svm import LinearSVC

# Create the scoring function here. make_scorer passes the pos_label=0
# argument on to sklearn.metrics.precision_score(), so precision is
# computed with label 0 treated as the positive class.
neg_precision = make_scorer(precision_score, pos_label=0)
# some arbitrary C values for completeness
params = {'C': [0.01, 0.03, 0.1, 0.3, 1, 3, 10]}
clf = GridSearchCV(LinearSVC(class_weight='balanced'), cv=10, param_grid=params, scoring=neg_precision)
# X is your feature matrix and y your binary label vector
clf.fit(X, y)
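
To make it concrete what scoring with pos_label=0 actually measures, here is a minimal pure-Python sketch of precision computed for a chosen label. The helper precision_for_label and the toy labels are made up for illustration and are not part of scikit-learn; returning 0.0 when the label is never predicted is a convention assumed here.

```python
def precision_for_label(y_true, y_pred, pos_label):
    """Precision when `pos_label` is treated as the positive class."""
    # True labels of every sample that was predicted as pos_label.
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == pos_label]
    if not predicted_pos:
        return 0.0  # label never predicted; convention assumed for this sketch
    true_pos = sum(1 for t in predicted_pos if t == pos_label)
    return true_pos / len(predicted_pos)

# Made-up labels with a 6-to-1 majority of label 1.
y_true = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1]

prec_pos = precision_for_label(y_true, y_pred, pos_label=1)  # majority label
prec_neg = precision_for_label(y_true, y_pred, pos_label=0)  # minority label
print(prec_pos, prec_neg)
```

On this toy data the precision for label 1 looks healthy while the precision for label 0 is much lower, which is the kind of gap a scorer built around the minority label is meant to expose.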

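Personally, I decided to use scoring='f1_macro'. This computes the unweighted mean of the F1 score for the positive label and the F1 score for the negative label. It gave me the results I was after.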
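
As a rough illustration of why the macro average helps with imbalance, the sketch below computes per-label F1 and their unweighted mean by hand. The helpers f1_for_label and f1_macro and the toy labels are hypothetical, written only for this example; in scikit-learn the same idea is selected by passing scoring='f1_macro' to GridSearchCV.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score with `label` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # 0.0 fallback is an assumed convention

def f1_macro(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)

# Made-up labels with a 6-to-1 majority of label 1.
y_true = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1]
always_one = [1] * len(y_true)  # a classifier that ignores the minority label

macro_model = f1_macro(y_true, y_pred)
macro_always_one = f1_macro(y_true, always_one)
print(macro_model, macro_always_one)
```

Because both labels count equally in the average, a classifier that always predicts the majority label gets an F1 of 0 for label 0 and so ends up with a low macro score, even though most of its predictions are correct.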

Scikit-learn: imbalanced dataset with a majority of negative examples
