Binary classifier evaluation with imbalanced data

Date: 2020-03-21 17:45:11

Tags: machine-learning classification evaluation precision-recall

I am building a classifier that will detect faults in lithium-ion battery production. The classifier must detect faults with a precision of >= 90%.

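To do this, I am building several different binary classifiers and have to choose the best one, and this is where I am struggling. So far I have come up with two procedures, written as pseudocode below. Do you have any suggestions that would better fit my purpose?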

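The goal is to reach 90% precision on the positive class while maximizing recall (recall should stay above 60%). The data is highly imbalanced: positive cases make up 10% of the whole dataset.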

Option 1) - using recall as the score

1. Develop N classifiers
2. For every classifier:
    -> perform cross-validation on the training set, for every cross-validation check: 
        - find the optimal threshold (threshold at which precision is >=0.9), store the threshold and its corresponding recall score
    -> calculate mean recall score (sum of all recall scores divided by number of cross-validation checks)
    -> calculate mean optimal threshold (sum of all thresholds divided by number of cross-validation checks)
3. Choose the best classifier based on highest mean recall score
4. Train the best classifier using the entire training data and the mean optimal threshold for this classifier
5. Assess the performance of this classifier on the test set by plotting PR curve and calculating precision and recall scores for the mean optimal threshold used
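
The per-fold threshold search in step 2 could be sketched roughly as follows. This is only an illustration in plain Python, not a tested pipeline: the function names, the toy labels and the toy scores are all made up here, and it assumes "optimal" means the threshold with the highest recall among those whose precision is at least 0.9. In practice `sklearn.metrics.precision_recall_curve` returns the same precision/recall/threshold triples in one call.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold_at_precision(y_true, scores, min_precision=0.9):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is >= min_precision, or None if none qualifies."""
    best = None
    for t in sorted(set(scores)):  # every observed score is a candidate cut-off
        p, r = precision_recall_at(y_true, scores, t)
        if p >= min_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best


# Toy validation fold (hypothetical numbers, purely illustrative).
y_val = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
s_val = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.6, 0.7, 0.9]
result = best_threshold_at_precision(y_val, s_val)  # threshold 0.7, recall 2/3
```

Note that a fold can have no threshold that reaches the precision target, in which case this sketch returns `None`; the averaging in step 2 would need to decide how to handle such folds.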

Option 2) - using F-beta as the score

1. Develop N classifiers
2. For every classifier:
    -> perform cross-validation on the training set, for every cross-validation check: 
        - find the optimal threshold (threshold at which f-beta score is highest, with beta=0.2 meaning precision is 5 times more important than recall), store the threshold and the score
    -> on the F-beta-curves plot, place a mean F-beta curve
    -> calculate mean F-beta score (sum of all F-beta scores divided by number of cross-validation checks)
    -> calculate mean optimal threshold (sum of all thresholds divided by number of cross-validation checks)
3. Choose the best classifier based on mean F-beta score
4. Train the best classifier using the entire training data and the mean optimal threshold for this classifier
5. Assess the performance of this classifier on the test set by plotting PR curve and calculating F-beta score for the threshold used
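
For comparison, the per-fold search in Option 2 might look like the sketch below. Again this is plain Python with hypothetical helper names and made-up toy data; it uses the standard definition F_beta = (1 + beta^2) * P * R / (beta^2 * P + R) and simply picks the threshold where that score is highest.

```python
def fbeta(precision, recall, beta=0.2):
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold_by_fbeta(y_true, scores, beta=0.2):
    """Return (threshold, f_beta) for the candidate threshold with the highest F-beta."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall_at(y_true, scores, t)
        score = fbeta(p, r, beta)
        if best is None or score > best[1]:
            best = (t, score)
    return best


# Toy validation fold (hypothetical numbers, purely illustrative).
y_val = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
s_val = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.6, 0.7, 0.9]
best = best_threshold_by_fbeta(y_val, s_val, beta=0.2)  # threshold 0.7
```

Unlike the search in Option 1, nothing in this procedure checks the precision at the selected threshold, so the 0.9 requirement would have to be verified separately.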

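I think the first option fits my purpose better, but I have never seen this evaluation approach used anywhere. As far as I can tell, it ensures that the best classifier is the one with the highest recall while still keeping precision at around 0.9. Is that right?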

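The second option looks more like the common evaluation techniques I have read about in papers and blogs, but I don't know whether beta = 0.2 actually reflects a precision of 0.9 (or how to determine the right beta).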
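
A quick arithmetic check illustrates why this is uncertain. The two operating points below are invented for the sake of the example, not measured on any real classifier, and the formula is the standard F-beta definition.

```python
def fbeta(precision, recall, beta):
    """Standard F-beta: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical operating point A: meets the precision >= 0.9 requirement.
meets_target = fbeta(0.90, 0.60, beta=0.2)
# Hypothetical operating point B: just misses the requirement, but has high recall.
misses_target = fbeta(0.89, 0.95, beta=0.2)

print(f"A: {meets_target:.3f}  B: {misses_target:.3f}")
```

With beta = 0.2, point A scores about 0.883 and point B about 0.892, so maximizing F-beta alone could prefer a point whose precision is below 0.9. F-beta expresses a trade-off between precision and recall rather than a hard floor on precision.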

Any suggestions would be greatly appreciated! :)

0 answers:

No answers