Question

I have a question about the weighted average in sklearn.metrics.f1_score:

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='weighted', sample_weight=None)

Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.

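First, is there any reference that justifies the use of weighted F1? I am just curious in which cases I should use it.

Second, I heard that weighted F1 is deprecated. Is that true?

Third, how is the weighted F1 actually calculated? For example: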

{
    "0": {
        "TP": 2,
        "FP": 1,
        "FN": 0,
        "F1": 0.8
    },
    "1": {
        "TP": 0,
        "FP": 2,
        "FN": 2,
        "F1": -1
    },
    "2": {
        "TP": 1,
        "FP": 1,
        "FN": 2,
        "F1": 0.4
    }
}

How do I calculate the weighted F1 for the example above? I thought it should be something like (0.8 * 2/3 + 0.4 * 1/3) / 3, but I was wrong.

Answer 1

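First, is there any reference that justifies the use of weighted F1? I am just curious in which cases I should use it.

I don't have any references, but if you are interested in multi-label classification where you care about the precision/recall of all classes, then the weighted F1 score is appropriate. If you have binary classification where you only care about the positive samples, then it is probably not appropriate.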
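To illustrate what the support weighting does, here is a minimal sketch that compares a plain (macro) average with a support-weighted average. The per-class F1 values, the supports, and the class names "common" and "rare" are hypothetical and are not taken from the question.

```python
# Hypothetical per-class F1 scores and supports (illustration only).
per_class_f1 = {"common": 0.9, "rare": 0.5}
support = {"common": 90, "rare": 10}

# Macro average: every class counts equally, regardless of size.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Weighted average: each class counts in proportion to its support.
weighted_f1_value = (
    sum(per_class_f1[k] * support[k] for k in per_class_f1)
    / sum(support.values())
)

print(f"macro={macro_f1:.2f} weighted={weighted_f1_value:.2f}")
# prints: macro=0.70 weighted=0.86
```

With imbalanced classes the weighted score is pulled toward the score of the larger class, so it says little about how well the rare class is handled.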

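Second, I heard that weighted F1 is deprecated. Is that true?

No, weighted F1 itself is not deprecated. Only some aspects of the function interface were deprecated, back in v0.16, and then only to make the behavior more explicit in situations that were previously ambiguous. (See the historical discussion on GitHub, or check the source code and search the page for "deprecated" to find the details.)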

Third, how is the weighted F1 actually calculated?

From the documentation of f1_score:

``'weighted'``:
  Calculate metrics for each label, and find their average, weighted
  by support (the number of true instances for each label). This
  alters 'macro' to account for label imbalance; it can result in an
  F-score that is not between precision and recall.

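So the average is weighted by the support, which is the number of samples with a given true label. Your example data does not list the support explicitly, so the weighted F1 score cannot be computed from the per-class F1 values alone.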
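If the TP and FN values in the question are per-label counts, the support can be recovered as TP + FN, because every true instance of a label is either a true positive or a false negative. Under that assumption, a minimal sketch of the calculation looks like this. The helper names `f1_from_counts` and `weighted_f1` are made up for illustration, and the `-1` listed for label "1" is treated as an F1 of 0, which is a guess at what that placeholder meant.

```python
# Per-class counts copied from the question.
counts = {
    "0": {"TP": 2, "FP": 1, "FN": 0},
    "1": {"TP": 0, "FP": 2, "FN": 2},
    "2": {"TP": 1, "FP": 1, "FN": 2},
}


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 if the denominator is zero (assumption)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(counts):
    """Average the per-class F1 scores, weighted by support."""
    # Assumption: support (true instances of a label) equals TP + FN.
    supports = {k: c["TP"] + c["FN"] for k, c in counts.items()}
    total = sum(supports.values())
    return sum(
        f1_from_counts(c["TP"], c["FP"], c["FN"]) * supports[k]
        for k, c in counts.items()
    ) / total


print(round(weighted_f1(counts), 4))
# prints: 0.4
```

Here the supports are 2, 2 and 3, so the result is (0.8 * 2 + 0 * 2 + 0.4 * 3) / 7 = 0.4. The weights are support / total support, and there is no extra division by the number of classes.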

scikit weighted F1 score: calculation and usage
