Question

I am doing some classification, and while looking at the F1 score I noticed something strange.

When I do:

print("f1:" + str(f1_score(y_test_bin, target, average="weighted")))

I get:

f1:0.444444444444

When I do:

print("f1:" + str(f1_score(y_test_bin, target, pos_label=0, average="weighted")))

I get:

f1:0.823529411765

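The strange part is that I set average to "weighted", so both calls should give me the weighted average of these two scores. That average should not depend on which label is treated as the "true" label.

I can also see this in the classification report: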

         precision    recall  f1-score   support

      0       0.76      0.90      0.82        39
      1       0.60      0.35      0.44        17

avg / total       0.71      0.73      0.71        56

In the classification report I get the weighted average, but not when I use the f1_score function. Why is that?

Answer 1

The docstring of f1_score contains a paragraph about this behavior, although somewhat indirectly:

average : string, [None, 'micro', 'macro', 'samples', 'weighted' (default)]
    If ``None``, the scores for each class are returned. Otherwise,
    unless ``pos_label`` is given in binary classification, this
    determines the type of averaging performed on the data:

[...]

     ``'weighted'``:
        Calculate metrics for each label, and find their average, weighted
        by support (the number of true instances for each label). This
        alters 'macro' to account for label imbalance; it can result in an
        F-score that is not between precision and recall.

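It says "[...] Otherwise, unless pos_label is given in binary classification, [...]", so in binary classification the averaging is overridden and the function simply returns the F1 score of the class given by pos_label (which defaults to 1), treating that class as the positive one.

As mentioned in the comments, this special treatment of binary classification has been discussed in a GitHub issue. The reason it works this way is mostly legacy rather than anything else: changing this behavior could break many code bases.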
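The two values from the question, and the weighted average shown in the classification report, can be checked by hand. The sketch below assumes confusion counts reconstructed from the report (35 of 39 class-0 samples and 6 of 17 class-1 samples predicted correctly); these counts are inferred from the rounded precision and recall figures, not taken from the original data, and the helper name f1_from_counts is hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """F1 for one class treated as positive: 2*TP / (2*TP + FP + FN)."""
    return 2.0 * tp / (2 * tp + fp + fn)


# Assumed counts, inferred from the classification report:
#   true 0 -> predicted 0: 35    true 0 -> predicted 1: 4
#   true 1 -> predicted 0: 11    true 1 -> predicted 1: 6
f1_class0 = f1_from_counts(tp=35, fp=11, fn=4)   # class 0 as positive, approx. 0.8235
f1_class1 = f1_from_counts(tp=6, fp=4, fn=11)    # class 1 as positive, approx. 0.4444

# Support = number of true instances of each class (from the report).
support = {0: 39, 1: 17}

# Support-weighted average of the per-class scores, approx. 0.7085.
weighted_f1 = (f1_class0 * support[0] + f1_class1 * support[1]) / sum(support.values())

print("f1 (class 0):", f1_class0)
print("f1 (class 1):", f1_class1)
print("f1 (weighted):", weighted_f1)
```

Under these assumed counts, the two f1_score calls in the question return the per-class values (for pos_label=1 and pos_label=0), while the classification report shows the support-weighted combination of the two.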

Answer 2

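I was struggling with this problem as well and found a solution after reading eickenberg's answer in this thread, which is definitely worth a read.

In short, when sklearn interprets the data as binary, it overrides the averaging and returns the score of the positive class. It does this either automatically or when pos_label is specified. The solution is to set pos_label to None.

print("f1:" + str(f1_score(y_test_bin, target, pos_label=None, average="weighted")))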
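For readers on a recent scikit-learn release, here is a hedged sketch of the same comparison. To the best of my knowledge, average now defaults to "binary" and pos_label is only consulted in that mode, so passing average="weighted" alone should give the weighted score. The labels y_true and y_pred are hypothetical, built to match the classification report in the question rather than taken from the original data.

```python
from sklearn.metrics import f1_score

# Hypothetical labels reconstructed from the report:
# 39 true zeros (35 predicted 0, 4 predicted 1) and
# 17 true ones (11 predicted 0, 6 predicted 1).
y_true = [0] * 39 + [1] * 17
y_pred = [0] * 35 + [1] * 4 + [0] * 11 + [1] * 6

# Binary mode: score of a single positive class, chosen by pos_label.
f1_pos1 = f1_score(y_true, y_pred, average="binary", pos_label=1)
f1_pos0 = f1_score(y_true, y_pred, average="binary", pos_label=0)

# One score per class, in sorted label order (class 0, then class 1).
per_class = f1_score(y_true, y_pred, average=None)

# Support-weighted average over both classes.
weighted = f1_score(y_true, y_pred, average="weighted")

print("f1 (pos_label=1):", f1_pos1)
print("f1 (pos_label=0):", f1_pos0)
print("f1 (per class):", per_class)
print("f1 (weighted):", weighted)
```

Naming the averaging mode explicitly on every call keeps the result independent of the library's default.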

Hope this helps!

Strange F1 score result using scikit-learn
