Question

Is there a way to compute f1_score for lists of string labels regardless of their order?

f1_score(['a','b','c'],['a','c','b'],average='macro')

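I would like this to return 1 instead of 0.33333333333.

I know I could vectorize the labels, but in my case this syntax would be much easier because I am dealing with a lot of labels.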

Answer 1

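What you need is f1_score for a multilabel classification task, and for that y_true and y_pred must be 2-D matrices of shape [n_samples, n_labels].

You are currently passing only 1-D arrays, so the input is treated as a multiclass problem rather than a multilabel one.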
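
To make the multiclass interpretation concrete, here is a minimal sketch of what the 1-D call from the question computes. The lists are compared position by position, so only the first position counts as a match. The variable names per_class and macro are only illustrative.

```python
from sklearn.metrics import f1_score

y_true = ['a', 'b', 'c']
y_pred = ['a', 'c', 'b']

# 1-D inputs are scored as a multiclass problem, position by position:
# position 0 matches ('a' vs 'a'), positions 1 and 2 do not.
per_class = f1_score(y_true, y_pred, average=None)   # one F1 per class a, b, c
macro = f1_score(y_true, y_pred, average='macro')    # unweighted mean of those

print(per_class)
print(macro)
```

Class 'a' is predicted correctly while 'b' and 'c' are each predicted in the wrong position, which is where the 0.333 in the question comes from.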

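The official documentation provides the necessary details.

To score this correctly, you need to convert y_true and y_pred into label indicator matrices, as documented for f1_score:

y_true : 1d array-like, or label indicator array / sparse matrix

y_pred : 1d array-like, or label indicator array / sparse matrix

So you need to change your code like this: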

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import f1_score

y_true = [['a','b','c']]
y_pred = [['a','c','b']]

binarizer = MultiLabelBinarizer()

# In practice, fit the binarizer on your actual true output,
# so that it knows every label that can occur:
# binarizer.fit(<your actual true output consisting of all labels>)

# In this case, only the labels given above are considered.
binarizer.fit(y_true)

print(f1_score(binarizer.transform(y_true),
               binarizer.transform(y_pred),
               average='macro'))

Output:  1.0
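
Since the question asks for a compact syntax, the steps above could be wrapped in a small helper. This is only a sketch: the name order_free_f1 is hypothetical and not part of scikit-learn. Fitting the binarizer on the labels from both y_true and y_pred is an assumption made here, so that a label appearing only in the predictions is not dropped during transform.

```python
from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer


def order_free_f1(y_true, y_pred, average='macro'):
    """Hypothetical helper: F1 over lists of label lists, ignoring label order."""
    binarizer = MultiLabelBinarizer()
    # Assumption: fit on every label seen on either side,
    # so that transform() knows all of them.
    binarizer.fit(list(y_true) + list(y_pred))
    return f1_score(binarizer.transform(y_true),
                    binarizer.transform(y_pred),
                    average=average)


# Two samples; the order of labels inside each sample does not matter.
y_true = [['a', 'b', 'c'], ['a', 'd']]
y_pred = [['c', 'a', 'b'], ['d', 'b']]

score = order_free_f1(y_true, y_pred)
print(score)
```

Each inner list is one sample's set of labels, so the single-sample case from the question is written as order_free_f1([['a','b','c']], [['a','c','b']]).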

Examples of MultiLabelBinarizer usage can be found in the scikit-learn documentation.
