
Question

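I'm trying to solve a multilabel problem with 270 labels, and I have converted the target labels into one-hot encoded form. I'm using BCEWithLogitsLoss(). Since the training data is imbalanced, I'm using the pos_weight argument, but I'm a bit confused.

pos_weight (Tensor, optional) – a weight of positive examples. Must be a vector with length equal to the number of classes.

Do I need to pass the total count of positive values for each label as a tensor, or do they mean something else by weights?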

Answer 1

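The PyTorch documentation for BCEWithLogitsLoss recommends that pos_weight be the ratio between the negative counts and the positive counts for each class.

So, if len(dataset) is 1000 and element 0 of your multi-hot encoding has 100 positive counts, then element 0 of the pos_weights_vector should be 900/100 = 9. That means the binary cross-entropy loss will behave as if the dataset contained 900 positive examples instead of 100.

Here is my implementation: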
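To make the effect of pos_weight concrete, the sketch below compares the built-in loss with a hand-written version in which only the positive term is scaled by the weight. The tensors are made-up toy values, not data from the question.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[0.3], [-1.2], [2.0]])   # raw network outputs for one class
targets = torch.tensor([[1.0], [0.0], [1.0]])   # multi-hot targets for that class
pos_weight = torch.tensor([9.0])                # 900 negatives / 100 positives

# Built-in loss, kept per element so it can be compared term by term.
builtin = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight, reduction="none")(logits, targets)

# Manual version: only the positive term is multiplied by pos_weight.
# log(1 - sigmoid(x)) is written as logsigmoid(-x).
manual = -(pos_weight * targets * F.logsigmoid(logits)
           + (1 - targets) * F.logsigmoid(-logits))

print(torch.allclose(builtin, manual, atol=1e-6))
```

Negative samples contribute exactly as they would without pos_weight; only the loss on positive samples is multiplied by 9.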

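  import numpy as np
  import torch

  def calculate_pos_weights(class_counts, data):
    # data: the dataset the class counts were computed on
    # Float dtype so the ratios are not truncated to integers
    pos_weights = np.ones_like(class_counts, dtype=np.float64)
    neg_counts = [len(data) - pos_count for pos_count in class_counts]
    for cdx, (pos_count, neg_count) in enumerate(zip(class_counts, neg_counts)):
      pos_weights[cdx] = neg_count / (pos_count + 1e-5)

    return torch.as_tensor(pos_weights, dtype=torch.float)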

class_counts is just a column-wise sum of the positive samples. I posted it on the PyTorch forum, and one of the PyTorch developers gave it his blessing.
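The same negative-to-positive ratio can also be computed without a Python loop. The following is a minimal sketch, assuming the labels are available as a single multi-hot tensor of shape (num_samples, num_classes); the helper name pos_weight_from_labels and the toy data are hypothetical, not part of the answer.

```python
import torch

def pos_weight_from_labels(labels: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """labels: (num_samples, num_classes) multi-hot tensor of 0s and 1s."""
    labels = labels.float()
    pos_counts = labels.sum(dim=0)              # positives per class
    neg_counts = labels.shape[0] - pos_counts   # negatives per class
    return neg_counts / (pos_counts + eps)      # eps guards against empty classes

# Toy data: 1000 samples, 3 classes with 100, 500 and 10 positives.
labels = torch.zeros(1000, 3)
labels[:100, 0] = 1
labels[:500, 1] = 1
labels[:10, 2] = 1

pos_weight = pos_weight_from_labels(labels)   # roughly 9, 1 and 99
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = criterion(torch.zeros(1000, 3), labels)
```

For 270 labels the resulting tensor has shape (270,), which is the shape pos_weight expects.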

Answer 2

PyTorch solution

Well, I have actually gone through the docs, and you can indeed simply use pos_weight.

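This argument gives a weight to the positive samples of each class, so if you have 270 classes you should pass a torch.Tensor of shape (270,) defining the weight for each class.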

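Here is a marginally modified snippet from the documentation:

import torch

# 270 classes, batch size = 64
target = torch.ones([64, 270], dtype=torch.float32)
# Logits outputted from your network, no activation
output = torch.full([64, 270], 0.9)
# Weights, each being equal to one. You can input your own here.
pos_weight = torch.ones([270])
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
criterion(output, target)  # -log(sigmoid(0.9))

Self-made solution

If you prefer to handle the weighting yourself, it is really easy to code:

import torch

class WeightedMultilabel(torch.nn.Module):
    def __init__(self, weights: torch.Tensor):
        super().__init__()
        # Keep per-element losses so each class can be scaled separately
        self.loss = torch.nn.BCEWithLogitsLoss(reduction="none")
        self.weights = weights.unsqueeze(0)  # shape (1, num_classes)

    def forward(self, outputs, targets):
        # Scale each class column by its weight, then average to a scalar
        return (self.loss(outputs, targets) * self.weights).mean()

The Tensor must have the same length as the number of classes in your multilabel classification (270), with each entry giving the weight for the corresponding class.

Calculating weights

You just add up the labels of every sample in your dataset, divide by the minimum value, and take the inverse at the end.

A sort of snippet:

# dataset is assumed to yield one multi-hot label tensor per sample
weights = torch.zeros_like(dataset[0])
for element in dataset:
    weights += element
weights = 1 / (weights / torch.min(weights))

With this approach, the class that occurs least often gives the normal loss, while the other classes get weights smaller than 1.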

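It might cause some instability during training though, so you might want to experiment with those values a little (maybe a log transform instead of a linear one?)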
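The answer does not say what a log transform would look like, so the following is only one possible reading: damp the count ratio with a logarithm before inverting it. The counts are hypothetical, and the 1 + log(ratio) form is an assumption chosen so that the rarest class still gets weight 1.

```python
import torch

# Hypothetical per-class positive counts (not taken from the thread).
counts = torch.tensor([10.0, 100.0, 1000.0])

ratio = counts / counts.min()                   # 1 for the rarest class, larger otherwise
linear_weights = 1.0 / ratio                    # the linear scheme described above
log_weights = 1.0 / (1.0 + torch.log(ratio))    # assumed log-damped variant

print(linear_weights)
print(log_weights)
```

The log variant keeps the same ordering of classes but spreads the weights over a much narrower range, which may help if the linear weights make frequent classes almost invisible to the loss.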

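Other approach

You may think about upsampling/downsampling (though this operation is complicated, since you would add/delete samples of other classes as well, so I think advanced heuristics would be needed).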
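As a rough illustration of the upsampling idea, here is a sketch built on torch.utils.data.WeightedRandomSampler. The per-sample weighting rule (score each sample by its rarest positive label) is a simple heuristic assumed for this example, not something the answer prescribes, and the data is randomly generated.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)
features = torch.randn(1000, 16)   # hypothetical inputs
# Hypothetical multi-hot labels for 5 classes with different positive rates.
labels = (torch.rand(1000, 5) < torch.tensor([0.5, 0.3, 0.1, 0.05, 0.02])).float()

class_freq = labels.mean(dim=0).clamp(min=1e-5)   # fraction of positives per class
# Heuristic (an assumption): weight each sample by its rarest positive label.
# Samples without any positive label fall back to weight 1.
sample_weights = (labels / class_freq).max(dim=1).values.clamp(min=1.0)

sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
loader = DataLoader(TensorDataset(features, labels), batch_size=64, sampler=sampler)
```

Because every sample carries several labels, drawing rare-class samples more often also changes how often the other classes appear, which is exactly the complication the answer warns about.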

Answer 3

Just to provide a quick revision to @crypdick's answer, this implementation of the function worked for me:

import numpy as np
import torch

def calculate_pos_weights(class_counts, data):
    # Float dtype so the ratios are not truncated when class_counts holds integers
    pos_weights = np.ones_like(class_counts, dtype=np.float64)
    neg_counts = [len(data) - pos_count for pos_count in class_counts]
    for cdx, (pos_count, neg_count) in enumerate(zip(class_counts, neg_counts)):
        pos_weights[cdx] = neg_count / (pos_count + 1e-5)

    return torch.as_tensor(pos_weights, dtype=torch.float)

data is the dataset you are trying to apply the weights to.

How to calculate unbalanced weights for BCEWithLogitsLoss in pytorch
