I am trying to solve a classification problem with 3 possible outputs: 0, 1, or 2.
My output layer produces a probability vector over the labels, e.g. [0.3, 0.4, 0.3].
My loss function is defined as follows:
loss = criterion(output_batch, label_batch)  # criterion = nn.NLLLoss()
My question concerns the mismatch between how the outputs and the labels are stored. The output is a size-3 probability vector (summing to 1 via softmax), while my target labels are plain scalars.
When computing the loss, I could convert the labels to vectors, but I am not sure whether that is necessary:
0 ==> [1,0,0]
1 ==> [0,1,0]
2 ==> [0,0,1]
Can someone shed some light on this? Thanks!
Answer 0 (score: 1)
Suppose your classes are: cat, dog, and capybara.
You have softmax predictions:
[0.3, 0.4, 0.3]
The softmax function puts one result on top. In this case it is dog, at 0.4, so the output would predict dog.
Note that the predictions sum to 1: 0.3 + 0.4 + 0.3.
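As a quick illustration, here is a minimal sketch (the cat/dog/capybara ordering is just this example's assumption):

import torch

probs = torch.tensor([0.3, 0.4, 0.3])   # softmax output for cat, dog, capybara
classes = ["cat", "dog", "capybara"]
print(classes[probs.argmax().item()])   # dog, the class with the highest probability
print(probs.sum())                      # tensor(1.), the probabilities sum to 1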
Now, you take the log of the softmax, and NLL is simply the negative of that.
When computing the loss, can I convert the labels to vectors, and is it necessary?
0 ==> [1,0,0]
1 ==> [0,1,0]
2 ==> [0,0,1]
In your case this is not needed. It would mean you have three different estimates (bs = 3), whereas you showed just a single one.
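To make that concrete, a minimal sketch (using your single [0.3, 0.4, 0.3] example, with 1 assumed as its label): nn.NLLLoss expects log-probabilities of shape (batch, n_classes) and scalar class indices of shape (batch,), so no one-hot conversion is needed:

import torch
import torch.nn as nn

log_probs = torch.log(torch.tensor([[0.3, 0.4, 0.3]]))  # shape (1, 3): one example
target = torch.tensor([1])                               # scalar label, not [0, 1, 0]
criterion = nn.NLLLoss()
print(criterion(log_probs, target))                      # tensor(0.9163) == -log(0.4)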
Here is a little exercise:
import torch

batch_size, n_classes = 10, 5

# Random "logits" (raw network outputs) and integer class targets
x = torch.randn(batch_size, n_classes)
print("x:", x)
target = torch.randint(n_classes, size=(batch_size,), dtype=torch.long)
print("target:", target)

def log_softmax(x):
    # log(softmax(x)) computed directly: x - log(sum(exp(x)))
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def nll_loss(p, target):
    # Pick each row's log-probability at its target index, negate the mean
    return -p[range(target.shape[0]), target].mean()

pred = log_softmax(x)
print("pred:", pred)

# One-hot encoding of the targets (shown for illustration; not needed for the loss)
ohe = torch.zeros(batch_size, n_classes)
ohe[range(ohe.shape[0]), target] = 1
print("ohe:", ohe)

# Log-probabilities at the target positions
pe = pred[range(target.shape[0]), target]
print("pe:", pe)

mean = pred[range(target.shape[0]), target].mean()
print("mean:", mean)
negmean = -mean
print("negmean:", negmean)

loss = nll_loss(pred, target)
print("loss:", loss)
Out:
x: tensor([[ 1.5837, -1.3132, 1.5513, 1.4422, 0.8072],
[ 1.1740, 1.9250, 0.4258, -1.0320, -0.4650],
[-1.2447, -0.5360, -1.4950, 1.2020, 1.2724],
[ 0.2300, 0.2587, -0.4463, -0.1397, -0.3617],
[-0.7983, 0.7742, 0.0035, 0.9963, -0.7926],
[ 0.7575, -0.8008, 0.7995, 0.0448, 0.6621],
[-1.7153, 0.7672, -0.6841, -0.4826, -0.8614],
[ 0.0263, 0.7244, 0.8751, -1.0226, -1.3762],
[ 0.0192, -0.4368, -0.4010, -1.0660, 0.0364],
[-0.5120, -1.4871, 0.6758, 1.2975, 0.2879]])
target: tensor([0, 4, 3, 0, 0, 4, 1, 2, 4, 2])
pred: tensor([[-1.2094, -4.1063, -1.2418, -1.3509, -1.9859],
[-1.3601, -0.6091, -2.1083, -3.5661, -2.9991],
[-3.3233, -2.6146, -3.5736, -0.8766, -0.8063],
[-1.3302, -1.3015, -2.0065, -1.7000, -1.9220],
[-2.7128, -1.1403, -1.9109, -0.9181, -2.7070],
[-1.2955, -2.8538, -1.2535, -2.0081, -1.3909],
[-3.0705, -0.5881, -2.0394, -1.8379, -2.2167],
[-1.7823, -1.0841, -0.9334, -2.8311, -3.1847],
[-1.2936, -1.7496, -1.7138, -2.3788, -1.2764],
[-2.5641, -3.5393, -1.3764, -0.7546, -1.7643]])
ohe: tensor([[1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1.],
[0., 0., 0., 1., 0.],
[1., 0., 0., 0., 0.],
[1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 0., 1.],
[0., 0., 1., 0., 0.]])
pe: tensor([-1.2094, -2.9991, -0.8766, -1.3302, -2.7128, -1.3909, -0.5881, -0.9334,
-1.2764, -1.3764])
mean: tensor(-1.4693)
negmean: tensor(1.4693)
loss: tensor(1.4693)
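As a sanity check (my addition, not part of the exercise above), the hand-rolled functions should agree with PyTorch's built-ins on the same x and target:

import torch.nn.functional as F

print(F.nll_loss(pred, target))    # should print tensor(1.4693), matching loss above
print(F.cross_entropy(x, target))  # same value: log_softmax + nll_loss in one call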