Question

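I have some PyTorch code which demonstrates the gradient calculation within PyTorch, but I am thoroughly confused about what got calculated and how it is used. This post here demonstrates its usage, but it does not make sense to me in terms of the backpropagation algorithm. Looking at the gradients of in1 and in2 in the example below, I realized that they are the derivatives of the loss function, but my understanding is that the update also needs to take the actual loss value into account. Where is the loss value used? Am I missing something here?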

import torch

in1 = torch.randn(2, 2, requires_grad=True)
in2 = torch.randn(2, 2, requires_grad=True)
target = torch.randn(2, 2)
l1 = torch.nn.L1Loss()
l2 = torch.nn.MSELoss()
out1 = l1(in1, target)
out2 = l2(in2, target)
out1.backward()
out2.backward()
in1.grad
in2.grad

Answer 1

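Backpropagation is based on the chain rule for calculating derivatives. This means the gradients are computed step by step from tail to head and are always passed back to the previous step ("previous" with respect to the preceding forward step).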

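For a scalar output, the process is initiated by assuming a gradient of d(out1)/d(out1) = 1. If you call backward on a (non-scalar) tensor, however, you need to provide the initial gradient yourself, since it is not unambiguous.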
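To illustrate the non-scalar case, here is a minimal sketch. The tensors x and y are made up for this example and are not part of the question's code. Calling backward on a vector output without an argument fails, while passing an explicit initial gradient of the same shape works.

```python
import torch

# Hypothetical tensors, chosen only for illustration.
x = torch.tensor([1., 2., 3.], requires_grad=True)
y = x ** 2  # non-scalar output of shape (3,)

# Without an initial gradient, backward() on a non-scalar tensor raises.
try:
    y.backward()
    raised = False
except RuntimeError:
    raised = True

# Supply the initial gradient explicitly. Using ones is equivalent
# to differentiating y.sum() with respect to x.
y.backward(gradient=torch.ones_like(y))
x_grad = x.grad.clone()  # equals 2 * x

print(raised)
print(x_grad)
```

Choosing a different initial gradient simply weights each output element differently before the chain rule is applied.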

Let's look at an example that involves more steps to compute the output:

import torch

a = torch.tensor(1., requires_grad=True)
b = a**2
c = 5*b
c.backward()
print(a.grad)  # Prints: tensor(10.)

So what happens here?

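The process is initiated using d(c)/d(c) = 1.
Then the previous gradient is computed as d(c)/d(b) = 5 and multiplied by the downstream gradient (1 in this case), i.e. 5 * 1 = 5.
Again, the previous gradient is computed as d(b)/d(a) = 2*a = 2 and multiplied by the downstream gradient (5 in this case), i.e. 2 * 5 = 10.
Hence we arrive at a gradient value of 10 for the initial tensor a.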
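The steps above can be checked against autograd with a small sketch. The variable names dc_dc, dc_db and dc_da are introduced here purely for illustration; retain_grad() is used so that the gradient of the intermediate tensor b is kept after the backward pass.

```python
import torch

a = torch.tensor(1., requires_grad=True)
b = a ** 2
c = 5 * b
b.retain_grad()  # keep the gradient of the intermediate tensor b
c.backward()

# Manual chain rule, walking from the output back to the input.
dc_dc = 1.0                    # starting gradient d(c)/d(c)
dc_db = 5.0 * dc_dc            # d(c)/d(b) = 5, times the downstream gradient
dc_da = 2 * a.item() * dc_db   # d(b)/d(a) = 2*a, times the downstream gradient

print(b.grad.item(), dc_db)  # gradient at b: autograd vs. manual
print(a.grad.item(), dc_da)  # gradient at a: autograd vs. manual
```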

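In effect this calculates d(c)/d(a), and that is all. It is the gradient of c with respect to a, so no notion of a "target loss" is used. Even if the loss is zero, the gradient does not have to be zero; it is up to the optimizer to step in the correct (downhill) direction and to stop once the loss has become sufficiently small.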
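As a rough sketch of where the loss value does and does not enter, the following example assumes a single parameter tensor w (a made-up name), an MSE loss as in the question, and plain SGD with an arbitrarily chosen learning rate. It first shows that shifting the loss by a constant leaves the gradient unchanged, then runs a few update steps in which only the gradient drives the update and the loss value is merely recorded.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
w = torch.randn(2, 2, requires_grad=True)  # hypothetical parameter
target = torch.randn(2, 2)

# Two losses that differ only by a constant have identical gradients,
# so the loss value itself carries no information for the update.
loss_a = F.mse_loss(w, target)
loss_b = F.mse_loss(w, target) + 100.0
grad_a, = torch.autograd.grad(loss_a, w)
grad_b, = torch.autograd.grad(loss_b, w)
same_grad = torch.allclose(grad_a, grad_b)

# Plain gradient descent; lr=0.1 is an arbitrary choice for this sketch.
opt = torch.optim.SGD([w], lr=0.1)
losses = []
for _ in range(50):
    opt.zero_grad()
    loss = F.mse_loss(w, target)
    loss.backward()             # fills w.grad with d(loss)/d(w)
    opt.step()                  # w <- w - lr * w.grad
    losses.append(loss.item())  # the value is only logged here

print(same_grad)
print(losses[0], losses[-1])
```

In practice the recorded loss values are what one would monitor to decide when to stop training or to adjust the learning rate.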

Understanding Gradient in Pytorch
