Question

I want to compute gradients in the following setting: y = w_0 x + w_1 and z = w_2 x + (dy/dx)^2

    import torch

    w = torch.tensor([2.,1.,3.], requires_grad=True)
    x = torch.tensor([0.5], requires_grad=True)
    y = w[0]*x + w[1]
    y.backward()
    l = x.grad
    l.requires_grad=True
    w.grad.zero_()
    z = w[2]*x + l**2
    z.backward()

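I expect [4, 0, 0.5], but I get [0, 0, 0.5] instead. I know that in this case I could replace l with w_0, but l could be a complicated function of x, so it matters that I compute the gradient numerically rather than change the expression for z. Please let me know what to change to get the correct gradient w.r.t. w.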
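As a sanity check on the expected numbers, here is a minimal sketch in plain Python that estimates dy/dx by a central difference and then differentiates z with respect to each w_i the same way. The helper names (`y_of`, `dy_dx`, `z_of`, `grad_wrt_w`) and the step sizes are assumptions made for illustration only; they are not part of the original code.

```python
# Hypothetical helpers for illustration; none of these names come from the question.

def y_of(w, x):
    # y = w_0 * x + w_1
    return w[0] * x + w[1]

def dy_dx(w, x, h=1e-4):
    # Central-difference estimate of dy/dx at x.
    return (y_of(w, x + h) - y_of(w, x - h)) / (2 * h)

def z_of(w, x):
    # z = w_2 * x + (dy/dx)^2
    return w[2] * x + dy_dx(w, x) ** 2

def grad_wrt_w(w, x, eps=1e-4):
    # Central-difference estimate of dz/dw_i for each component of w.
    grads = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += eps
        down[i] -= eps
        grads.append((z_of(up, x) - z_of(down, x)) / (2 * eps))
    return grads

grad = grad_wrt_w([2.0, 1.0, 3.0], 0.5)
print(grad)  # values close to [4, 0, 0.5], up to finite-difference error
```

Since dy/dx = w_0 here, z = w_2 x + w_0^2, so dz/dw_0 = 2 w_0 = 4, dz/dw_1 = 0 and dz/dw_2 = x = 0.5, which is where the expected [4, 0, 0.5] comes from.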

Answer 1

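You should print the gradients along the way; that makes it easier to see what is going on. I'll comment on what happens in the code: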

    import torch

    w = torch.tensor([2.0, 1.0, 3.0], requires_grad=True)
    x = torch.tensor([0.5], requires_grad=True)
    y = w[0] * x + w[1]
    y.backward()
    l = x.grad
    l.requires_grad = True
    print(w.grad) # [0.5000, 1.0000, 0.0000] as expected
    w.grad.zero_()
    print(w.grad) # [0., 0., 0.] as you cleared the gradient
    z = w[2] * x + l ** 2
    z.backward()
    print(w.grad) # [0., 0., 0.5] - see below

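The last print(w.grad) comes out this way because you use only the last element of the tensor, w[2], and it is the only element of w that takes part in the equation for z. It is multiplied by x, which is 0.5, hence the gradient of 0.5. Note that l is taken from x.grad, which is a plain value with no graph connection to w, so l ** 2 contributes nothing to w.grad. You cleared the earlier gradient by calling w.grad.zero_(). I don't see how you could get [4., 0., 0.5]. If you didn't clear the gradient, you would get tensor([0.5000, 1.0000, 0.5000]): the first two values come from the first equation (y) and the last one comes from the z equation.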
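If the goal is for z to depend on w through dy/dx itself, one possible approach (a sketch, not something the answer above proposes) is to compute dy/dx with torch.autograd.grad and create_graph=True, so that the derivative stays attached to the graph and can be differentiated again. The variable name dy_dx is an assumption introduced here for illustration.

```python
import torch

w = torch.tensor([2.0, 1.0, 3.0], requires_grad=True)
x = torch.tensor([0.5], requires_grad=True)
y = w[0] * x + w[1]

# create_graph=True keeps the derivative in the autograd graph,
# so dy_dx remains a function of w rather than a detached value.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)

z = w[2] * x + dy_dx ** 2
z.backward()
print(w.grad)  # tensor([4.0000, 0.0000, 0.5000])
```

Because y.backward() is never called in this version, w.grad starts out empty and there is nothing to zero before z.backward().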
