Question

I'm trying to do something like this without redefining a = f(x,y):

a = f(x,y)
find gradient of a with respect to x
change x
find gradient of a with respect to x
find gradient of a with respect to y

I tried the partial example below, but it just gives me an error. Does anyone know how to do this without redefining the original function every time?

>>> x = torch.tensor([2.], requires_grad=True)
>>> y = 10*x**2
>>> torch.autograd.grad(y,x, retain_graph=True)
(tensor([40.]),)
>>> x = torch.tensor([1.], requires_grad=True)
>>> torch.autograd.grad(y,x, retain_graph=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Philip/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 157, in grad
    inputs, allow_unused)
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Answer 1

> How do I recompute the gradient after changing the input?

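In general, you need to recompute the output from the new inputs, i.e. rerun the forward pass. In your example, `x = torch.tensor([1.], requires_grad=True)` does not change the old x; it creates a brand-new tensor. The existing y was computed from the old tensor, so the new x is not part of y's graph, which is exactly what the error message is telling you.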
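A minimal sketch of that workflow, assuming f can be wrapped in an ordinary Python function (the body `10 * x**2 * y` below is a hypothetical stand-in for your f). The definition of f lives in one place and you simply call it again whenever x changes, so every gradient comes from a freshly built graph. Because each graph is used only once, `retain_graph=True` is not needed here.

```python
import torch

def f(x, y):
    # Hypothetical stand-in for your f(x, y)
    return 10 * x**2 * y

x = torch.tensor([2.], requires_grad=True)
y = torch.tensor([3.], requires_grad=True)

a = f(x, y)                                    # forward pass builds a graph from x and y
(grad_x_before,) = torch.autograd.grad(a, x)   # da/dx = 20*x*y = 120

x = torch.tensor([1.], requires_grad=True)     # "change x": this is a new tensor
a = f(x, y)                                    # rerun the forward pass so a depends on the new x
da_dx, da_dy = torch.autograd.grad(a, (x, y))  # da/dx = 20*x*y = 60, da/dy = 10*x**2 = 10
print(grad_x_before, da_dx, da_dy)
```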

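Consider how the backpropagation algorithm works. Depending on the form of f, different intermediate results have to be saved during the forward pass for later use by the backward pass. These intermediate results may or may not depend on the original value of x, even when you are only computing the gradient w.r.t. y.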
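To see the saved intermediates concretely, here is a small illustration (my own addition, not part of the original answer) that modifies x in place instead of rebinding it. It assumes the in-place update is done under `torch.no_grad()`, since an in-place op on a leaf that requires grad is otherwise rejected. The forward result y is not recomputed, and because `x**2` saved x for its backward pass, autograd's version check should raise a `RuntimeError` rather than return a gradient based on the stale graph.

```python
import torch

x = torch.tensor([2.], requires_grad=True)
y = 10 * x**2  # the pow op saves x so the backward pass can compute 2*x

# Change x in place; y is NOT recomputed, and the tensor saved for
# backward now has a newer version than the one recorded in the graph.
with torch.no_grad():
    x.fill_(1.)

try:
    torch.autograd.grad(y, x)
    raised = False
except RuntimeError as err:
    # autograd noticed that a saved tensor was modified after the forward pass
    raised = True
    print("RuntimeError:", err)
```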

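For example, if f(x,y) = g(h(x,y)), then by the chain rule df/dy = dg/dh * dh/dy. To make this more concrete, consider the case where g is some nonlinear function and h(x,y) = x*y. Then df/dy = g'(h(x,y))*x. Backpropagation can do this cheaply because it caches the intermediate value h(x,y) during the forward pass, so all it has to do during the backward pass is plug that value into g'. If you change the value of x, the cached value of h(x,y) is no longer the value needed to compute the gradient you are interested in (which should be computed by evaluating g' at the new value of h), and the factor x = dh/dy would likewise be the old value. Therefore you have to rerun the forward pass so that the correct values are cached.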
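A sketch of this chain-rule example, assuming g = tanh as the nonlinearity (any differentiable g would do) and h(x, y) = x*y. For each new value of x it reruns the forward pass and checks autograd's df/dy against the analytic formula g'(h(x,y))*x, where tanh'(u) = 1 - tanh(u)**2. The helper names `f` and `df_dy_analytic` are hypothetical.

```python
import torch

def f(x, y):
    # Assumed example: g = tanh, h(x, y) = x * y
    return torch.tanh(x * y)

def df_dy_analytic(x, y):
    # Chain rule: df/dy = g'(h(x, y)) * dh/dy = (1 - tanh(x*y)**2) * x
    return (1 - torch.tanh(x * y) ** 2) * x

y = torch.tensor([0.5], requires_grad=True)
results = []

for x_value in (2.0, 1.0):
    x = torch.tensor([x_value], requires_grad=True)
    out = f(x, y)  # fresh forward pass records the intermediates needed for this x
    (grad_y,) = torch.autograd.grad(out, y)
    expected = df_dy_analytic(x.detach(), y.detach())
    results.append((grad_y, expected))
    print(x_value, grad_y.item(), expected.item())
```

Because f is rerun for each x, the gradient always matches the formula evaluated at the current x; reusing the first graph would instead keep using h = 2*0.5 and the factor x = 2.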
