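Why does autograd not produce gradients for intermediate variables?

Time: 2017-08-31 18:45:40

Tags: pytorch autograd

I'm trying to wrap my head around how gradients are represented and how autograd works: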

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2]), requires_grad=True)
y = x * x
z = y * y

z.backward()

print(x.grad)
#Variable containing:
#32
#[torch.FloatTensor of size 1]

print(y.grad)
#None

Why is no gradient generated for y? If y.grad = dz/dy, shouldn't it at least produce a variable like y.grad = 2*y?

1 answer:

Answer 0: (score: 8)

  

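By default, gradients are only retained for leaf variables. Non-leaf variables' gradients are not retained to be inspected later. This was done by design, to save memory.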

-soumith chintala

See: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94
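To make the leaf / non-leaf distinction concrete, here is a minimal sketch. It assumes a recent PyTorch (0.4 or later), where Variable has been merged into Tensor, so it uses torch.tensor instead of the Variable wrapper from the question. The is_leaf attribute tells you which tensors will keep their .grad after backward().

```python
import torch

# Assumes PyTorch 0.4+, where a plain tensor replaces the Variable wrapper.
x = torch.tensor([2.0], requires_grad=True)  # created directly by the user: a leaf
y = x * x                                    # result of an operation: not a leaf
z = y * y

z.backward()

print(x.is_leaf, y.is_leaf)  # True False
print(x.grad)                # retained, because x is a leaf
```

Here z = x^4, so x.grad holds 4 * x^3 = 32, matching the output in the question. Since y is not a leaf, its gradient is used during the backward pass and then discarded.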

Option 1:

Call y.retain_grad()

x = Variable(torch.Tensor([2]), requires_grad=True)
y = x * x
z = y * y

y.retain_grad()

z.backward()

print(y.grad)
#Variable containing:
# 8
#[torch.FloatTensor of size 1]

Source: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/16
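As a sanity check, the retained value agrees with the chain rule worked out by hand for the graph in the question (x = 2, y = x * x, z = y * y):

```latex
\begin{aligned}
y &= x^2 = 4, \qquad z = y^2 = 16, \\
\frac{\partial z}{\partial y} &= 2y = 8, \\
\frac{\partial z}{\partial x} &= \frac{\partial z}{\partial y}\cdot\frac{\partial y}{\partial x} = 2y \cdot 2x = 8 \cdot 4 = 32 .
\end{aligned}
```

So y.grad is indeed 2*y as the question expected; it just isn't kept around unless you ask for it.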

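Option 2:

Register a hook, which is basically a function that is called when that gradient is computed. Then you can save it, assign it, print it, whatever you need...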

from __future__ import print_function
import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2]), requires_grad=True)
y = x * x
z = y * y

y.register_hook(print) ## this can be anything you need it to be

z.backward()

Output:

Variable containing:
 8
[torch.FloatTensor of size 1]

Source: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/2
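If you want to keep the gradient rather than just print it, one possible approach is a hook that stores it somewhere. The sketch below assumes PyTorch 0.4 or later; the grads dict and the save_grad helper are hypothetical names made up for illustration, not part of PyTorch.

```python
import torch

grads = {}  # hypothetical container for captured gradients


def save_grad(name):
    """Build a hook that stores the incoming gradient under `name`."""
    def hook(grad):
        # Store a copy; returning None leaves the gradient itself unchanged.
        grads[name] = grad.clone()
    return hook


x = torch.tensor([2.0], requires_grad=True)
y = x * x
z = y * y

y.register_hook(save_grad("y"))  # called when dz/dy is computed

z.backward()

print(grads["y"])  # dz/dy = 2 * y = 8
```

Compared with retain_grad(), a hook gives you control over where the value ends up, at the cost of a little more code.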

See also: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/7