Gradient not available in pytorch

Date: 2018-11-27 20:07:47

Tags: python python-3.x gradient pytorch gradients

I am trying to get/trace the gradient of a variable using pytorch: I have the variable, pass it to a first function that looks for the minimum value of some other variable, then feed the output of the first function into a second function, and the whole process repeats multiple times.

Here is my code:

import torch

def myFirstFunction(parameter_current_here):
    optimalValue=100000000000000
    Optimal=100000000000000
    for j in range(2,10):
        i= torch.ones(1,requires_grad=True)*j
        with torch.enable_grad():
            optimalValueNow=i*parameter_current_here.sum()
        if (optimalValueNow<optimalValue):
            optimalValue=optimalValueNow
            Optimal=i
    return optimalValue,Optimal

def mySecondFunction(Current):
    with torch.enable_grad():
        y=(20*Current)/2 + (Current**2)/10
    return y

counter=0
while counter<5:
    parameter_current = torch.randn(2, 2,requires_grad=True)

    outputMyFirstFunction=myFirstFunction(parameter_current)
    outputmySecondFunction=mySecondFunction(outputMyFirstFunction[1])
    outputmySecondFunction.backward()

    print("outputMyFirstFunction after backward:",outputMyFirstFunction)
    print("outputmySecondFunction after backward:",outputmySecondFunction)
    print("parameter_current Gradient after backward:",parameter_current.grad)

    counter=counter+1

parameter_current.grad prints as None for all iterations, when it clearly shouldn't. What am I doing wrong? And how can I fix it?

Your help would be greatly appreciated. Thanks a lot!

Aly

3 Answers:

Answer 0 (score: 1)

I had a similar experience with this. For reference: https://pytorch.org/docs/stable/tensors.html

  • Tensors with requires_grad=True are leaf tensors if they were created by the user. This means they are not the result of an operation, so their grad_fn is None.
  • During a call to backward(), only the grads of leaf tensors will be populated. To get the grad of a non-leaf tensor, you can use retain_grad(). Example:
    >>> a = torch.tensor([[1,1],[2,2]], dtype=torch.float, requires_grad=True)
    >>> a.is_leaf
    True
    >>> b = a * a
    >>> b.is_leaf
    False
    >>> c = b.mean()
    >>> c.backward()
    >>> print(c.grad)
    None
    >>> print(b.grad)
    None
    >>> print(a.grad)
    tensor([[0.5000, 0.5000],
            [1.0000, 1.0000]])
    >>> b = a * a
    >>> c = b.mean()
    >>> b.retain_grad()
    >>> c.retain_grad()
    >>> c.backward()
    >>> print(a.grad)
    tensor([[1., 1.],
            [2., 2.]])
    >>> print(b.grad)
    tensor([[0.2500, 0.2500],
            [0.2500, 0.2500]])
    >>> print(c.grad)
    tensor(1.)
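
Applied to the question's code (my own addition, not from the linked docs): parameter_current is a user-created leaf tensor, so its .grad should be populated as long as it actually feeds into the tensor you call backward() on; retain_grad() is only needed for intermediate, non-leaf tensors. A minimal self-contained sketch of that pattern, with names chosen only for illustration:

import torch

x = torch.randn(2, 2, requires_grad=True)   # leaf tensor, like parameter_current
intermediate = 3 * x.sum()                  # non-leaf tensor, like optimalValueNow
intermediate.retain_grad()                  # keep its grad after backward()
loss = (20 * intermediate) / 2 + (intermediate ** 2) / 10
loss.backward()

print(x.grad)             # populated: x is a leaf with requires_grad=True
print(intermediate.grad)  # populated only because of retain_grad()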

Answer 1 (score: 0)

My guess is that the problem is the with statements. Once you exit the torch.enable_grad() block, grad no longer applies, and torch will clear the gradients after the functions have run.
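
A quick way to test this guess (my addition, not part of the original answer) is to check grad mode directly: torch.enable_grad() only changes anything when gradient computation has been disabled, for example inside a torch.no_grad() block.

import torch

print(torch.is_grad_enabled())          # True: grad mode is on by default

with torch.no_grad():
    print(torch.is_grad_enabled())      # False inside no_grad()
    with torch.enable_grad():
        print(torch.is_grad_enabled())  # True: enable_grad() re-enables it

print(torch.is_grad_enabled())          # True again after exiting the blocks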

Answer 2 (score: 0)

Since it is not quite clear to me what you actually want to store beyond computing gradients for parameter_current, I will just focus on describing why it doesn't work and how you can get the gradients computed.

I added some comments to the code to make the problem clearer.

But in short, the problem is that your parameter_current is not part of the computation of your loss, i.e. of the tensor you call backward() on, which is outputmySecondFunction.

So currently you only compute gradients for i, since you set requires_grad=True for it.
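
The same point can be reproduced in a few lines (a minimal sketch of my own, separate from the annotated code below): only leaf tensors that actually feed into the tensor you call backward() on get a gradient.

import torch

a = torch.randn(2, 2, requires_grad=True)  # will be part of the loss
b = torch.randn(2, 2, requires_grad=True)  # will NOT be part of the loss

loss = (a * 3).sum()   # only a participates in this computation
loss.backward()

print(a.grad)  # tensor of 3s: a is connected to the loss
print(b.grad)  # None: b never entered the computation graph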

Please check the comments for the details:

import torch

def myFirstFunction(parameter_current_here):
    # I removed some stuff to reduce it to the core features
    # removed torch.enable_grad(), since it is enabled by default
    # removed Optimal=100000000000000 and Optimal=i, they are not used
    optimalValue=100000000000000
    for j in range(2,10):
        # Are you sure you want to compute gradients for this tensor i?
        # Because this is actually what requires_grad=True does.
        # Just as a side note, this isn't your problem, but affects performance of the model.
        i= torch.ones(1,requires_grad=True)*j
        optimalValueNow=i*parameter_current_here.sum()
        if (optimalValueNow<optimalValue):
            optimalValue=optimalValueNow

    # Part Problem 1:
    # optimalValueNow is multiplied with your parameter_current
    # i is just your parameter i, nothing else
    # let's jump now to the output below in the loop: outputMyFirstFunction
    return optimalValueNow,i

def mySecondFunction(Current):
    y=(20*Current)/2 + (Current**2)/10
    return y

counter=0
while counter<5:
    parameter_current = torch.randn(2, 2,requires_grad=True)

    # Part Problem 2:
    # this is a tuple (optimalValueNow,i) like described above
    outputMyFirstFunction=myFirstFunction(parameter_current)
    # now you are taking i as an input
    # and i is just torch.ones(1,requires_grad=True)*j
    # it has no connection to parameter_current
    # thus nothing is optimized
    outputmySecondFunction=mySecondFunction(outputMyFirstFunction[1])

    # calculating gradients, since parameter_current is not part of the computation 
    # no gradients will be computed, you only get gradients for i
    # Btw. if you had not set requires_grad=True for i, you would actually get an error message
    # for calling backward on this
    outputmySecondFunction.backward()

    print("outputMyFirstFunction after backward:",outputMyFirstFunction)
    print("outputmySecondFunction after backward:",outputmySecondFunction)
    print("parameter_current Gradient after backward:",parameter_current.grad)

    counter=counter+1

So if you want to compute gradients for parameter_current, you just have to make sure it is part of the computation of the tensor you call backward() on. You can do this, for example, by changing:

outputmySecondFunction=mySecondFunction(outputMyFirstFunction[1])

to:

outputmySecondFunction=mySecondFunction(outputMyFirstFunction[0])

This will have the desired effect; after the change you will get gradients for parameter_current!

Hope this helps!



Full working code:

import torch

def myFirstFunction(parameter_current_here):
    optimalValue=100000000000000
    for j in range(2,10):
        i= torch.ones(1,requires_grad=True)*j
        optimalValueNow=i*parameter_current_here.sum()
        if (optimalValueNow<optimalValue):
            optimalValue=optimalValueNow

    return optimalValueNow,i

def mySecondFunction(Current):
    y=(20*Current)/2 + (Current**2)/10
    return y

counter=0
while counter<5:
    parameter_current = torch.randn(2, 2,requires_grad=True)
    outputMyFirstFunction=myFirstFunction(parameter_current)
    outputmySecondFunction=mySecondFunction(outputMyFirstFunction[0]) # changed line
    outputmySecondFunction.backward()

    print("outputMyFirstFunction after backward:",outputMyFirstFunction)
    print("outputmySecondFunction after backward:",outputmySecondFunction)
    print("parameter_current Gradient after backward:",parameter_current.grad)

    counter=counter+1

Output:

outputMyFirstFunction after backward: (tensor([ 1.0394]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 10.5021])
parameter_current Gradient after backward: tensor([[ 91.8709,  91.8709],
        [ 91.8709,  91.8709]])
outputMyFirstFunction after backward: (tensor([ 13.1481]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 148.7688])
parameter_current Gradient after backward: tensor([[ 113.6667,  113.6667],
        [ 113.6667,  113.6667]])
outputMyFirstFunction after backward: (tensor([ 5.7205]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 60.4772])
parameter_current Gradient after backward: tensor([[ 100.2969,  100.2969],
        [ 100.2969,  100.2969]])
outputMyFirstFunction after backward: (tensor([-13.9846]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([-120.2888])
parameter_current Gradient after backward: tensor([[ 64.8278,  64.8278],
        [ 64.8278,  64.8278]])
outputMyFirstFunction after backward: (tensor([-10.5533]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([-94.3959])
parameter_current Gradient after backward: tensor([[ 71.0040,  71.0040],
        [ 71.0040,  71.0040]])
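
As a sanity check (my own addition, not part of the original answer), the printed gradients can be reproduced by hand: with optimalValueNow = i * parameter_current.sum() and y = (20*optimalValueNow)/2 + (optimalValueNow**2)/10, each entry of parameter_current.grad is (10 + optimalValueNow/5) * i.

# Check against the first printed iteration: optimalValueNow ~ 1.0394 and i = 9
optimal_value_now = 1.0394
i = 9.0
print((10 + optimal_value_now / 5) * i)  # ~91.87, matching the tensor above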