I am trying to get/track the gradient of a variable with PyTorch. I have the variable, pass it to a first function that searches for the minimum over some other variable, then feed the output of the first function into a second function, and the whole process repeats several times.
Here is my code:
import torch

def myFirstFunction(parameter_current_here):
    optimalValue = 100000000000000
    Optimal = 100000000000000
    for j in range(2, 10):
        i = torch.ones(1, requires_grad=True) * j
        with torch.enable_grad():
            optimalValueNow = i * parameter_current_here.sum()
        if (optimalValueNow < optimalValue):
            optimalValue = optimalValueNow
            Optimal = i
    return optimalValue, Optimal

def mySecondFunction(Current):
    with torch.enable_grad():
        y = (20 * Current) / 2 + (Current ** 2) / 10
    return y

counter = 0
while counter < 5:
    parameter_current = torch.randn(2, 2, requires_grad=True)
    outputMyFirstFunction = myFirstFunction(parameter_current)
    outputmySecondFunction = mySecondFunction(outputMyFirstFunction[1])
    outputmySecondFunction.backward()
    print("outputMyFirstFunction after backward:", outputMyFirstFunction)
    print("outputmySecondFunction after backward:", outputmySecondFunction)
    print("parameter_current Gradient after backward:", parameter_current.grad)
    counter = counter + 1
parameter_current.grad is None for every iteration, when it clearly should not be None. What am I doing wrong, and how can I fix it?
Your help would be greatly appreciated. Thank you very much!
Aly
Answer 0 (score: 1)
I had a similar experience with this. For reference: https://pytorch.org/docs/stable/tensors.html
>>> a = torch.tensor([[1,1],[2,2]], dtype=torch.float, requires_grad=True)
>>> a.is_leaf
True
>>> b = a * a
>>> b.is_leaf
False
>>> c = b.mean()
>>> c.backward()
>>> print(c.grad)
None
>>> print(b.grad)
None
>>> print(a.grad)
tensor([[0.5000, 0.5000],
[1.0000, 1.0000]])
>>> b = a * a
>>> c = b.mean()
>>> b.retain_grad()
>>> c.retain_grad()
>>> c.backward()
>>> print(a.grad)
tensor([[1., 1.],
[2., 2.]])
>>> print(b.grad)
tensor([[0.2500, 0.2500],
[0.2500, 0.2500]])
>>> print(c.grad)
tensor(1.)
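For reference: here c = (a*a).mean(), so a.grad = a/2 after the first backward(); the second backward() accumulates into a.grad, which is why it doubles. By default only leaf tensors (like a) keep their .grad; calling retain_grad() on non-leaf tensors (like b and c) makes their gradients available too. A minimal sketch of the same trick applied to the question's setup (this is an illustration I added, not part of the original answer; the names just mirror the question):
import torch

parameter_current = torch.randn(2, 2, requires_grad=True)

# stands in for optimalValueNow = i * parameter_current.sum()
intermediate = 9 * parameter_current.sum()
intermediate.retain_grad()          # keep .grad on this non-leaf tensor

loss = (20 * intermediate) / 2 + (intermediate ** 2) / 10
loss.backward()

print(parameter_current.grad)       # leaf tensor: gradient stored by default
print(intermediate.grad)            # non-leaf: available thanks to retain_grad()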
Answer 1 (score: 0)
My guess is that the problem is the with statement. Once you leave the with torch.enable_grad(): block it no longer applies, and torch clears the gradients after the function has run.
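As a side check (an addition, not part of the original answer): gradient tracking is enabled by default, and torch.enable_grad() only has a visible effect when nested inside a torch.no_grad() block. A small sketch illustrating this:
import torch

x = torch.ones(2, 2, requires_grad=True)

# enable_grad() outside no_grad() is effectively a no-op: tracking is already on
with torch.enable_grad():
    y = (x * 2).sum()
print(y.requires_grad)                    # True

with torch.no_grad():
    a = (x * 2).sum()                     # not tracked
    with torch.enable_grad():
        b = (x * 2).sum()                 # tracked again
print(a.requires_grad, b.requires_grad)   # False True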
Answer 2 (score: 0)
Since it is not quite clear to me what you actually want to store, beyond computing the gradients of parameter_current, I will focus on explaining why it does not work and how you can get those gradients.
I added some comments to the code to make the problem clearer.
In short, the problem is that your parameter_current is not part of the computation of the tensor you call backward() on, namely outputmySecondFunction.
As a result, you currently only compute gradients for i, since you set requires_grad=True for it.
Check the comments for details:
import torch

def myFirstFunction(parameter_current_here):
    # I removed some stuff to reduce it to the core features:
    # removed torch.enable_grad(), since it is enabled by default
    # removed Optimal=100000000000000 and Optimal=i, since they are not used
    optimalValue = 100000000000000
    for j in range(2, 10):
        # Are you sure you want to compute gradients for this tensor i?
        # Because that is what requires_grad=True actually does.
        # Just as a side note, this isn't your problem, but it affects the performance of the model.
        i = torch.ones(1, requires_grad=True) * j
        optimalValueNow = i * parameter_current_here.sum()
        if (optimalValueNow < optimalValue):
            optimalValue = optimalValueNow
    # Part of problem 1:
    # optimalValueNow is multiplied with your parameter_current,
    # i is just your parameter i, nothing else.
    # Let's now jump to the output below in the loop: outputMyFirstFunction
    return optimalValueNow, i

def mySecondFunction(Current):
    y = (20 * Current) / 2 + (Current ** 2) / 10
    return y

counter = 0
while counter < 5:
    parameter_current = torch.randn(2, 2, requires_grad=True)
    # Part of problem 2:
    # this is a tuple (optimalValueNow, i) as described above
    outputMyFirstFunction = myFirstFunction(parameter_current)
    # now you are taking i as the input,
    # and i is just torch.ones(1, requires_grad=True)*j:
    # it has no connection to parameter_current,
    # thus nothing is optimized
    outputmySecondFunction = mySecondFunction(outputMyFirstFunction[1])
    # when calculating gradients here, parameter_current is not part of the computation,
    # so no gradients are computed for it; you only get gradients for i.
    # Btw. if you had not set requires_grad=True for i, you would actually get an error
    # for calling backward on this
    outputmySecondFunction.backward()
    print("outputMyFirstFunction after backward:", outputMyFirstFunction)
    print("outputmySecondFunction after backward:", outputmySecondFunction)
    print("parameter_current Gradient after backward:", parameter_current.grad)
    counter = counter + 1
So if you want to compute the gradients of parameter_current, you just have to make sure it is part of the computation of the tensor you call backward() on. You can do this, for example, by changing:
outputmySecondFunction=mySecondFunction(outputMyFirstFunction[1])
to:
outputmySecondFunction=mySecondFunction(outputMyFirstFunction[0])
After this change you will get gradients for parameter_current!
I hope this helps!
Full working code:
import torch

def myFirstFunction(parameter_current_here):
    optimalValue = 100000000000000
    for j in range(2, 10):
        i = torch.ones(1, requires_grad=True) * j
        optimalValueNow = i * parameter_current_here.sum()
        if (optimalValueNow < optimalValue):
            optimalValue = optimalValueNow
    return optimalValueNow, i

def mySecondFunction(Current):
    y = (20 * Current) / 2 + (Current ** 2) / 10
    return y

counter = 0
while counter < 5:
    parameter_current = torch.randn(2, 2, requires_grad=True)
    outputMyFirstFunction = myFirstFunction(parameter_current)
    outputmySecondFunction = mySecondFunction(outputMyFirstFunction[0])  # changed line
    outputmySecondFunction.backward()
    print("outputMyFirstFunction after backward:", outputMyFirstFunction)
    print("outputmySecondFunction after backward:", outputmySecondFunction)
    print("parameter_current Gradient after backward:", parameter_current.grad)
    counter = counter + 1
Output:
outputMyFirstFunction after backward: (tensor([ 1.0394]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 10.5021])
parameter_current Gradient after backward: tensor([[ 91.8709, 91.8709],
[ 91.8709, 91.8709]])
outputMyFirstFunction after backward: (tensor([ 13.1481]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 148.7688])
parameter_current Gradient after backward: tensor([[ 113.6667, 113.6667],
[ 113.6667, 113.6667]])
outputMyFirstFunction after backward: (tensor([ 5.7205]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 60.4772])
parameter_current Gradient after backward: tensor([[ 100.2969, 100.2969],
[ 100.2969, 100.2969]])
outputMyFirstFunction after backward: (tensor([-13.9846]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([-120.2888])
parameter_current Gradient after backward: tensor([[ 64.8278, 64.8278],
[ 64.8278, 64.8278]])
outputMyFirstFunction after backward: (tensor([-10.5533]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([-94.3959])
parameter_current Gradient after backward: tensor([[ 71.0040, 71.0040],
[ 71.0040, 71.0040]])
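One extra note (an addition beyond the original answer): in the loop above parameter_current is re-created with torch.randn on every iteration, so its gradient never accumulates. If you instead keep the same parameter and update it with the computed gradient, the gradient must be cleared between backward() calls. A rough sketch, where the learning rate and the single toy loss are illustrative assumptions rather than part of the answer:
import torch

parameter_current = torch.randn(2, 2, requires_grad=True)
lr = 0.01  # illustrative learning rate

for step in range(5):
    v = 9 * parameter_current.sum()           # stands in for optimalValueNow
    loss = (20 * v) / 2 + (v ** 2) / 10
    loss.backward()
    with torch.no_grad():
        parameter_current -= lr * parameter_current.grad   # simple gradient-descent step
    parameter_current.grad.zero_()            # otherwise gradients accumulate across iterations

print(parameter_current)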