How to fix an in-place operation error when indexing a leaf Variable for a gradient update?

Asked: 2018-03-07 21:39:08

Tags: python neural-network deep-learning gradient-descent pytorch

When I try to index a leaf Variable in order to update its values with a customized shrink function, I run into an in-place operation error, and I cannot work around it. Any help is greatly appreciated!

import torch.nn as nn
import torch
import numpy as np
from torch.autograd import Variable, Function

# hyper parameters
batch_size = 100 # batch size of images
ld = 0.2 # sparse penalty
lr = 0.1 # learning rate

x = Variable(torch.from_numpy(np.random.normal(0,1,(batch_size,10,10))), requires_grad=False)  # original

# depends on size of the dictionary, number of atoms.
D = Variable(torch.from_numpy(np.random.normal(0,1,(500,10,10))), requires_grad=True)

# hx sparse representation
ht = Variable(torch.from_numpy(np.random.normal(0,1,(batch_size,500,1,1))), requires_grad=True)

# Dictionary loss function
loss = nn.MSELoss()

# customized shrink function to update gradient
shrink_ht = lambda x: torch.stack([torch.sign(i)*torch.max(torch.abs(i)-lr*ld,0)[0] for i in x])

### sparse representation optimizer_ht, single image.
optimizer_ht = torch.optim.SGD([ht], lr=lr, momentum=0.9) # optimizer for sparse representation

## update for the batch
for idx in range(len(x)):
    optimizer_ht.zero_grad() # clear up gradients
    loss_ht = 0.5*torch.norm((x[idx]-(D*ht[idx]).sum(dim=0)),p=2)**2
    loss_ht.backward() # back propagation and calculate gradients
    optimizer_ht.step() # update parameters with gradients
    ht[idx] = shrink_ht(ht[idx])  # customized shrink function.

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
     15     loss_ht.backward() # back propagation and calculate gradients
     16     optimizer_ht.step() # update parameters with gradients
---> 17     ht[idx] = shrink_ht(ht[idx]) # customized shrink function.
     18
     19

/home/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py in __setitem__(self, key, value)
     85             return MaskedFill.apply(self, key, value, True)
     86         else:
---> 87             return SetItem.apply(self, key, value)
     88
     89     def __deepcopy__(self, memo):

RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

Specifically, the following line of code seems to cause the error, because it indexes and updates the leaf Variable at the same time.

ht[idx] = shrink_ht(ht[idx])  # customized shrink function.

Thanks.

W.S.

3 Answers:

Answer 0 (Score: 3)

I just found it out: to update the Variable, I should use ht.data[idx] instead. Using .data accesses the underlying tensor directly.
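A minimal sketch of the changed line, assuming the rest of the update loop from the question stays the same (writing through .data bypasses autograd's in-place check on the leaf Variable):

ht.data[idx] = shrink_ht(ht.data[idx])  # write through .data so autograd does not track the assignment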

Answer 1 (Score: 1)

The problem comes from the fact that ht requires grad:

ht = Variable(torch.from_numpy(np.random.normal(0,1,(batch_size,500,1,1))), requires_grad=True)

PyTorch does not allow assigning values to (slices of) Variables that require grad. You cannot do:

ht[idx] = some_tensor

This means you need to find another way to perform your customized shrink function, using built-in PyTorch functions such as squeeze, unsqueeze, etc.

Another option is to assign the result of shrink_ht(ht[idx]) slice by slice to another Variable or tensor that does not require grad.
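A minimal sketch of this second option, using a hypothetical ht_shrunk tensor (not in the original code) to hold the shrunk values:

# clone the underlying tensor; it does not require grad, so slice assignment into it is allowed
ht_shrunk = ht.data.clone()
for idx in range(len(ht_shrunk)):
    ht_shrunk[idx] = shrink_ht(ht[idx]).data  # store the shrunk slice outside the autograd graph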

Answer 2 (Score: 1)

Using ht.data[idx] is fine here, but the newer convention is to use torch.no_grad() explicitly, for example:

with torch.no_grad(): 
    ht[idx] = shrink_ht(ht[idx])

Note that no gradient flows through this in-place operation. In other words, gradients only back-propagate to the shrunk values of ht, not to the values of ht before the shrink.
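Putting it together, a sketch of the question's update loop under this convention (assuming PyTorch 0.4+, where torch.no_grad() is available; the rest of the code is unchanged):

for idx in range(len(x)):
    optimizer_ht.zero_grad()  # clear gradients from the previous step
    loss_ht = 0.5*torch.norm((x[idx]-(D*ht[idx]).sum(dim=0)), p=2)**2
    loss_ht.backward()        # compute gradients w.r.t. ht
    optimizer_ht.step()       # SGD update on ht
    with torch.no_grad():     # disable autograd tracking for the shrink step
        ht[idx] = shrink_ht(ht[idx])  # in-place write on the leaf is now allowed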