I am trying to implement the LSTM optimizer from this paper: https://arxiv.org/pdf/1606.04474v1.pdf
They make the assumption that the derivative of the gradient with respect to the LSTM parameters is equal to zero.
Looking at my code, I think that when I optimize the loss function this assumption is not used, because Theano is able to compute that gradient, and it does so. How can I stop it from doing that?
Here is the code:
def step_opt(cell_previous, hid_previous, theta_previous, *args):
    func = self.func(theta_previous)
    grad = theano.grad(func, theta_previous)
    input_n = grad.dimshuffle(0, 'x')
    cell, hid = step(input_n, cell_previous, hid_previous, *args)  # function that recomputes LSTM hidden state and cell
    theta = theta_previous + hid.dot(self.W_hidden_to_output).dimshuffle(0)
    return cell, hid, theta, func

cell_out, hid_out, theta_out, loss_out = theano.scan(
    fn=step_opt,
    outputs_info=[cell_init, hid_init, theta_init, None],
    non_sequences=non_seqs,
    n_steps=self.n_steps,
    strict=True)[0]

loss = loss_out.sum()
Answer (score: 0):
Eventually I found the answer. There is this page: http://deeplearning.net/software/theano/library/gradient.html
We can use disconnected_grad(expr) to stop backpropagation through expr.
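For reference, here is a minimal self-contained sketch of how disconnected_grad cuts those second-derivative paths. The two-step unrolled update and the parameter W are toy stand-ins for the LSTM optimizer, not the code from the question:

import numpy as np
import theano
from theano.gradient import disconnected_grad

# Toy optimizee f(theta) = sum(theta^2), "optimized" for two unrolled steps
# by a parameterized update rule theta <- theta - W * grad(f).
theta0 = theano.shared(np.ones(5), name='theta0')
W = theano.shared(np.full(5, 0.1), name='W')  # stand-in for the optimizer's parameters

def f(theta):
    return (theta ** 2).sum()

# Step 1: the optimizee gradient is wrapped in disconnected_grad, so it is
# treated as a constant input and no derivatives flow back through it.
g0 = disconnected_grad(theano.grad(f(theta0), theta0))
theta1 = theta0 - W * g0

# Step 2: without disconnected_grad here, theano.grad(loss, W) would also
# backpropagate through g1's dependence on theta1, i.e. through second
# derivatives of f -- exactly what the paper assumes away.
g1 = disconnected_grad(theano.grad(f(theta1), theta1))
theta2 = theta1 - W * g1

loss = f(theta1) + f(theta2)
grad_W = theano.grad(loss, W)  # contains first-order paths only
print(theano.function([], grad_W)())

In the code from the question, the analogous change is a single line inside step_opt: wrap the result of theano.grad in disconnected_grad (grad = disconnected_grad(grad)) before it is dimshuffled and fed into the LSTM step.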