Implementing a zero derivative in Theano

Time: 2016-09-29 19:17:40

Tags: optimization machine-learning theano lstm

I am trying to implement the LSTM optimizer from this paper: https://arxiv.org/pdf/1606.04474v1.pdf

They make the assumption that the derivatives of the optimizee gradient with respect to the LSTM parameters are equal to zero:

[image: the assumption from the paper, ∂∇_t/∂φ = 0]
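For context, my reading of the paper's setup (Andrychowicz et al., 2016; the symbols below are reconstructed from the paper, not copied from the image):

\theta_{t+1} = \theta_t + g_t, \qquad
\begin{bmatrix} g_t \\ h_{t+1} \end{bmatrix} = m(\nabla_t, h_t, \phi), \qquad
\frac{\partial \nabla_t}{\partial \phi} = 0

where \nabla_t = \nabla_\theta f(\theta_t) is the optimizee gradient, m is the LSTM optimizer with parameters \phi, and the last equality is the assumption: the gradient fed into the LSTM is treated as a constant, which avoids computing second derivatives of f.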

Looking at my code, I think this assumption is not being used when I optimize the loss function: Theano can compute this gradient, and it does. How can I stop it from doing so?

Here is the code:

def step_opt(cell_previous, hid_previous, theta_previous, *args):
    # Loss of the optimizee at the current parameters.
    func = self.func(theta_previous)

    # Gradient of the optimizee loss; it is fed to the LSTM as input.
    grad = theano.grad(func, theta_previous)
    input_n = grad.dimshuffle(0, 'x')

    cell, hid = step(input_n, cell_previous, hid_previous, *args)  # recomputes the LSTM hidden state and cell

    # Update the optimizee parameters with the LSTM output.
    theta = theta_previous + hid.dot(self.W_hidden_to_output).dimshuffle(0)
    return cell, hid, theta, func

cell_out, hid_out, theta_out, loss_out = theano.scan(
    fn=step_opt,
    outputs_info=[cell_init, hid_init, theta_init, None],
    non_sequences=non_seqs,
    n_steps=self.n_steps,
    strict=True)[0]

loss = loss_out.sum()
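The problem can be reproduced in isolation. Below is a minimal sketch (a toy example, not part of my optimizer) showing that a second theano.grad call differentiates straight through a symbolic gradient, which is exactly the path the paper's assumption is supposed to cut:

import theano
import theano.tensor as T

theta = T.vector('theta')
w = T.scalar('w')

inner_grad = theano.grad((theta ** 2).sum(), theta)  # symbolically 2 * theta
step_out = (w * inner_grad).sum()                    # an "update" built from the gradient

# Differentiating the update w.r.t. theta goes *through* inner_grad,
# i.e. Theano computes a second derivative of the inner expression.
g = theano.grad(step_out, theta)
f = theano.function([theta, w], g)
print(f([1.0, 2.0], 3.0))  # -> [6. 6.], nonzero because backprop went through the gradient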

1 Answer:

Answer 0 (score: 0)

Eventually I found the answer. There is this page: http://deeplearning.net/software/theano/library/gradient.html

We can use theano.gradient.disconnected_grad(expr) to stop backpropagation at expr.
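Applied to the step function from the question, a minimal sketch of the fix (only the inner gradient changes; everything else stays as in the question):

import theano
from theano.gradient import disconnected_grad

def step_opt(cell_previous, hid_previous, theta_previous, *args):
    func = self.func(theta_previous)

    # disconnected_grad implements the paper's assumption d(grad)/d(phi) = 0:
    # the LSTM still receives the optimizee gradient as input, but backprop
    # treats it as a constant and never differentiates through it.
    grad = disconnected_grad(theano.grad(func, theta_previous))
    input_n = grad.dimshuffle(0, 'x')

    cell, hid = step(input_n, cell_previous, hid_previous, *args)

    theta = theta_previous + hid.dot(self.W_hidden_to_output).dimshuffle(0)
    return cell, hid, theta, func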