If I want to convert this code snippet from Python to Torch, are the two equivalent?
class RNN:
    # ...
    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector
        y = np.dot(self.W_hy, self.h)
        return y
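To make the NumPy snippet self-contained, here is a minimal runnable version; the constructor, the layer sizes, and the random initialization are my assumptions, not part of the original snippet:

```python
import numpy as np

class RNN:
    def __init__(self, insize, rnnsize, outsize):
        # the weights live on the instance, so they persist across step() calls
        self.W_hh = np.random.randn(rnnsize, rnnsize) * 0.01
        self.W_xh = np.random.randn(rnnsize, insize) * 0.01
        self.W_hy = np.random.randn(outsize, rnnsize) * 0.01
        self.h = np.zeros(rnnsize)

    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector
        y = np.dot(self.W_hy, self.h)
        return y

rnn = RNN(insize=10, rnnsize=20, outsize=5)
y = rnn.step(np.random.randn(10))
print(y.shape)  # (5,)
```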
The Torch code:
RNN = {}
function RNN.step(x, prev_h)
    -- apply weights and bias to the hidden state and the input
    local h2h = nn.Linear(rnnsize, rnnsize)(prev_h)
    local i2h = nn.Linear(insize, rnnsize)(x)
    -- calculate the next hidden state
    local next_h = nn.Tanh()(nn.CAddTable()({h2h, i2h}))
    return next_h
end
where nn.Linear applies a weighted linear transformation to the hidden state and to the input. But where are the weight matrices kept for later optimization? A similar use of Linear() can be found in the LSTM code on GitHub:
local function lstm(x, prev_c, prev_h)
    -- Calculate all four gates in one go
    local i2h = nn.Linear(params.rnn_size, 4*params.rnn_size)(x)
    local h2h = nn.Linear(params.rnn_size, 4*params.rnn_size)(prev_h)
    local gates = nn.CAddTable()({i2h, h2h})
    -- gates calculations
    -- ...
    local next_c = nn.CAddTable()({
        nn.CMulTable()({forget_gate, prev_c}),
        nn.CMulTable()({in_gate, in_transform})
    })
    local next_h = nn.CMulTable()({out_gate, nn.Tanh()(next_c)})
    return next_c, next_h
end
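For context, the gate calculations elided above correspond to the standard LSTM equations: the combined `gates` vector is split into four slices, three are squashed with a sigmoid and one with tanh. A NumPy sketch of those equations (the gate ordering and the omission of bias terms are my assumptions, not taken from the linked code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, prev_c, prev_h, Wx, Wh):
    # all four gates in one matrix multiply, as in the Lua code
    gates = np.dot(Wx, x) + np.dot(Wh, prev_h)   # shape (4 * rnn_size,)
    i, f, o, g = np.split(gates, 4)
    in_gate, forget_gate, out_gate = sigmoid(i), sigmoid(f), sigmoid(o)
    in_transform = np.tanh(g)
    # next cell state: forget part of the old state, write the new candidate
    next_c = forget_gate * prev_c + in_gate * in_transform
    # next hidden state: gated, squashed cell state
    next_h = out_gate * np.tanh(next_c)
    return next_c, next_h

rnn_size, insize = 8, 4
Wx = np.random.randn(4 * rnn_size, insize) * 0.1
Wh = np.random.randn(4 * rnn_size, rnn_size) * 0.1
c, h = lstm_step(np.random.randn(insize), np.zeros(rnn_size), np.zeros(rnn_size), Wx, Wh)
print(c.shape, h.shape)
```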
It seems that every time an input (x, prev_c, prev_h) is pushed through the LSTM cell, the weights inside are generated anew. So how is this model optimized, and how do the weights get updated during training? Another question is about the f passed to the optimization functions in the optim package, which returns f(x) and df/dx. What does this f mean? If it by itself represents the error estimate (it seems to perform all the necessary computation), then what are the optim.sgd or optim.adadelta functions for?
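To make the division of labor in that last question concrete: f only evaluates the loss and its gradient, while the optimizer decides how to turn that gradient into a parameter update (a plain step for sgd, adaptive per-parameter steps for adadelta). A hypothetical Python analogue of that contract, on a toy one-dimensional objective:

```python
def sgd(feval, x, lr=0.1, steps=50):
    # the optimizer only consumes the (f(x), df/dx) pair returned by feval;
    # the update rule below is the only thing sgd/adadelta etc. differ in
    for _ in range(steps):
        fx, dfdx = feval(x)
        x = x - lr * dfdx
    return x, fx

# toy objective: f(x) = (x - 3)^2, with gradient 2 * (x - 3)
def feval(x):
    return (x - 3.0) ** 2, 2.0 * (x - 3.0)

x_opt, fx = sgd(feval, x=0.0)
print(x_opt)  # close to 3.0
```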