Torch: confusion about weights and optimization

Date: 2015-09-19 08:16:02

Tags: python machine-learning torch

If I want to convert this code snippet from Python to Torch, are the two equivalent?

import numpy as np

class RNN:
  # ...
  def step(self, x):
    # update the hidden state
    self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
    # compute the output vector
    y = np.dot(self.W_hy, self.h)
    return y
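For context, in the Python version the weight matrices persist as instance attributes between calls to step(), which is exactly the property the question is asking about on the Torch side. A minimal runnable sketch of the class above (the constructor, sizes, and random initialization are illustrative assumptions, not part of the original snippet) could be:

```python
import numpy as np

class RNN:
    def __init__(self, insize, rnnsize, outsize):
        # Weights are created once here and persist across step() calls;
        # an optimizer would update these same arrays in place.
        self.W_hh = np.random.randn(rnnsize, rnnsize) * 0.01
        self.W_xh = np.random.randn(rnnsize, insize) * 0.01
        self.W_hy = np.random.randn(outsize, rnnsize) * 0.01
        self.h = np.zeros(rnnsize)

    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector
        y = np.dot(self.W_hy, self.h)
        return y

rnn = RNN(insize=10, rnnsize=20, outsize=5)
y = rnn.step(np.random.randn(10))
```

Calling step() repeatedly reuses the same W_hh, W_xh, and W_hy; only self.h changes.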

The Torch code:

RNN = {}
function RNN.step(x, prev_h)
  -- linear transforms, each with its own weights and bias
  local h2h = nn.Linear( rnnsize, rnnsize )(prev_h)
  local i2h = nn.Linear( insize, rnnsize )(x)
  -- calculate next hidden state
  local next_h = nn.Tanh()( nn.CAddTable(){ h2h, i2h } )
  return next_h
end

where nn.Linear applies a learned linear transformation (weights and bias) to the hidden state and to the input. But where are the weight matrices kept for later optimization? A similar use of Linear() can be found in the LSTM code on GitHub:

local function lstm(x, prev_c, prev_h)
  -- Calculate all four gates in one go
  local i2h = nn.Linear(params.rnn_size, 4*params.rnn_size)(x)
  local h2h = nn.Linear(params.rnn_size, 4*params.rnn_size)(prev_h)
  local gates = nn.CAddTable()({i2h, h2h})

  -- gates calculations
  -- ...

  local next_c           = nn.CAddTable()({
    nn.CMulTable()({forget_gate, prev_c}),
    nn.CMulTable()({in_gate,     in_transform})
  })
  local next_h           = nn.CMulTable()({out_gate, nn.Tanh()(next_c)})

  return next_c, next_h
end

It looks as if the weights inside nn.Linear are created anew every time an input (x, prev_c, prev_h) is pushed through the LSTM cell. So how can this model be optimized, i.e. how do the weights get updated during training?

A related question concerns the function f passed to the optimization routines in optim, which is expected to return f(x) and df/dx. What does this f represent? If it already produces the error estimate (all the necessary computation seems to happen inside it), then what is left for functions like optim.sgd or optim.adadelta to do?
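To make the last question concrete: the optim routines expect a closure that, given the flattened parameter vector x, returns the loss f(x) together with the gradient df/dx; the optimizer's own job is then just to decide how to move x using that gradient. A minimal Python analogue of the optim.sgd calling pattern (the function names, the learning rate, and the quadratic objective are illustrative assumptions, not Torch's actual implementation) might look like:

```python
import numpy as np

def sgd(feval, x, lr=0.1, steps=100):
    """Sketch of the optim.sgd pattern: feval(x) -> (loss, grad)."""
    loss = None
    for _ in range(steps):
        loss, grad = feval(x)
        # The optimizer's whole contribution: step the parameters
        # against the gradient that feval computed.
        x = x - lr * grad
    return x, loss

# Illustrative objective: f(x) = ||x - 3||^2, so df/dx = 2 * (x - 3).
def feval(x):
    return np.sum((x - 3.0) ** 2), 2.0 * (x - 3.0)

x_opt, final_loss = sgd(feval, np.zeros(4))
```

So feval encapsulates the forward pass and gradient computation, while sgd, adadelta, etc. differ only in the update rule they apply to x.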

0 Answers:

No answers