If I want to convert this code snippet from Python to Torch, are the two equivalent?
class RNN:
    # ...
    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector
        y = np.dot(self.W_hy, self.h)
        return y
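To make the NumPy snippet self-contained, here is a minimal runnable version; the constructor, the layer sizes, and the random initialization are my assumptions, not part of the original snippet:

```python
import numpy as np

class RNN:
    def __init__(self, insize, rnnsize, outsize):
        # the weights live on the instance, so they persist across step() calls
        self.W_hh = np.random.randn(rnnsize, rnnsize) * 0.01
        self.W_xh = np.random.randn(rnnsize, insize) * 0.01
        self.W_hy = np.random.randn(outsize, rnnsize) * 0.01
        self.h = np.zeros(rnnsize)

    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector
        y = np.dot(self.W_hy, self.h)
        return y

rnn = RNN(insize=10, rnnsize=20, outsize=5)
y = rnn.step(np.random.randn(10))
print(y.shape)  # (5,)
```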
The Torch code:
RNN = {}
function RNN.step(x, prev_h)
    -- apply weights and bias to the hidden state and the input
    local h2h = nn.Linear(rnnsize, rnnsize)(prev_h)
    local i2h = nn.Linear(insize, rnnsize)(x)
    -- calculate the next hidden state
    local next_h = nn.Tanh()(nn.CAddTable()({h2h, i2h}))
    return next_h
end
where nn.Linear applies a weighted linear transformation to the hidden state and to the input. But where are the weight matrices kept for later optimization? A similar use of Linear() can be found in the LSTM code on GitHub:
local function lstm(x, prev_c, prev_h)
    -- Calculate all four gates in one go
    local i2h = nn.Linear(params.rnn_size, 4*params.rnn_size)(x)
    local h2h = nn.Linear(params.rnn_size, 4*params.rnn_size)(prev_h)
    local gates = nn.CAddTable()({i2h, h2h})
    -- gates calculations
    -- ...
    local next_c = nn.CAddTable()({
        nn.CMulTable()({forget_gate, prev_c}),
        nn.CMulTable()({in_gate, in_transform})
    })
    local next_h = nn.CMulTable()({out_gate, nn.Tanh()(next_c)})
    return next_c, next_h
end
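For context, the gate calculations elided above correspond to the standard LSTM equations: the combined `gates` vector is split into four slices, three are squashed with a sigmoid and one with tanh. A NumPy sketch of those equations (the gate ordering and the omission of bias terms are my assumptions, not taken from the linked code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, prev_c, prev_h, Wx, Wh):
    # all four gates in one matrix multiply, as in the Lua code
    gates = np.dot(Wx, x) + np.dot(Wh, prev_h)   # shape (4 * rnn_size,)
    i, f, o, g = np.split(gates, 4)
    in_gate, forget_gate, out_gate = sigmoid(i), sigmoid(f), sigmoid(o)
    in_transform = np.tanh(g)
    # next cell state: forget part of the old state, write the new candidate
    next_c = forget_gate * prev_c + in_gate * in_transform
    # next hidden state: gated, squashed cell state
    next_h = out_gate * np.tanh(next_c)
    return next_c, next_h

rnn_size, insize = 8, 4
Wx = np.random.randn(4 * rnn_size, insize) * 0.1
Wh = np.random.randn(4 * rnn_size, rnn_size) * 0.1
c, h = lstm_step(np.random.randn(insize), np.zeros(rnn_size), np.zeros(rnn_size), Wx, Wh)
print(c.shape, h.shape)
```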
It seems that every time an input (x, prev_c, prev_h) is pushed through the LSTM cell, the weights inside are generated anew. So how is this model optimized, and how do the weights get updated during training? Another question is about the f passed to the optimization functions in the optim package, which returns f(x) and df/dx. What does this f mean? If it by itself represents the error estimate (it seems to perform all the necessary computation), then what are the optim.sgd or optim.adadelta functions for?
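To make the division of labor in that last question concrete: f only evaluates the loss and its gradient, while the optimizer decides how to turn that gradient into a parameter update (a plain step for sgd, adaptive per-parameter steps for adadelta). A hypothetical Python analogue of that contract, on a toy one-dimensional objective:

```python
def sgd(feval, x, lr=0.1, steps=50):
    # the optimizer only consumes the (f(x), df/dx) pair returned by feval;
    # the update rule below is the only thing sgd/adadelta etc. differ in
    for _ in range(steps):
        fx, dfdx = feval(x)
        x = x - lr * dfdx
    return x, fx

# toy objective: f(x) = (x - 3)^2, with gradient 2 * (x - 3)
def feval(x):
    return (x - 3.0) ** 2, 2.0 * (x - 3.0)

x_opt, fx = sgd(feval, x=0.0)
print(x_opt)  # close to 3.0
```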