I have implemented a sequence labeler in Torch, once using the rnn package from Element Research and once using the nnGraph-based LSTM code from the Oxford ML Group. Training of the nnGraph-based LSTM follows the approach given by the Oxford ML Group.
I use the same hyperparameters for both modules. When I train both on the same dataset, the Element Research model does much better (around 75 F-measure), while the nnGraph-based LSTM does very poorly (around 5 F-measure).
To keep things simple, I do Backpropagation Through Time over the entire sequence for both models. For the nnGraph-based LSTM, I clone it up to the maximum sequence length (roughly as sketched below).
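For context, the timestep clones of the nnGraph LSTM share one set of parameters; a minimal sketch of how such clones are typically created (memoryCell, softmaxLayer, stepCriterion and maxLen are placeholder names, not the exact ones from my code):

local clones = {memory = {}, softmax = {}, criterion = {}}
for t = 1, maxLen do
    -- clone() with these arguments shares weights and gradients, so every
    -- timestep clone updates the same underlying parameters
    clones.memory[t] = memoryCell:clone('weight', 'bias', 'gradWeight', 'gradBias')
    clones.softmax[t] = softmaxLayer:clone('weight', 'bias', 'gradWeight', 'gradBias')
    clones.criterion[t] = stepCriterion:clone()
end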
Here is the snippet for training with the rnn package:
------------------ forward pass -------------------
local embeddings = {} -- input embeddings
local loss = 0
if inputSource:size(1) ~= target:size(1) then
    print("Size mismatch "..inputSource:size(1).."\t"..target:size(1))
    os.exit()
end
-- Send the input sequence through a Lookup Table to obtain its embeddings
for t = 1, inputSource:size(1) do
    if options.useGPU then
        embeddings[t] = embed:forward(inputSource[t])[1]:cuda()
    else
        embeddings[t] = embed:forward(inputSource[t])[1]
    end
end
-- Send the embedding sequence through the sequence labeler to produce a table of NER tag predictions
local predictions = sequenceLabeler:forward(embeddings)
loss = loss + criterion:forward(predictions, target)
local gradOutputs = criterion:backward(predictions, target)
sequenceLabeler:backward(embeddings, gradOutputs)
loss = loss / inputSource:size(1)
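For completeness, sequenceLabeler and criterion in the snippet above are built with the rnn package's table-based containers. A rough sketch of the kind of setup this snippet assumes (the sizes and the exact layer stack are illustrative, not my actual configuration):

require 'rnn'

local embedSize, hiddenSize, numTags = 100, 100, 10  -- illustrative sizes
-- Sequencer applies the wrapped module to every element of the input table
local sequenceLabeler = nn.Sequencer(
    nn.Sequential()
        :add(nn.FastLSTM(embedSize, hiddenSize))
        :add(nn.Linear(hiddenSize, numTags))
        :add(nn.LogSoftMax())
)
-- SequencerCriterion applies the wrapped criterion at every timestep
local criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())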
And here is the snippet for training with the nnGraph-based LSTM:
local embeddings = {} -- input embeddings
local loss = 0
if inputSource:size(1) ~= target:size(1) then
    print("Size mismatch "..inputSource:size(1).."\t"..target:size(1))
    os.exit()
end
-- Send the input sequence through a Lookup Table to obtain its embeddings
for t = 1, inputSource:size(1) do
    embeddings[t] = embed:forward(inputSource[t])[1]
end
local lstm_c = {[0]=initstate_c} -- internal cell states of LSTM
local lstm_h = {[0]=initstate_h} -- output values of LSTM
local predictions = {} -- softmax outputs
-- For every input word pass through LSTM module and softmax module
for t = 1, inputSource:size(1) do
    lstm_c[t], lstm_h[t] = unpack(clones.memory[t]:forward({embeddings[t]:cuda(), lstm_c[t-1]:cuda(), lstm_h[t-1]:cuda()}))
    predictions[t] = clones.softmax[t]:forward(lstm_h[t])
    loss = loss + clones.criterion[t]:forward(predictions[t]:float(), target[t])
end
local dlstm_c = {}
local dlstm_h = {}
-- Gradients from higher layers are zero
dlstm_c[inputSource:size(1)] = dfinalstate_c:cuda() -- Zero tensors
dlstm_h[inputSource:size(1)] = dfinalstate_h:cuda() -- Zero tensors
local dTempSummary = {} -- gradient to be sent to lookup table. But remember the lookup table isn't modified
for t = inputSource:size(1), 1, -1 do
    local doutput_t = clones.criterion[t]:backward(predictions[t]:float(), target[t]):clone()
    -- Gradient from the output layer. If the token is the last in the sequence,
    -- there is no additional gradient coming down; otherwise add the gradient
    -- already flowing back from timestep t+1 (stored into dlstm_h[t] below)
    if t == inputSource:size(1) then
        dlstm_h[t] = clones.softmax[t]:backward(lstm_h[t], doutput_t):clone()
    else
        dlstm_h[t]:add(clones.softmax[t]:backward(lstm_h[t], doutput_t))
    end
    -- Backprop through the LSTM timestep
    dTempSummary[t], dlstm_c[t-1], dlstm_h[t-1] = unpack(clones.memory[t]:backward(
        {embeddings[t]:cuda(), lstm_c[t-1]:cuda(), lstm_h[t-1]:cuda()},
        {dlstm_c[t]:cuda(), dlstm_h[t]:cuda()}
    ))
end
loss = loss / inputSource:size(1)
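For completeness, initstate_c/initstate_h and dfinalstate_c/dfinalstate_h are just zero tensors; a minimal sketch of how they can be set up (the batch dimension of 1 and hiddenSize are illustrative, not my exact shapes):

local hiddenSize = 100  -- placeholder
local initstate_c = torch.zeros(1, hiddenSize)    -- initial LSTM cell state
local initstate_h = initstate_c:clone()           -- initial LSTM hidden state
local dfinalstate_c = initstate_c:clone()         -- zero gradient w.r.t. the final cell state
local dfinalstate_h = initstate_c:clone()         -- zero gradient w.r.t. the final hidden state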
I have shared the complete code here: Complete Code Snippet for both modules
I know I am missing something in the nnGraph-based LSTM implementation, but I cannot figure out my mistake. Can someone help me find where I am going wrong?