Large difference in output: nnGraph-based LSTM vs. Sequencer LSTM (Torch)

Asked: 2015-10-23 09:27:16

Tags: torch

I implemented a sequence labeler in Torch twice: once with the rnn package from Element Research, and once with the nnGraph-based LSTM code from the Oxford ML Group. The nnGraph-based LSTM is trained in the same way as shown in the Oxford ML Group material.

I use the same hyperparameters for both modules. When I train both on the same dataset, the Element Research rnn version does much better (around 75 F-measure), while the nnGraph-based LSTM does very poorly (around 5 F-measure).

To keep things simple, I do Backpropagation Through Time over the entire sequence for both models. For the nnGraph-based LSTM, I clone the module up to the maximum sequence length, as sketched below.
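
To make the cloning step concrete, here is a minimal sketch of the helper I use to clone the nnGraph LSTM up to the maximum sequence length (similar in spirit to the clone_many_times utility used in the Oxford code; protoLSTM, protoSoftmax, protoCriterion and maxSeqLen are placeholder names for my actual objects):

-- Clone a prototype module maxLen times, sharing parameters and gradient
-- buffers with the prototype so that only one set of weights is trained.
local function cloneManyTimes(proto, maxLen)
  local params, gradParams
  if proto.parameters then
    params, gradParams = proto:parameters()
  end
  local clones = {}
  for t = 1, maxLen do
    local clone = proto:clone()                    -- deep copy of the prototype
    if params then
      local cloneParams, cloneGradParams = clone:parameters()
      for i = 1, #params do
        cloneParams[i]:set(params[i])              -- share weights
        cloneGradParams[i]:set(gradParams[i])      -- share gradient buffers
      end
    end
    clones[t] = clone
  end
  return clones
end

clones = {}
clones.memory    = cloneManyTimes(protoLSTM,      maxSeqLen)
clones.softmax   = cloneManyTimes(protoSoftmax,   maxSeqLen)
clones.criterion = cloneManyTimes(protoCriterion, maxSeqLen)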

Here is the training snippet using the rnn package:

------------------ forward pass -------------------
local embeddings = {}            -- input embeddings
local loss = 0
if inputSource:size(1) ~= target:size(1) then
  print("Size mismatch "..inputSource:size(1).."\t"..target:size(1))
  os.exit()
end
-- Send the input sequence through a Lookup Table to obtain its embeddings

for t=1,inputSource:size(1) do
  if options.useGPU then
    embeddings[t] = embed:forward(inputSource[t])[1]:cuda()
  else
    embeddings[t] = embed:forward(inputSource[t])[1]
  end
end

-- Send the embedding sequence through the sequence labeler to produce a table of NER tags
local predictions = sequenceLabeler:forward(embeddings)
loss = loss + criterion:forward(predictions, target)
local gradOutputs = criterion:backward(predictions, target)
sequenceLabeler:backward(embeddings, gradOutputs)
loss = loss / inputSource:size(1)
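
For completeness, sequenceLabeler and criterion above come from the rnn package; a minimal sketch of how such a model can be assembled is shown below (hiddenSize and numTags are hypothetical placeholders, and my actual network may differ in detail):

require 'rnn'

local hiddenSize, numTags = 100, 9   -- placeholder dimensions

-- nn.Sequencer applies the wrapped network to every entry of the input table
sequenceLabeler = nn.Sequencer(
  nn.Sequential()
    :add(nn.FastLSTM(hiddenSize, hiddenSize))   -- one recurrent step per token
    :add(nn.Linear(hiddenSize, numTags))
    :add(nn.LogSoftMax())
)

-- nn.SequencerCriterion applies the wrapped criterion at every time step
criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())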

and the training snippet using the nnGraph-based LSTM is:

local embeddings = {}            -- input embeddings
local loss = 0
if inputSource:size(1) ~= target:size(1) then
  print("Size mismatch "..inputSource:size(1).."\t"..target:size(1))
  os.exit()
end
-- Send the input sequence through a Lookup Table to obtain its embeddings

for t=1,inputSource:size(1) do
  embeddings[t] = embed:forward(inputSource[t])[1]
end
local lstm_c = {[0]=initstate_c} -- internal cell states of LSTM
local lstm_h = {[0]=initstate_h} -- output values of LSTM
local predictions = {}           -- softmax outputs

-- For every input word, pass it through the LSTM module and then the softmax module
for t = 1, inputSource:size(1) do
  lstm_c[t], lstm_h[t] = unpack(clones.memory[t]:forward({
      embeddings[t]:cuda(), lstm_c[t-1]:cuda(), lstm_h[t-1]:cuda()
  }))
  predictions[t] = clones.softmax[t]:forward(lstm_h[t])

  loss = loss + clones.criterion[t]:forward(predictions[t]:float(), target[t])
end

local dlstm_c = {} 
local dlstm_h = {} 
--  Gradients from higher layers are zero
dlstm_c[inputSource:size(1)] = dfinalstate_c:cuda() -- zero tensors
dlstm_h[inputSource:size(1)] = dfinalstate_h:cuda() -- zero tensors

local dTempSummary = {} -- gradient to be sent to lookup table. But remember the lookup table isn't modified
for t = inputSource:size(1),1,-1 do
  local doutput_t = clones.criterion[t]:backward(predictions[t]:float(), target[t]):clone()

  -- Gradient from the output layer: if this is the last token in the sequence, no additional gradient comes down from a later time step;
  -- otherwise dlstm_h[t] already holds the gradient flowing back from time step t+1, so add to it
  if t == inputSource:size(1) then
    dlstm_h[t] = clones.softmax[t]:backward(lstm_h[t], doutput_t):clone()
  else
    dlstm_h[t]:add(clones.softmax[t]:backward(lstm_h[t], doutput_t))
  end

  -- backprop through LSTM timestep
  dTempSummary[t], dlstm_c[t-1], dlstm_h[t-1] = unpack(clones.memory[t]:backward(
      {embeddings[t]:cuda(), lstm_c[t-1]:cuda(), lstm_h[t-1]:cuda()},
      {dlstm_c[t]:cuda(), dlstm_h[t]:cuda()}
  ))

end
loss = loss / inputSource:size(1)
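
The memory clones above are built from a single nnGraph LSTM cell with the interface {x, prev_c, prev_h} -> {next_c, next_h}. For reference, a minimal sketch of such a cell in the usual Oxford/char-rnn style is shown below (inputSize and hiddenSize are placeholders; my actual cell may differ slightly):

require 'nn'
require 'nngraph'

local function buildLSTMCell(inputSize, hiddenSize)
  local x      = nn.Identity()()
  local prev_c = nn.Identity()()
  local prev_h = nn.Identity()()

  -- compute all four gate pre-activations in one go
  local i2h   = nn.Linear(inputSize,  4 * hiddenSize)(x)
  local h2h   = nn.Linear(hiddenSize, 4 * hiddenSize)(prev_h)
  local gates = nn.CAddTable()({i2h, h2h})

  -- split the pre-activations into the four gates
  local reshaped = nn.Reshape(4, hiddenSize)(gates)
  local n1, n2, n3, n4 = nn.SplitTable(2)(reshaped):split(4)

  local in_gate      = nn.Sigmoid()(n1)
  local forget_gate  = nn.Sigmoid()(n2)
  local out_gate     = nn.Sigmoid()(n3)
  local in_transform = nn.Tanh()(n4)

  -- c_t = forget_gate * c_{t-1} + in_gate * in_transform
  local next_c = nn.CAddTable()({
    nn.CMulTable()({forget_gate, prev_c}),
    nn.CMulTable()({in_gate, in_transform})
  })
  -- h_t = out_gate * tanh(c_t)
  local next_h = nn.CMulTable()({out_gate, nn.Tanh()(next_c)})

  return nn.gModule({x, prev_c, prev_h}, {next_c, next_h})
end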

I have shared the full code here: Complete Code Snippet for both modules

I know I am missing something in my nnGraph-based LSTM implementation, but I cannot figure out what the mistake is. Can someone help me find where I am going wrong?

0 Answers:

No answers yet.