Using the cuDNN RNN API

Time: 2017-04-26 03:39:12

Tags: lstm recurrent-neural-network cudnn

Let me first summarize what I think I understand about the cuDNN 5.1 RNN functions:

Tensor dimensions

x = [seq_length, batch_size, vocab_size] # input
y = [seq_length, batch_size, hiddenSize] # output

dx = [seq_length, batch_size, vocab_size] # input gradient
dy = [seq_length, batch_size, hiddenSize] # output gradient

hx = [num_layer, batch_size, hiddenSize] # input hidden state
hy = [num_layer, batch_size, hiddenSize] # output hidden state
cx = [num_layer, batch_size, hiddenSize] # input cell state
cy = [num_layer, batch_size, hiddenSize] # output cell state

dhx = [num_layer, batch_size, hiddenSize] # input hidden state gradient
dhy = [num_layer, batch_size, hiddenSize] # output hidden state gradient
dcx = [num_layer, batch_size, hiddenSize] # input cell state gradient
dcy = [num_layer, batch_size, hiddenSize] # output cell state gradient

w = [param size] # parameters (weights & biases)
dw = [param size] # parameter gradients
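
To make these shapes concrete, here is a minimal, untested setup sketch against the cuDNN 5.1 API (assuming a float, unidirectional LSTM with no dropout; the SEQ_LENGTH/BATCH_SIZE/etc. constants and all variable names are mine, and error checking is omitted):

// Minimal cuDNN 5.1 setup sketch (float, unidirectional LSTM, no dropout).
// Every cudnn*/cuda* call returns a status that real code should check.
#include <cudnn.h>
#include <cuda_runtime.h>

#define SEQ_LENGTH  20    // illustrative values
#define BATCH_SIZE  64
#define VOCAB_SIZE  256   // input size
#define HIDDEN_SIZE 512
#define NUM_LAYER   2

cudnnHandle_t handle;
cudnnTensorDescriptor_t xDesc[SEQ_LENGTH], yDesc[SEQ_LENGTH], hxDesc;
cudnnFilterDescriptor_t wDesc, dwDesc;
cudnnRNNDescriptor_t rnnDesc;
cudnnDropoutDescriptor_t dropoutDesc;
size_t weightBytes;

void setup(void) {
    cudnnCreate(&handle);

    // x/dx take one packed 3D descriptor per time step: {batch, input, 1}.
    // y/dy are the same with hiddenSize in place of vocab_size.
    for (int t = 0; t < SEQ_LENGTH; ++t) {
        int xd[3] = {BATCH_SIZE, VOCAB_SIZE, 1},  xs[3] = {VOCAB_SIZE, 1, 1};
        int yd[3] = {BATCH_SIZE, HIDDEN_SIZE, 1}, ys[3] = {HIDDEN_SIZE, 1, 1};
        cudnnCreateTensorDescriptor(&xDesc[t]);
        cudnnSetTensorNdDescriptor(xDesc[t], CUDNN_DATA_FLOAT, 3, xd, xs);
        cudnnCreateTensorDescriptor(&yDesc[t]);
        cudnnSetTensorNdDescriptor(yDesc[t], CUDNN_DATA_FLOAT, 3, yd, ys);
    }

    // hx/cx/hy/cy (and their gradients) all share one
    // {num_layer, batch_size, hiddenSize} shape.
    int hd[3] = {NUM_LAYER, BATCH_SIZE, HIDDEN_SIZE};
    int hs[3] = {BATCH_SIZE * HIDDEN_SIZE, HIDDEN_SIZE, 1};
    cudnnCreateTensorDescriptor(&hxDesc);
    cudnnSetTensorNdDescriptor(hxDesc, CUDNN_DATA_FLOAT, 3, hd, hs);

    // A dropout descriptor is required even with dropout = 0.
    size_t stateBytes; void *states;
    cudnnCreateDropoutDescriptor(&dropoutDesc);
    cudnnDropoutGetStatesSize(handle, &stateBytes);
    cudaMalloc(&states, stateBytes);
    cudnnSetDropoutDescriptor(dropoutDesc, handle, 0.f, states, stateBytes, 0ULL);

    // One RNN descriptor covers all num_layer stacked layers.
    cudnnCreateRNNDescriptor(&rnnDesc);
    cudnnSetRNNDescriptor(rnnDesc, HIDDEN_SIZE, NUM_LAYER, dropoutDesc,
                          CUDNN_LINEAR_INPUT, CUDNN_UNIDIRECTIONAL,
                          CUDNN_LSTM, CUDNN_DATA_FLOAT);

    // "param size" above is queried from cuDNN, not computed by hand.
    cudnnGetRNNParamsSize(handle, rnnDesc, xDesc[0], &weightBytes, CUDNN_DATA_FLOAT);
    int wDim[3] = {(int)(weightBytes / sizeof(float)), 1, 1};
    cudnnCreateFilterDescriptor(&wDesc);
    cudnnSetFilterNdDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 3, wDim);
    cudnnCreateFilterDescriptor(&dwDesc);
    cudnnSetFilterNdDescriptor(dwDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 3, wDim);
}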

cudnnRNNForwardTraining / cudnnRNNForwardInference

input: x, hx, cx, w
output: y, hy, cy
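
A hedged sketch of the corresponding call, continuing from the setup above. The d_x, d_y, d_hx, d_cx, d_hy, d_cy, d_w, d_workspace and d_reserve names are device buffers I assume were cudaMalloc'd elsewhere to the sizes implied by the shapes:

// Scratch sizes are queried from the descriptors, then the whole sequence
// is processed in one call -- cuDNN unrolls over time internally.
size_t workBytes, reserveBytes;
cudnnGetRNNWorkspaceSize(handle, rnnDesc, SEQ_LENGTH, xDesc, &workBytes);
cudnnGetRNNTrainingReserveSize(handle, rnnDesc, SEQ_LENGTH, xDesc, &reserveBytes);

cudnnRNNForwardTraining(handle, rnnDesc, SEQ_LENGTH,
                        xDesc, d_x,            // input:  x
                        hxDesc, d_hx,          // input:  hx (NULL => zeros)
                        hxDesc, d_cx,          // input:  cx (NULL => zeros)
                        wDesc, d_w,            // input:  w
                        yDesc, d_y,            // output: y
                        hxDesc, d_hy,          // output: hy
                        hxDesc, d_cy,          // output: cy
                        d_workspace, workBytes,
                        d_reserve, reserveBytes);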

cudnnRNNBackwardData

input: y, dy, dhy, dcy, w, hx, cx
output: dx, dhx, dcx
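
A sketch of the backward-data call under the same assumptions; it must run before cudnnRNNBackwardWeights, since both read the reserve space written by the forward pass:

// NULL gradient pointers (e.g. dhy/dcy) are treated as zero by cuDNN.
cudnnRNNBackwardData(handle, rnnDesc, SEQ_LENGTH,
                     yDesc, d_y,               // input:  y
                     yDesc, d_dy,              // input:  dy
                     hxDesc, d_dhy,            // input:  dhy (or NULL)
                     hxDesc, d_dcy,            // input:  dcy (or NULL)
                     wDesc, d_w,               // input:  w
                     hxDesc, d_hx,             // input:  hx
                     hxDesc, d_cx,             // input:  cx
                     xDesc, d_dx,              // output: dx
                     hxDesc, d_dhx,            // output: dhx
                     hxDesc, d_dcx,            // output: dcx
                     d_workspace, workBytes,
                     d_reserve, reserveBytes);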

cudnnRNNBackwardWeights

input: x, hx, y, dw
output: dw
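
The same sketch for the weight-gradient call. As the input/output lists above suggest, dw appears on both sides because cuDNN accumulates into it:

// cudnnRNNBackwardWeights *accumulates* into dw rather than overwriting
// it, so dw must be zeroed (e.g. cudaMemset) between optimizer steps.
cudnnRNNBackwardWeights(handle, rnnDesc, SEQ_LENGTH,
                        xDesc, d_x,            // input: x
                        hxDesc, d_hx,          // input: hx
                        yDesc, d_y,            // input: y
                        d_workspace, workBytes,
                        dwDesc, d_dw,          // in/out: dw (accumulated)
                        d_reserve, reserveBytes);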

Questions:

  1. Is the following training workflow correct for a multi-layer RNN (num_layer > 1)? (A sketch of this loop follows the question list.)

    1. init hx, cx, dhy, dcy to NULL
    2. init w (weights: small random values, biases: 1)
    3. forward
    4. backward data
    5. backward weights
    6. update weights: w += dw
    7. dw = 0
    8. go to 3.

  2. Can you confirm that cuDNN already implements stacked RNNs when num_layer > 1? (i.e. there is no need to call the forward/backward functions num_layer times)
  3. Should I re-inject the hidden state & cell state into the network for the next batch?
  4. The output in the LSTM/GRU formulas is hy. Should I use hy or y as the output?
  5. I posted the same question here (I will keep the answers in sync).
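
To tie question 1 together, here is a hypothetical version of that loop built from the sketches above. Note I write the update step as plain SGD, w += (-lr) * dw via cublasSaxpy, rather than the literal w += dw; lr and num_steps are illustrative values of my own:

// Training loop for question 1, steps 3-8 (sketch; buffers from above).
#include <cublas_v2.h>

cublasHandle_t cublas;
cublasCreate(&cublas);
float lr = 0.01f;                              // assumed learning rate
int num_steps = 1000;                          // assumed iteration count
int nParams = (int)(weightBytes / sizeof(float));

for (int step = 0; step < num_steps; ++step) {
    // step 3: forward (hx/cx passed as NULL => zero initial state)
    cudnnRNNForwardTraining(handle, rnnDesc, SEQ_LENGTH, xDesc, d_x,
                            hxDesc, NULL, hxDesc, NULL, wDesc, d_w,
                            yDesc, d_y, hxDesc, d_hy, hxDesc, d_cy,
                            d_workspace, workBytes, d_reserve, reserveBytes);

    // ...compute the loss and fill d_dy from d_y and the targets...

    // step 4: backward data (dhy/dcy NULL => zero incoming state gradients)
    cudnnRNNBackwardData(handle, rnnDesc, SEQ_LENGTH, yDesc, d_y, yDesc, d_dy,
                         hxDesc, NULL, hxDesc, NULL, wDesc, d_w,
                         hxDesc, NULL, hxDesc, NULL, xDesc, d_dx,
                         hxDesc, d_dhx, hxDesc, d_dcx,
                         d_workspace, workBytes, d_reserve, reserveBytes);

    // step 5: backward weights (accumulates into d_dw)
    cudnnRNNBackwardWeights(handle, rnnDesc, SEQ_LENGTH, xDesc, d_x,
                            hxDesc, NULL, yDesc, d_y,
                            d_workspace, workBytes, dwDesc, d_dw,
                            d_reserve, reserveBytes);

    // step 6: update weights, w += (-lr) * dw
    float minus_lr = -lr;
    cublasSaxpy(cublas, nParams, &minus_lr, d_dw, 1, d_w, 1);

    // step 7: dw = 0 before the next accumulation
    cudaMemset(d_dw, 0, weightBytes);
}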

0 Answers:

No answers yet.