Using the cuDNN RNN API

Time: 2017-04-26 03:39:12

Tags: lstm recurrent-neural-network cudnn

Let me first summarize what I think I understand about the cuDNN 5.1 RNN functions:

Tensor dimensions

x = [seq_length, batch_size, vocab_size] # input
y = [seq_length, batch_size, hiddenSize] # output

dx = [seq_length, batch_size, vocab_size] # input gradient
dy = [seq_length, batch_size, hiddenSize] # output gradient

hx = [num_layer, batch_size, hiddenSize] # input hidden state
hy = [num_layer, batch_size, hiddenSize] # output hidden state
cx = [num_layer, batch_size, hiddenSize] # input cell state
cy = [num_layer, batch_size, hiddenSize] # output cell state

dhx = [num_layer, batch_size, hiddenSize] # input hidden state gradient
dhy = [num_layer, batch_size, hiddenSize] # output hidden state gradient
dcx = [num_layer, batch_size, hiddenSize] # input cell state gradient
dcy = [num_layer, batch_size, hiddenSize] # output cell state gradient

w = [param size] # parameters (weights & biases)
dw = [param size] # parameter gradients
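
To make these shapes concrete, here is a minimal, untested setup sketch against the cuDNN 5.1 API (assuming a float, unidirectional LSTM with no dropout; the SEQ_LENGTH/BATCH_SIZE/etc. constants and all variable names are mine, and error checking is omitted):

// Minimal cuDNN 5.1 setup sketch (float, unidirectional LSTM, no dropout).
// Every cudnn*/cuda* call returns a status that real code should check.
#include <cudnn.h>
#include <cuda_runtime.h>

#define SEQ_LENGTH  20    // illustrative values
#define BATCH_SIZE  64
#define VOCAB_SIZE  256   // input size
#define HIDDEN_SIZE 512
#define NUM_LAYER   2

cudnnHandle_t handle;
cudnnTensorDescriptor_t xDesc[SEQ_LENGTH], yDesc[SEQ_LENGTH], hxDesc;
cudnnFilterDescriptor_t wDesc, dwDesc;
cudnnRNNDescriptor_t rnnDesc;
cudnnDropoutDescriptor_t dropoutDesc;
size_t weightBytes;

void setup(void) {
    cudnnCreate(&handle);

    // x/dx take one packed 3D descriptor per time step: {batch, input, 1}.
    // y/dy are the same with hiddenSize in place of vocab_size.
    for (int t = 0; t < SEQ_LENGTH; ++t) {
        int xd[3] = {BATCH_SIZE, VOCAB_SIZE, 1},  xs[3] = {VOCAB_SIZE, 1, 1};
        int yd[3] = {BATCH_SIZE, HIDDEN_SIZE, 1}, ys[3] = {HIDDEN_SIZE, 1, 1};
        cudnnCreateTensorDescriptor(&xDesc[t]);
        cudnnSetTensorNdDescriptor(xDesc[t], CUDNN_DATA_FLOAT, 3, xd, xs);
        cudnnCreateTensorDescriptor(&yDesc[t]);
        cudnnSetTensorNdDescriptor(yDesc[t], CUDNN_DATA_FLOAT, 3, yd, ys);
    }

    // hx/cx/hy/cy (and their gradients) all share one
    // {num_layer, batch_size, hiddenSize} shape.
    int hd[3] = {NUM_LAYER, BATCH_SIZE, HIDDEN_SIZE};
    int hs[3] = {BATCH_SIZE * HIDDEN_SIZE, HIDDEN_SIZE, 1};
    cudnnCreateTensorDescriptor(&hxDesc);
    cudnnSetTensorNdDescriptor(hxDesc, CUDNN_DATA_FLOAT, 3, hd, hs);

    // A dropout descriptor is required even with dropout = 0.
    size_t stateBytes; void *states;
    cudnnCreateDropoutDescriptor(&dropoutDesc);
    cudnnDropoutGetStatesSize(handle, &stateBytes);
    cudaMalloc(&states, stateBytes);
    cudnnSetDropoutDescriptor(dropoutDesc, handle, 0.f, states, stateBytes, 0ULL);

    // One RNN descriptor covers all num_layer stacked layers.
    cudnnCreateRNNDescriptor(&rnnDesc);
    cudnnSetRNNDescriptor(rnnDesc, HIDDEN_SIZE, NUM_LAYER, dropoutDesc,
                          CUDNN_LINEAR_INPUT, CUDNN_UNIDIRECTIONAL,
                          CUDNN_LSTM, CUDNN_DATA_FLOAT);

    // "param size" above is queried from cuDNN, not computed by hand.
    cudnnGetRNNParamsSize(handle, rnnDesc, xDesc[0], &weightBytes, CUDNN_DATA_FLOAT);
    int wDim[3] = {(int)(weightBytes / sizeof(float)), 1, 1};
    cudnnCreateFilterDescriptor(&wDesc);
    cudnnSetFilterNdDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 3, wDim);
    cudnnCreateFilterDescriptor(&dwDesc);
    cudnnSetFilterNdDescriptor(dwDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 3, wDim);
}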

cudnnRNNForwardTraining / cudnnRNNForwardInference

input: x, hx, cx, w
output: y, hy, cy
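
A hedged sketch of the corresponding call, continuing from the setup above. The d_x, d_y, d_hx, d_cx, d_hy, d_cy, d_w, d_workspace and d_reserve names are device buffers I assume were cudaMalloc'd elsewhere to the sizes implied by the shapes:

// Scratch sizes are queried from the descriptors, then the whole sequence
// is processed in one call -- cuDNN unrolls over time internally.
size_t workBytes, reserveBytes;
cudnnGetRNNWorkspaceSize(handle, rnnDesc, SEQ_LENGTH, xDesc, &workBytes);
cudnnGetRNNTrainingReserveSize(handle, rnnDesc, SEQ_LENGTH, xDesc, &reserveBytes);

cudnnRNNForwardTraining(handle, rnnDesc, SEQ_LENGTH,
                        xDesc, d_x,            // input:  x
                        hxDesc, d_hx,          // input:  hx (NULL => zeros)
                        hxDesc, d_cx,          // input:  cx (NULL => zeros)
                        wDesc, d_w,            // input:  w
                        yDesc, d_y,            // output: y
                        hxDesc, d_hy,          // output: hy
                        hxDesc, d_cy,          // output: cy
                        d_workspace, workBytes,
                        d_reserve, reserveBytes);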

cudnnRNNBackwardData

input: y, dy, dhy, dcy, w, hx, cx
output: dx, dhx, dcx
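
A sketch of the backward-data call under the same assumptions; it must run before cudnnRNNBackwardWeights, since both read the reserve space written by the forward pass:

// NULL gradient pointers (e.g. dhy/dcy) are treated as zero by cuDNN.
cudnnRNNBackwardData(handle, rnnDesc, SEQ_LENGTH,
                     yDesc, d_y,               // input:  y
                     yDesc, d_dy,              // input:  dy
                     hxDesc, d_dhy,            // input:  dhy (or NULL)
                     hxDesc, d_dcy,            // input:  dcy (or NULL)
                     wDesc, d_w,               // input:  w
                     hxDesc, d_hx,             // input:  hx
                     hxDesc, d_cx,             // input:  cx
                     xDesc, d_dx,              // output: dx
                     hxDesc, d_dhx,            // output: dhx
                     hxDesc, d_dcx,            // output: dcx
                     d_workspace, workBytes,
                     d_reserve, reserveBytes);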

cudnnRNNBackwardWeights

input: x, hx, y, dw
output: dw
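
The same sketch for the weight-gradient call. As the input/output lists above suggest, dw appears on both sides because cuDNN accumulates into it:

// cudnnRNNBackwardWeights *accumulates* into dw rather than overwriting
// it, so dw must be zeroed (e.g. cudaMemset) between optimizer steps.
cudnnRNNBackwardWeights(handle, rnnDesc, SEQ_LENGTH,
                        xDesc, d_x,            // input: x
                        hxDesc, d_hx,          // input: hx
                        yDesc, d_y,            // input: y
                        d_workspace, workBytes,
                        dwDesc, d_dw,          // in/out: dw (accumulated)
                        d_reserve, reserveBytes);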

Questions:

  1. Is the following training workflow correct for a multi-layer RNN (num_layer > 1)? (A sketch of this loop follows the question list.)

    1. init hx, cx, dhy, dcy to NULL
    2. init w (weights: small random values, biases: 1)
    3. forward
    4. backward data
    5. backward weights
    6. update weights: w += dw
    7. dw = 0
    8. go to 3.

  2. Can you confirm that cuDNN already implements stacked RNNs when num_layer > 1? (i.e. there is no need to call the forward/backward functions num_layer times)
  3. Should I re-inject the hidden state & cell state into the network for the next batch?
  4. The output in the LSTM/GRU formulas is hy. Should I use hy or y as the output?
  5. I posted the same question here (I will keep the answers in sync).
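
To tie question 1 together, here is a hypothetical version of that loop built from the sketches above. Note I write the update step as plain SGD, w += (-lr) * dw via cublasSaxpy, rather than the literal w += dw; lr and num_steps are illustrative values of my own:

// Training loop for question 1, steps 3-8 (sketch; buffers from above).
#include <cublas_v2.h>

cublasHandle_t cublas;
cublasCreate(&cublas);
float lr = 0.01f;                              // assumed learning rate
int num_steps = 1000;                          // assumed iteration count
int nParams = (int)(weightBytes / sizeof(float));

for (int step = 0; step < num_steps; ++step) {
    // step 3: forward (hx/cx passed as NULL => zero initial state)
    cudnnRNNForwardTraining(handle, rnnDesc, SEQ_LENGTH, xDesc, d_x,
                            hxDesc, NULL, hxDesc, NULL, wDesc, d_w,
                            yDesc, d_y, hxDesc, d_hy, hxDesc, d_cy,
                            d_workspace, workBytes, d_reserve, reserveBytes);

    // ...compute the loss and fill d_dy from d_y and the targets...

    // step 4: backward data (dhy/dcy NULL => zero incoming state gradients)
    cudnnRNNBackwardData(handle, rnnDesc, SEQ_LENGTH, yDesc, d_y, yDesc, d_dy,
                         hxDesc, NULL, hxDesc, NULL, wDesc, d_w,
                         hxDesc, NULL, hxDesc, NULL, xDesc, d_dx,
                         hxDesc, d_dhx, hxDesc, d_dcx,
                         d_workspace, workBytes, d_reserve, reserveBytes);

    // step 5: backward weights (accumulates into d_dw)
    cudnnRNNBackwardWeights(handle, rnnDesc, SEQ_LENGTH, xDesc, d_x,
                            hxDesc, NULL, yDesc, d_y,
                            d_workspace, workBytes, dwDesc, d_dw,
                            d_reserve, reserveBytes);

    // step 6: update weights, w += (-lr) * dw
    float minus_lr = -lr;
    cublasSaxpy(cublas, nParams, &minus_lr, d_dw, 1, d_w, 1);

    // step 7: dw = 0 before the next accumulation
    cudaMemset(d_dw, 0, weightBytes);
}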

0 Answers:

No answers yet.