I am studying RNNs, and after reading this paper I understood how backpropagation through time (BPTT) runs on an RNN: https://arxiv.org/pdf/1610.02583.pdf
But I am confused by the following implementation (from cs231n):
for t in reversed(xrange(T)):
    dh_current = dh[t] + dh_prev
    dx_t, dh_prev, dWx_t, dWh_t, db_t = rnn_step_backward(dh_current, cache[t])
    dx[t] += dx_t
    dh0 = dh_prev
    dWx += dWx_t
    dWh += dWh_t
    db += db_t
Why are the gradients dh[t] and dh_prev summed, i.e. dh_current = dh[t] + dh_prev?
Full source code: https://github.com/williamchan/cs231-assignment3/blob/master/cs231n/rnn_layers.py
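For context on what I have figured out so far: I believe the sum comes from the multivariate chain rule, because the hidden state h[t] feeds two consumers: the output at step t (producing dh[t], the upstream gradient passed into the backward pass) and the next hidden state h[t+1] (producing dh_prev from the later step's rnn_step_backward). Here is a toy scalar sketch I wrote to check this intuition numerically; the names (w_y, w_h, loss) are my own and not from the assignment:

```python
import numpy as np

# Toy sketch (my own example, not the cs231n code): a hidden state h feeds
# TWO paths into the loss, so dL/dh is the SUM of both path gradients.
h = 0.5
w_y, w_h = 2.0, 3.0

def loss(h):
    y = w_y * h                # path 1: the output at time t (like dh[t])
    h_next = np.tanh(w_h * h)  # path 2: the next hidden state (like dh_prev)
    return y**2 + h_next**2    # the total loss depends on h through BOTH paths

# Analytic gradient: the two path gradients added together.
d_through_output = 2 * (w_y * h) * w_y
t = np.tanh(w_h * h)
d_through_time = 2 * t * (1 - t**2) * w_h
analytic = d_through_output + d_through_time

# Central-difference numerical check: it matches only when the paths are summed.
eps = 1e-6
numeric = (loss(h + eps) - loss(h - eps)) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)
```

Is this the right way to read the dh_current = dh[t] + dh_prev line, or is something else going on?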