Backward "ho" (dho = dsigmoid(ho) * dho)

Time: 2019-02-07 15:12:14

Tags: lstm recurrent-neural-network

Here is the forward pass with the parameters:

hf = sigmoid(X @ Wf + bf)   # forget gate
hi = sigmoid(X @ Wi + bi)   # input gate
ho = sigmoid(X @ Wo + bo)   # output gate
hc = tanh(X @ Wc + bc)      # candidate cell state

c = hf * c_old + hi * hc    # new cell state
h = ho * tanh(c)            # new hidden state

y = h @ Wy + by             # output logits
prob = softmax(y)           # class probabilities
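
For context, here is a minimal runnable sketch of that forward pass in NumPy. The sizes, the random initialization, and the assumption that X is the concatenation [h_old, x] are illustrative choices, not taken from the original code:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Assumed sizes (not from the question): hidden H, input D, vocab V
H, D, V = 4, 3, 5
Z = H + D                              # assumed: X = [h_old, x] concatenated

rng = np.random.default_rng(0)
Wf, Wi, Wo, Wc = [rng.normal(scale=0.1, size=(Z, H)) for _ in range(4)]
bf, bi, bo, bc = [np.zeros((1, H)) for _ in range(4)]
Wy = rng.normal(scale=0.1, size=(H, V))
by = np.zeros((1, V))

h_old = np.zeros((1, H))
c_old = np.zeros((1, H))
x = rng.normal(size=(1, D))
X = np.hstack([h_old, x])              # assumed input layout

# Same equations as above
hf = sigmoid(X @ Wf + bf)              # forget gate
hi = sigmoid(X @ Wi + bi)              # input gate
ho = sigmoid(X @ Wo + bo)              # output gate
hc = np.tanh(X @ Wc + bc)              # candidate cell state

c = hf * c_old + hi * hc               # new cell state
h = ho * np.tanh(c)                    # new hidden state

y = h @ Wy + by
prob = softmax(y)                      # shape (1, V), sums to 1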

When I do the backward pass in the LSTM:

# Softmax loss gradient
dy = prob.copy()
dy[1, y_train] -= 1.

# Hidden to output gradient
dWy = h.T @ dy
dby = dy
# Note we're adding dh_next here
dh = dy @ Wy.T + dh_next

# Gradient for ho in h = ho * tanh(c)
dho = tanh(c) * dh
dho = dsigmoid(ho) * dho    # (**** problem here)

I understand "dho = tanh(c) * dh", but I don't understand why the next line is "dho = dsigmoid(ho) * dho".

Could some superhero tell me how this step is derived?
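
A worked check of that step may help. Everything below is an illustration under an assumption, not taken from the original code: dsigmoid(s) = s * (1 - s), i.e. it expects the already-activated gate value ho rather than the pre-activation. Under that convention, the first line dho = tanh(c) * dh is dL/d(ho), and the second line applies the chain rule through the sigmoid to get the gradient with respect to the pre-activation X @ Wo + bo. A small finite-difference check with made-up numbers:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid(s):
    # assumed convention: s is already sigmoid(z), so d sigmoid/dz = s * (1 - s)
    return s * (1.0 - s)

rng = np.random.default_rng(1)
zo = rng.normal(size=(1, 4))   # made-up pre-activation of the output gate
c  = rng.normal(size=(1, 4))   # made-up cell state
dh = rng.normal(size=(1, 4))   # made-up upstream gradient dL/dh

# Toy loss whose gradient w.r.t. h is exactly dh: L = sum(h * dh), h = sigmoid(zo) * tanh(c)
def loss(zo_):
    return np.sum(sigmoid(zo_) * np.tanh(c) * dh)

# Analytic gradient, same two lines as in the question
ho  = sigmoid(zo)
dho = np.tanh(c) * dh          # dL/d(ho)
dho = dsigmoid(ho) * dho       # dL/d(zo), gradient w.r.t. the pre-activation

# Numerical gradient for comparison
eps = 1e-6
num = np.zeros_like(zo)
for i in range(zo.shape[1]):
    zp, zm = zo.copy(), zo.copy()
    zp[0, i] += eps
    zm[0, i] -= eps
    num[0, i] = (loss(zp) - loss(zm)) / (2 * eps)

print(np.allclose(dho, num, atol=1e-5))   # expect True under the assumed dsigmoid

If that prints True under these assumptions, the second line is just the chain rule through the output gate's sigmoid.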

0 Answers:

No answers