我正在尝试从头开始在python中实现lstm递归神经网络,并且在编码反向传播以获取正确的梯度时遇到了麻烦。我已经运行了渐变检查,它显示渐变是错误的,但是我看不到代码哪里出错了。 任何帮助将不胜感激,
def backward(cache,next,prob,target,params):
wy,by,wf,bf,wu,bu,wo,bo = params
c_temp,hf,hu,ho,c,a,X,c_old = cache
a_next,c_next = next
dy = np.copy(prob)
dy[0,target] -= 1
dwy = np.dot(a.T,dy)
dby = dy
dh = dy @ wy.T + a_next
dho = tanh(c) * dh
dho = sigmoidGradient(ho) * dho
dc = ho * dh * tanhGradient(c)
dc = dc + c_next
dhf = c_old * dc
dhf = sigmoidGradient(hf) * dhf
dhu = c_temp * dc
dhu = sigmoidGradient(hu) * dhu
dc_temp = hu * dc
dc_temp = tanhGradient(c_temp) * dc_temp
dwf = np.dot(X.T,dhf)
dbf = dhf
dXf = np.dot(dhf,wf.T)
dwu = np.dot(X.T,dhu)
dbu = dhu
dXu = np.dot(hu,wu.T)
dwo = np.dot(X.T,dho)
dbo = dho
dXo = np.dot(dho,wo.T)
dwc = np.dot(X.T,dc_temp)
dbc = dc_temp
dXc = np.dot(dc_temp,wc.T)
dX = dXo + dXc + dXu + dXf
a_next = dX[:,:hidden_size]
c_next = hf * dc
next = (a_next,c_next)
grad = (dwy,dby,dwf,dbf,dwu,dbu,dwo,dbo)
return next, grad
在更新参数的同时,我使用此规则在60个字符的序列后使用梯度下降更新:
for param,dparam in zip([wy,by,wf,bf,wu,bu,wo,bo],[wygrad,bygrad,wfgrad,bfgrad,wugrad,bugrad,wograd,bograd]):
param += -alpha * dparam
但是梯度和参数似乎总是变成Nan,我也很难找到原因。 感谢您的帮助。