I wrote a small neural network in Python that uses 4 neurons (2 inputs, 3 neurons in a single hidden layer, and 1 output neuron). The code is deliberately written by hand because I want to understand every operation in detail. It works, but I still have a problem with the biases!
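(The helper functions are not reproduced below. Roughly, predict_output_neural runs the forward pass and returns the hidden-layer activations together with the predictions, cost is the loss, and derivative_activation_y is the activation derivative expressed in terms of the activation's output. A minimal sketch of what they might look like, assuming a sigmoid activation and a mean-squared-error cost, reconstructed only from how they are called in the training loop:)

import numpy as np

def activation(x):
    # Sigmoid activation (assumed)
    return 1.0 / (1.0 + np.exp(-x))

def derivative_activation_y(y):
    # Derivative of the sigmoid, written in terms of its output y
    return y * (1.0 - y)

def cost(predictions, targets):
    # Mean squared error (assumed)
    return np.mean((predictions - targets) ** 2)

def predict_output_neural(features, weights_11, weights_12, weights_13,
                          weight_ouput, bias_11, bias_12, bias_13, bias_output):
    # Hidden layer: three neurons, each fed by the two input features
    hidden_1 = activation(features @ weights_11 + bias_11)
    hidden_2 = activation(features @ weights_12 + bias_12)
    hidden_3 = activation(features @ weights_13 + bias_13)
    layer1 = np.column_stack((hidden_1, hidden_2, hidden_3))
    # Output neuron combines the three hidden activations
    predictions = activation(layer1 @ weight_ouput + bias_output)
    return layer1, predictions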
for epoch in range(epochs):
    layer1, predictions = predict_output_neural(features, weights_11, weights_12, weights_13, weight_ouput, bias_11, bias_12, bias_13, bias_output)
    if epoch % 10 == 0:
        layer1, predictions = predict_output_neural(features, weights_11, weights_12, weights_13, weight_ouput, bias_11, bias_12, bias_13, bias_output)
        print(cost(predictions, targets))
"""
There are a lot of things to do here !
to do the back propagation, we will first train the ouput neural
"""
    # Init gradients
    weights_gradient_output = np.zeros(weight_ouput.shape)
    bias_gradient_output = 0
    weights_gradient_11 = np.zeros(weights_11.shape)
    bias_gradient_11 = 0
    weights_gradient_12 = np.zeros(weights_12.shape)
    bias_gradient_12 = 0
    weights_gradient_13 = np.zeros(weights_13.shape)
    bias_gradient_13 = 0
    # Go through each row
    for neural_input, feature, target, prediction in zip(layer1, features, targets, predictions):
        # Error and delta at the output neuron
        output_error = prediction - target
        output_delta = output_error * derivative_activation_y(prediction)
        # Propagate the output delta back through the output weights to each hidden neuron
        error_neural_hidden_11 = output_delta * weight_ouput[0]
        error_neural_hidden_12 = output_delta * weight_ouput[1]
        error_neural_hidden_13 = output_delta * weight_ouput[2]
        error_neural_11 = error_neural_hidden_11 * derivative_activation_y(neural_input[0])
        error_neural_12 = error_neural_hidden_12 * derivative_activation_y(neural_input[1])
        error_neural_13 = error_neural_hidden_13 * derivative_activation_y(neural_input[2])
        # Accumulate the gradients over the whole batch
        weights_gradient_output += neural_input * output_delta
        #bias_output += output_delta
        weights_gradient_11 += feature * error_neural_11
        #bias_11 += error_neural_11
        weights_gradient_12 += feature * error_neural_12
        #bias_12 += error_neural_12
        weights_gradient_13 += feature * error_neural_13
        #bias_13 += error_neural_13
    # Update the weights and biases
    weight_ouput = weight_ouput - (learning_rate * weights_gradient_output)
    bias_output = bias_output - (learning_rate * bias_gradient_output)
    weights_11 = weights_11 - (learning_rate * weights_gradient_11)
    bias_11 = bias_11 - (learning_rate * bias_gradient_11)
    weights_12 = weights_12 - (learning_rate * weights_gradient_12)
    bias_12 = bias_12 - (learning_rate * bias_gradient_12)
    weights_13 = weights_13 - (learning_rate * weights_gradient_13)
    bias_13 = bias_13 - (learning_rate * bias_gradient_13)
This gives me good results, but as soon as I uncomment the lines that update each neuron's bias, it goes completely wrong: it converges to 0.5 (e.g. 0.4999999).
Do you know why? The bias gradient update looks correct to me, doesn't it?
Answer (score: 1)
If you look at this part of your gradient-accumulation code,

    weights_gradient_output += neural_input * output_delta
    #bias_output += output_delta

you are adding the delta directly to the bias instead of to bias_gradient_output. As a result, the bias update effectively uses a learning rate of 1, which is probably much higher than you intended. (The same problem applies to bias_11 and the others.)
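To fix it, accumulate the bias deltas into the bias gradient variables inside the per-sample loop and let the existing update block at the end of the epoch apply the learning rate, exactly as you already do for the weights. A minimal sketch using your variable names (only the accumulation lines change; the update lines stay as they are):

    # inside the per-sample loop
    weights_gradient_output += neural_input * output_delta
    bias_gradient_output += output_delta        # accumulate here, don't touch bias_output yet
    weights_gradient_11 += feature * error_neural_11
    bias_gradient_11 += error_neural_11
    weights_gradient_12 += feature * error_neural_12
    bias_gradient_12 += error_neural_12
    weights_gradient_13 += feature * error_neural_13
    bias_gradient_13 += error_neural_13

The update step at the end of the epoch then scales each accumulated bias gradient by learning_rate, so the biases move at the same rate as the weights.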