I am reading about deep neural networks with backpropagation, and as far as I understand it, the algorithm for this type of neural network can be summarized by the following steps (I also sketch them in code right after this list):

1. Input x: set the corresponding activation for the input layer.
2. Feedforward: compute the error of the forward propagation.
3. Output error: compute the output error.
4. Backpropagate the error: compute the error of the backward propagation.
5. Output: use the gradient of the cost function.
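To check my understanding of these five steps, here is a minimal sketch I put together myself (plain NumPy, a 2-3-1 sigmoid network with a squared-error loss; the variable names and the single training example are my own, not from any tutorial):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 1. Input x: set the activation of the input layer
x = np.array([[0., 1.]])          # one training example
t = np.array([[1.]])              # its target output

W1 = np.random.uniform(0.1, 0.9, (2, 3))
b1 = np.random.uniform(0.1, 0.9, (3,))
W2 = np.random.uniform(0.1, 0.9, (3, 1))
b2 = np.random.uniform(0.1, 0.9, (1,))

# 2. Feedforward: compute the activations layer by layer
h = sigmoid(x @ W1 + b1)
y = sigmoid(h @ W2 + b2)

# 3. Output error: delta of the output layer for L = 0.5 * sum((y - t)**2)
delta2 = (y - t) * y * (1.0 - y)

# 4. Backpropagate the error: delta of the hidden layer
delta1 = (delta2 @ W2.T) * h * (1.0 - h)

# 5. Output: gradients of the cost, then a gradient-descent update
lr = 1.0
W2 -= lr * (h.T @ delta2)
b2 -= lr * delta2.sum(axis=0)
W1 -= lr * (x.T @ delta1)
b1 -= lr * delta1.sum(axis=0)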
That part is fine. I then looked at many code examples for this kind of deep network; below is one such example, with explanations:
### imports
import tensorflow as tf
### constant data
x = [[0.,0.],[1.,1.],[1.,0.],[0.,1.]]
y_ = [[0.],[0.],[1.],[1.]]
### induction
# 1x2 input -> 2x3 hidden sigmoid -> 3x1 sigmoid output
# Layer 0 = the x2 inputs
x0 = tf.constant( x , dtype=tf.float32 )
y0 = tf.constant( y_ , dtype=tf.float32 )
# Layer 1 = the 2x3 hidden sigmoid
m1 = tf.Variable( tf.random_uniform( [2,3] , minval=0.1 , maxval=0.9 , dtype=tf.float32 ))
b1 = tf.Variable( tf.random_uniform( [3] , minval=0.1 , maxval=0.9 , dtype=tf.float32 ))
h1 = tf.sigmoid( tf.matmul( x0,m1 ) + b1 )
# Layer 2 = the 3x1 sigmoid output
m2 = tf.Variable( tf.random_uniform( [3,1] , minval=0.1 , maxval=0.9 , dtype=tf.float32 ))
b2 = tf.Variable( tf.random_uniform( [1] , minval=0.1 , maxval=0.9 , dtype=tf.float32 ))
y_out = tf.sigmoid( tf.matmul( h1,m2 ) + b2 )
### loss
# loss : sum of the squares of y0 - y_out
loss = tf.reduce_sum( tf.square( y0 - y_out ) )
# training step : gradient descent (learning rate 1.0) to minimize the loss
train = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
### training
# run 500 times using all the X and Y
# print out the loss and any other interesting info
with tf.Session() as sess:
    sess.run( tf.global_variables_initializer() )
    # run the gradient-descent update 500 times on the full batch
    for step in range(500):
        sess.run(train)
    # inspect the learned weights, the predictions and the final loss
    results = sess.run([m1, b1, m2, b2, y_out, loss])
    labels = "m1,b1,m2,b2,y_out,loss".split(",")
    for label, result in zip(labels, results):
        print("")
        print(label)
        print(result)
        print("")
My question is: the code above computes the error of the forward propagation (the loss), but I cannot see any step that computes the error of the back-propagation. In other words, following the description above, I can see step 1 (Input x), step 2 (Feedforward), step 3 (Output error) and step 5 (Output), but step 4 (Backpropagate the error) does not appear anywhere in the code! Is that correct, or is something missing from the code? All the code I have found online follows the same steps for backpropagation deep neural networks. Could you please describe how the step of backpropagating the error happens in this code, or what I should add to perform that step?

Thanks
Answer 0 (score: 1)
Put simply, when you build the TF graph up to the point where the loss is computed, TF already knows which tf.Variable (weights) the loss depends on. Then, when you create the node train = tf.train.GradientDescentOptimizer(1.0).minimize(loss) and later run it in a tf.Session, backpropagation is done for you in the background. More specifically, train = tf.train.GradientDescentOptimizer(1.0).minimize(loss) merges the following steps:
# 1. Create a GD optimizer with a learning rate of 1.0
optimizer = tf.train.GradientDescentOptimizer(1.0)
# 2. Compute the gradients for each of the variables (weights) with respect to the loss
gradients, variables = zip(*optimizer.compute_gradients(loss))
# 3. Update the variables (weights) based on the computed gradients
train = optimizer.apply_gradients(zip(gradients, variables))
In particular, step 2 (compute_gradients) is the backpropagation step: it computes the gradient of the loss with respect to each variable (weight). Hope this makes it clearer to you!
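To make this concrete, here is a minimal sketch of the same training loop with the backpropagation step made explicit instead of hidden inside minimize(loss). It is my own variant, not code from the question, and it assumes the graph nodes (m1, b1, m2, b2, loss) defined in the question's script:

### explicit variant of the training step
optimizer = tf.train.GradientDescentOptimizer(1.0)
# backpropagation: gradient of the loss w.r.t. every trainable variable
grads_and_vars = optimizer.compute_gradients(loss)
# gradient-descent update using those gradients
train = optimizer.apply_gradients(grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(500):
        # fetching the gradients alongside the update shows the "backpropagated error"
        _, grad_values = sess.run([train, [g for g, v in grads_and_vars]])
        if step % 100 == 0:
            print(step, [g.shape for g in grad_values])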
Additionally, I would like to restructure the steps in your question:

1. Input x: the input of the neural network.
2. Forward pass: computing the output of the network, i.e. multiplying the input by the tf.Variable weights, adding the biases and applying the activations.
3. Loss computation: computing the loss from the network output and the targets.
4. Gradient computation: computing the gradient of each tf.Variable (weight) with respect to the loss.
5. Weight update: updating each tf.Variable (weight) using its corresponding gradient.

Note that steps 4 and 5 encapsulate backpropagation; a sketch of how to inspect those gradients follows this list.
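If you want to actually look at the backpropagated error of step 4, one possible sketch (again assuming the question's graph, i.e. loss, m1, b1, m2, b2 already exist in the default graph) is to ask for the gradients directly with tf.gradients:

### inspecting the gradients of step 4 directly
grads = tf.gradients(loss, [m1, b1, m2, b2])   # d(loss)/d(weight), computed by backpropagation
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for name, g in zip(["m1", "b1", "m2", "b2"], sess.run(grads)):
        print(name, "gradient:")
        print(g)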