Does batch normalization prevent backpropagation? Zero gradients when using TensorFlow batch normalization

Time: 2019-06-06 20:06:25

Tags: tensorflow batch-normalization

When I use batch normalization (tf.layers.batch_normalization or tf.keras.layers.BatchNormalization), the gradients vanish.

When I print out all the gradients with TensorBoard histograms, not all of the weights and biases are being trained; the gradients for those variables are simply zero.
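For reference, this is roughly how the gradient histograms are written out (a minimal sketch; the helper name and the log directory are placeholders, not part of the actual training script):

    # Minimal sketch of logging per-variable gradient histograms to TensorBoard (TF 1.x).
    # The helper name and the log directory are placeholders.
    import tensorflow as tf

    def add_gradient_histograms(grads, params):
        for grad, var in zip(grads, params):
            if grad is not None:
                tf.summary.histogram(var.op.name + '/gradient', grad)

    # add_gradient_histograms(self.i_grads, self.e_params)
    # merged_summaries = tf.summary.merge_all()
    # writer = tf.summary.FileWriter('./logs', tf.get_default_graph())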

Here is the gradient structure, as computed by self.i_grads = tf.gradients(ys=self.loss_imitation, xs=self.e_params, grad_ys=None):

    self.i_grads: [<tf.Tensor 'imitation_train/gradients/actor_imitation/layer_1/MatMul_grad/MatMul_1:0' shape=(4, 185) dtype=float32>,
                   <tf.Tensor 'imitation_train/gradients/actor_imitation/layer_1/add_grad/Reshape_1:0' shape=(1, 185) dtype=float32>,
                   None,
                   None,
                   <tf.Tensor 'imitation_train/gradients/actor_imitation/layer_1/normalizer_actor_layer1/batchnorm/mul_grad/Mul_1:0' shape=(185,) dtype=float32>,
                   <tf.Tensor 'imitation_train/gradients/actor_imitation/layer_1/normalizer_actor_layer1/batchnorm/add_1_grad/Reshape_1:0' shape=(185,) dtype=float32>,
                   <tf.Tensor 'imitation_train/gradients/actor_imitation/action/MatMul_grad/MatMul_1:0' shape=(185, 2) dtype=float32>]
    i_grads length: 7

And here are the variables collected from tf.GraphKeys.GLOBAL_VARIABLES:

    self.e_params: [<tf.Variable 'actor_imitation/layer_1/weight_actor_layer1:0' shape=(4, 185) dtype=float32_ref>,
                    <tf.Variable 'actor_imitation/layer_1/bias_actor_layer1:0' shape=(1, 185) dtype=float32_ref>,
                    <tf.Variable 'actor_imitation/layer_1/batch_normalization_v1/gamma:0' shape=(4,) dtype=float32>,
                    <tf.Variable 'actor_imitation/layer_1/batch_normalization_v1/beta:0' shape=(4,) dtype=float32>,
                    <tf.Variable 'actor_imitation/layer_1/normalizer_actor_layer1/gamma:0' shape=(185,) dtype=float32>,
                    <tf.Variable 'actor_imitation/layer_1/normalizer_actor_layer1/beta:0' shape=(185,) dtype=float32>,
                    <tf.Variable 'actor_imitation/action/weight_actor_action:0' shape=(185, 2) dtype=float32_ref>]
    e_params length: 7
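Pairing the two lists line by line makes it easier to see which variables the None gradients belong to. A small helper sketch (report_none_gradients is just an illustrative name); it relies on tf.gradients returning the gradients in the same order as the xs argument:

    # Illustrative helper: pair each variable with its gradient and report which entries are None.
    # tf.gradients returns gradients in the same order as xs, so zip keeps the pairing intact.
    def report_none_gradients(grads, params):
        for grad, var in zip(grads, params):
            status = 'None' if grad is None else 'shape {}'.format(grad.shape)
            print('{:<80s} grad: {}'.format(var.op.name, status))

    # report_none_gradients(self.i_grads, self.e_params)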

The code for the layer looks like this:

    with tf.variable_scope(layer_scope):

        w_collection = tf.get_variable(weight_scope, [layer_input_dim, hidden_neuro_dim], initializer=initializer_w, trainable=trainable)
        b_collection = tf.get_variable(bias_scope, [1, hidden_neuro_dim], initializer=initializer_b, trainable=trainable)

        layer_output_0 = tf.matmul(layer_input, w_collection) + b_collection

        # Earlier attempts with the Keras layer (commented out):
        # layer_input_normalization = tf.keras.layers.BatchNormalization()(layer_input, training=True)
        # batch_normalization = tf.keras.layers.BatchNormalization(name=layer_normalizer_scope)
        # layer_output_normalization = batch_normalization(layer_output_0, training=True)
        # tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, batch_normalization.updates)

        layer_output_normalization = tf.layers.batch_normalization(layer_output_0, name=layer_normalizer_scope, training=True)
        layer_output = tf.nn.leaky_relu(layer_output_normalization)

        layer_output_dropout = tf.nn.dropout(layer_output, rate=dropout_rate)

        return layer_output_dropout
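For context, a quick check (not part of the layer code itself) that can be run after building the graph to compare all variables under the scope with the trainable subset; tf.GraphKeys.GLOBAL_VARIABLES also holds non-trainable variables such as the batch-norm moving statistics:

    # Compare every variable under the scope with the trainable subset.
    # GLOBAL_VARIABLES also contains non-trainable variables (e.g. batch-norm
    # moving_mean / moving_variance), which are not updated by gradients.
    all_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='actor_imitation')
    trainable_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='actor_imitation')
    for v in all_vars:
        print('{:<80s} trainable: {}'.format(v.op.name, v in trainable_vars))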

The gradient computation is done like this:

    self.e_params = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='actor_imitation')
    self.loss_imitation = tf.reduce_mean(tf.squared_difference(self.a, self.A_I))

    with tf.variable_scope('imitation_train'):

        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

        with tf.control_dependencies(update_ops):

            self.opt_i = tf.train.MomentumOptimizer(self.lr_i, self.momentum)
            self.i_grads = tf.gradients(ys=self.loss_imitation, xs=self.e_params, grad_ys=None)
            self.train_imitation = self.opt_i.apply_gradients(zip(self.i_grads, self.e_params))
Some of the gradients computed by self.i_grads = tf.gradients(ys=self.loss_imitation, xs=self.e_params, grad_ys=None) are all zero. From TensorBoard I can see that these belong to the variables placed before the BN layer, which looks as if the BN layer prevents backpropagation to the layers before it. Any ideas about what is going on here?
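As a sanity check, here is a minimal, self-contained sketch (not the actual model; all names and shapes below are made up) that builds a single dense layer followed by tf.layers.batch_normalization and prints which variables receive a gradient:

    # Standalone sanity check (TF 1.x): does tf.layers.batch_normalization block gradients
    # to the weight and bias created before it? All names/shapes here are made up.
    import numpy as np
    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 4])
    w = tf.get_variable('w_check', [4, 8], initializer=tf.glorot_uniform_initializer())
    b = tf.get_variable('b_check', [1, 8], initializer=tf.zeros_initializer())
    h = tf.layers.batch_normalization(tf.matmul(x, w) + b, training=True)
    loss = tf.reduce_mean(tf.square(tf.nn.leaky_relu(h)))

    params = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
    grads = tf.gradients(loss, params)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        feed = {x: np.random.randn(32, 4).astype(np.float32)}
        for var, grad in zip(params, grads):
            if grad is None:
                print(var.op.name, '-> no gradient (None)')
            else:
                print(var.op.name, '-> mean |grad| =', np.abs(sess.run(grad, feed)).mean())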

Thanks a lot!

0 Answers:

There are no answers yet.