TensorFlow: Zero gradients

Date: 2018-05-09 11:26:19

Tags: python tensorflow neural-network

Basically, I'm running into a problem where, when I try to update certain variables in a scope (namely the variables in the discriminator scope), the gradients are always zero (I can print them out after computing them).

This is confusing because I don't understand why the loss isn't propagating through.

A stub of my code is below:

    def build_model(self):
        # Process inputs
        self.inputs = tf.placeholder(tf.float32, shape = self.input_shape, name = "input")
        self.is_training = tf.placeholder(tf.bool, name = "is_training")
        self.targets = tf.placeholder(tf.float32, shape = self.output_shape, name = "targets")
        self.target_p = tf.placeholder(tf.float32, shape = self.patient_shape, name = "targets_patients")
        self.target_s = tf.placeholder(tf.float32, shape = self.sound_shape, name = "targets_sounds")
        # Process outputs
        self.encoded_X = self.encoder(self.inputs)
        self.posteriors = self.predictor(self.encoded_X)
        self.patient_predict, self.sound_predict = self.discriminator(self.encoded_X, tf.expand_dims(self.posteriors, axis = -1))
        self.patient_predict_id = tf.argmax(tf.nn.softmax(self.patient_predict, axis = -1))
        self.sound_predict_id = tf.argmax(tf.nn.softmax(self.sound_predict, axis = -1))

        # Process losses
        self.segment_loss = tf.losses.mean_squared_error(self.targets, self.posteriors)
        self.patient_loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits = self.patient_predict, labels = self.target_p)
        self.sound_loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits = self.sound_predict, labels = self.target_s)
        self.disc_loss = self.patient_loss + self.sound_loss
        self.combined_loss = self.segment_loss - self.lambda_param*(self.disc_loss)

        self.extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(self.extra_update_ops):
            predictor_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="predictor")
            encoder_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="encoder")
            discrim_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="discriminator")

            self.discrim_train = tf.train.AdamOptimizer(0.001).minimize(tf.reduce_mean(-1*self.combined_loss), var_list=discrim_vars)
            self.predict_train = tf.train.AdamOptimizer(0.001).minimize(tf.reduce_mean(self.combined_loss), var_list=predictor_vars)
            self.encode_train = tf.train.AdamOptimizer(0.001).minimize(tf.reduce_mean(self.combined_loss), var_list=encoder_vars)

As you can see, self.combined_loss must have a dependency on self.discriminator through self.patient_loss.
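
One quick way to confirm that a gradient path from the combined loss to the discriminator weights actually exists in the graph is to ask for the gradients explicitly. This is only a minimal debugging sketch, not part of the original model, and it assumes the attribute names used above:

    # Hypothetical check: a None entry means there is no path in the graph
    # from the loss to that variable.
    discrim_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="discriminator")
    grads = tf.gradients(tf.reduce_mean(-1 * self.combined_loss), discrim_vars)
    for var, grad in zip(discrim_vars, grads):
        print(var.name, '-> no path' if grad is None else grad.get_shape())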

The code for discriminator() is here:

    def discriminator(self, encoded_X, posterior, reuse = False):
        with tf.variable_scope("discriminator") as scope:
            if reuse: scope.reuse_variables()
            print('\n############## \nDiscriminator\n')
            print('Discriminator encode input-shape: ', self.encode_shape)
            print('Discriminator posterior input-shape: ', self.output_shape, ' (Expanded to correct size)')
            inputs = tf.concat([encoded_X, posterior], axis = -2)
            tf.stop_gradient(posterior)
            print('Stacked input shape: ', inputs.get_shape())
            h = tf.layers.conv2d(inputs, 10, (5, 2), padding = 'SAME', activation = tf.nn.relu)
            h = tf.layers.max_pooling2d(h, (5, 2), (5, 2))
            print('Layer 1: ', h.get_shape())
            h = tf.layers.conv2d(h, 5, (5, 2), padding = 'SAME', activation = tf.nn.relu)
            h = tf.squeeze(tf.layers.max_pooling2d(h, (3, 2), (3, 2)), axis = -2)
            h = tf.layers.flatten(h)
            print('Layer 2: ', h.get_shape())
            h_p = tf.layers.dense(h, self.patient_shape[-1])
            h_s = tf.layers.dense(h, self.sound_shape[-1])
            print('Discriminator patient o/p shape: ', h_p.get_shape(), ' Expected shape: ', self.patient_shape)
            print('Discriminator sound o/p shape: ', h_s.get_shape(), ' Expected shape: ', self.sound_shape)
            return h_p, h_s

The call to tf.stop_gradient is there because I don't want gradients flowing from the discriminator back into the model that produces the posteriors.
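
Note that tf.stop_gradient does not modify its argument in place; it returns a new tensor whose gradient is blocked, so only graph paths built from that returned tensor are cut. A minimal sketch of the usual pattern, reusing the names from the snippet above:

    # stop_gradient returns a new tensor; the rest of the graph has to be
    # built from the returned value for the block to take effect.
    posterior_ng = tf.stop_gradient(posterior)
    inputs = tf.concat([encoded_X, posterior_ng], axis = -2)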

Finally, this is where I run the model:

    feed_dict = {
        self.inputs: X,
        self.targets: y,
        self.target_p: y_p_oh,
        self.target_s: y_s_oh,
        self.is_training: True
    }
    posteriors, cost_loss, disc_loss, patient_id_pred, sound_id_pred, _, _ = self.sess.run([
            self.posteriors,
            self.combined_loss,
            self.disc_loss,
            self.patient_predict_id,
            self.sound_predict_id,
            self.predict_train,
            self.encode_train,
        ], feed_dict = feed_dict)

    j = 0
    print('Combined-loss: ', np.mean(cost_loss), 'Discriminator-loss: ', np.mean(disc_loss))
    while np.mean(disc_loss) > entropy_cutoff:
        disc_loss, _ = self.sess.run([self.disc_loss, self.discrim_train], feed_dict = feed_dict)
        j += 1
        print(' Inner loop iteration: ', j, ' Loss: ', np.around(np.mean(disc_loss), 5), ' Cutoff: ', np.around(entropy_cutoff, 5), end = '\r')
    print("")

From my investigation, the code gets stuck on the while loop because the train step that minimizes -combined_loss (effectively maximizing combined_loss) just gives zero gradients for all of the variables in the discriminator scope. Why is this happening?
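
To look at the actual numeric gradient values (rather than just whether a path exists), something along these lines can be run with the same feed_dict. This is only a sketch; discrim_vars and feed_dict are assumed from the snippets above:

    # Hypothetical inspection: build the gradient ops once, then evaluate them
    # with the same feed_dict and print the largest absolute value per variable.
    grads_and_vars = [(g, v) for g, v in tf.train.AdamOptimizer(0.001).compute_gradients(
        tf.reduce_mean(-1 * self.combined_loss), var_list=discrim_vars) if g is not None]
    grad_values = self.sess.run([g for g, _ in grads_and_vars], feed_dict=feed_dict)
    for (_, var), value in zip(grads_and_vars, grad_values):
        print(var.name, 'max |grad| =', np.abs(value).max())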

Edit: I think I've localized the error to:

    self.disc_loss = self.patient_loss + self.sound_loss
    self.combined_loss = self.segment_loss - self.lambda_param * self.disc_loss

If I apply the minimization to self.disc_loss, it works fine. But when I apply the minimization to self.combined_loss, the op breaks and the gradients become zero. Why would that be?
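
For reference, since self.segment_loss does not depend on the discriminator weights, the gradient of the combined loss with respect to any discriminator parameter theta_D should just be a rescaled copy of the disc_loss gradient:

    d(combined_loss)/d(theta_D) = -lambda_param * d(disc_loss)/d(theta_D)

so for these variables the two objectives should only differ by the factor -lambda_param.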

Edit: TensorBoard graph:

[TensorBoard graph of the model]
