Question

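When I use dropout with my LSTM, the model without dropout gets a better ROUGE score and loss than the model with dropout. So I am wondering whether my dropout code is correct. I am using TensorFlow 0.12.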

  cellClass = tf.nn.rnn_cell.LSTMCell
  for layer_i in xrange(hps.enc_layers):
    with tf.variable_scope('encoder%d'%layer_i), tf.device(
        self._next_device()):
      #bidirectional rnn cell
      cell_fw = cellClass(
          hps.num_hidden
          ,initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=123),
          state_is_tuple=False
      )
      cell_bw = cellClass(
          hps.num_hidden
          ,initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=113),
          state_is_tuple=False
      )
      cell_fw = tf.nn.rnn_cell.DropoutWrapper(cell_fw, input_keep_prob=hps.input_dropout, output_keep_prob=hps.output_dropout)
      cell_bw = tf.nn.rnn_cell.DropoutWrapper(cell_bw, input_keep_prob=hps.input_dropout, output_keep_prob=hps.output_dropout)
      (emb_encoder_inputs, fw_state, _) = tf.nn.bidirectional_rnn(
          cell_fw, cell_bw, emb_encoder_inputs, dtype=tf.float32,
          sequence_length=article_lens)
    #decoder
    cell = cellClass(
        hps.num_hidden
        ,initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=113),
        state_is_tuple=False
        )
    cell=tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=hps.input_dropout, output_keep_prob=hps.output_dropout)
    decoder_outputs, self._dec_out_state, self.cur_attns, self.cur_alpha = seq2seq.attention_decoder(
        emb_decoder_inputs, self._dec_in_state,  self._enc_top_states,
        cell, num_heads=1, loop_function=loop_function,
        initial_state_attention=initial_state_attention)

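During training I set the keep probabilities to 0.5. When I compute the loss on the training and validation sets I also leave them at 0.5, but in the decoding step I use 1, so nothing is dropped. Am I doing this right?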

Answer 1

Almost!

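When computing accuracy and running validation, you need to manually set the keep probabilities (input_keep_prob and output_keep_prob) to 1.0, so that no activations are actually dropped while the network is being evaluated. If you don't, you will misestimate what the network has learned to predict so far, and that will certainly hurt your accuracy and validation scores, especially at a 50% dropout rate.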
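To make the effect of the keep probability concrete, here is a minimal pure-Python sketch of inverted dropout, the scaling scheme that tf.nn.dropout applies. The `dropout` function and the sample activations are illustrative assumptions, not code from TensorFlow or from the question.

```python
import random


def dropout(values, keep_prob, rng):
    """Inverted dropout on a list of activations (illustrative helper)."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if keep_prob == 1.0:
        # Evaluation setting: nothing is dropped, the input passes through.
        return list(values)
    # Keep each activation with probability keep_prob and rescale survivors
    # by 1 / keep_prob so the expected value matches the undropped input.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


activations = [0.2, -1.0, 0.7, 1.5]  # made-up values for the demonstration
rng = random.Random(0)

train_out = dropout(activations, keep_prob=0.5, rng=rng)  # randomly masked
eval_out = dropout(activations, keep_prob=1.0, rng=rng)   # same as the input

print("train:", train_out)
print("eval: ", eval_out)
```

With keep_prob left at 0.5 during validation, every evaluation pass would see a different randomly masked network, which is why the reported loss and scores look worse than they should.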

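Dropout on the decoder is optional and worth experimenting with. If you do use it during training, set its keep probability to something other than 1.0, since a keep probability of 1.0 disables dropout entirely.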
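One way to keep these settings straight is to derive the keep probabilities from the run mode instead of hard-coding them. The sketch below is only an illustration: `keep_probs_for` and the mode names "train", "eval" and "decode" are hypothetical, and the returned values would still have to be passed on to whatever supplies hps.input_dropout and hps.output_dropout.

```python
def keep_probs_for(mode, train_input_keep=0.5, train_output_keep=0.5):
    """Return (input_keep_prob, output_keep_prob) for a run mode.

    Hypothetical helper: the mode names are assumptions for illustration.
    """
    if mode == "train":
        # Dropout is active only while the weights are being updated.
        return train_input_keep, train_output_keep
    if mode in ("eval", "decode"):
        # Loss/score computation and decoding use the full network.
        return 1.0, 1.0
    raise ValueError("unknown mode: %r" % (mode,))


for mode in ("train", "eval", "decode"):
    print(mode, keep_probs_for(mode))
```

To experiment with decoder dropout separately from the encoder, the same idea extends to a second pair of training-time values.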

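To recap for passers-by: the idea behind dropout is to randomly zero out unit activations during training so that neurons are less likely to become overly dependent on one another (co-adapted, or whatever term you prefer), which would otherwise lead to overfitting. Remember that, in general, we are trying to fit our network to approximate some function. Since fitting a network is essentially an optimization problem, we have to worry about getting stuck in a local minimum (or maximum, depending on which way the picture is oriented in your head). Dropout is therefore a form of regularization that helps us avoid overfitting.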
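The recap can also be written as a short formula. This assumes the inverted-dropout scaling that tf.nn.dropout uses; the symbols are notation introduced here for illustration, with p the keep probability, h_i an activation and m_i its random mask.

```latex
\tilde{h}_i = \frac{m_i}{p}\, h_i, \qquad m_i \sim \mathrm{Bernoulli}(p), \qquad
\mathbb{E}[\tilde{h}_i] = \frac{p}{p}\, h_i = h_i
```

Setting p = 1 gives m_i = 1 for every unit, so the layer passes its input through unchanged, which is the behavior wanted at evaluation and decoding time.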

If anyone has further insights or corrections, please post!

How to use DropoutWrapper in LSTM training and decoding
