Question

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

x = tf.range(1, 11, dtype=tf.float32)
x = tf.reshape(x, (5, 1, 2))

cell = tf.contrib.rnn.LSTMCell(10)
initial_state = cell.zero_state(5, dtype=tf.float32)

y1, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32, initial_state=initial_state)

y2, _ = tf.nn.dynamic_rnn(
    tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob=1.0, output_keep_prob=0.5, state_keep_prob=1.0),
    x,
    dtype=tf.float32,
    initial_state=initial_state)

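I am using TensorFlow 1.8.0.

I expected the output of y2 to be similar to that of y1, since y2 uses the same LSTM cell as y1 except that its output is also passed through a dropout layer. Because dropout is applied only to the output of the LSTM cell, I thought the values of y2 would be the same as those of y1, except for a few 0s here and there. But this is what I get for y1: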

<tf.Tensor: id=5540, shape=(5, 1, 10), dtype=float32, numpy=
array([[[-4.2897560e-02,  1.9367093e-01, -1.1827464e-01, -1.2339889e-01,
          1.3408028e-01,  1.3082971e-02, -2.4622230e-02, -1.5669680e-01,
          1.1127964e-01, -5.3087018e-02]],
       [[-7.1379542e-02,  4.5163053e-01, -1.6180833e-01, -1.3278724e-01,
          2.2819680e-01, -4.8406985e-02, -8.2188733e-03, -2.5466946e-01,
          2.8928292e-01, -7.3916554e-02]],
       [[-5.9056517e-02,  6.1984581e-01, -1.9882108e-01, -9.6297756e-02,
          2.5009862e-01, -8.0139056e-02, -2.2850712e-03, -2.7935350e-01,
          4.4566888e-01, -7.8914449e-02]],
       [[-3.8571563e-02,  6.9930458e-01, -2.2960691e-01, -6.1545946e-02,
          2.5194761e-01, -7.9383254e-02, -5.4560765e-04, -2.7542716e-01,
          5.5587584e-01, -7.3568568e-02]],
       [[-2.2481792e-02,  7.3400390e-01, -2.5636050e-01, -3.7012421e-02,
          2.4684550e-01, -6.3926049e-02, -1.1120128e-04, -2.5999820e-01,
          6.2801009e-01, -6.3132115e-02]]], dtype=float32)>

and for y2:

<tf.Tensor: id=5609, shape=(5, 1, 10), dtype=float32, numpy=
array([[[-0.08579512,  0.38734186, -0.23654927, -0.24679779,
          0.        ,  0.02616594, -0.        , -0.3133936 ,
          0.        , -0.        ]],
       [[-0.14275908,  0.        , -0.32361665, -0.26557449,
          0.        , -0.        , -0.        , -0.5093389 ,
          0.        , -0.        ]],
       [[-0.11811303,  0.        , -0.39764217, -0.        ,
          0.50019723, -0.16027811, -0.00457014, -0.        ,
          0.89133775, -0.        ]],
       [[-0.        ,  0.        , -0.45921382, -0.12309189,
          0.        , -0.        , -0.        , -0.        ,
          1.1117517 , -0.14713714]],
       [[-0.        ,  0.        , -0.        , -0.07402484,
          0.        , -0.        , -0.        , -0.5199964 ,
          1.2560202 , -0.        ]]], dtype=float32)>

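The non-zero values in y2 are completely different from the values at the corresponding positions in y1.

Is this a bug, or do I have the wrong idea about how dropout is applied to the outputs of an LSTM cell?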

Answer 1

y2 is equivalent to y1_drop / 0.5, where y1_drop is y1 with some of its entries dropped (set to zero).

When dropout is applied to y1 with keep probability p, the entries that are kept are then scaled by dividing them by p.

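If you compare the two matrices side by side, y2 is just y1 with roughly half of its entries randomly dropped and the remaining entries divided by 0.5 (that is, doubled).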
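The relationship can be checked against the numbers in the question. The sketch below is plain Python, not TensorFlow: `inverted_dropout` is a hypothetical helper written for illustration, and the two rows are the first time step of y1 and y2 copied from the output above.

```python
import random

def inverted_dropout(values, keep_prob, rng):
    """Illustrative helper (not a TensorFlow API): keep each value with
    probability keep_prob and divide the kept values by keep_prob."""
    out = []
    for v in values:
        keep = 1.0 if rng.random() < keep_prob else 0.0
        # Multiplying a negative value by 0.0 gives -0.0, which may be why
        # some dropped entries in y2 print as "-0."
        out.append(v / keep_prob * keep)
    return out

# First time step of y1 and y2, copied from the question
y1_row = [-4.2897560e-02, 1.9367093e-01, -1.1827464e-01, -1.2339889e-01,
          1.3408028e-01, 1.3082971e-02, -2.4622230e-02, -1.5669680e-01,
          1.1127964e-01, -5.3087018e-02]
y2_row = [-0.08579512, 0.38734186, -0.23654927, -0.24679779,
          0.0, 0.02616594, -0.0, -0.3133936, 0.0, -0.0]

keep_prob = 0.5

# Every entry of y2 is either dropped (zero) or the y1 entry divided by keep_prob
matches = all(
    b == 0.0 or abs(b - a / keep_prob) < 1e-6
    for a, b in zip(y1_row, y2_row)
)
print(matches)

# Applying the helper to y1 gives a row with the same structure as y2
# (the dropped positions depend on the random mask, so they will differ)
sample = inverted_dropout(y1_row, keep_prob, random.Random(0))
print(sample)
```

For example, the first entry of y1 is -0.04289756 and the first entry of y2 is -0.08579512, which is exactly twice as large.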

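From Section 10 of the Dropout paper (see the reference below):

"We described dropout as a method where we retain units with probability p at training time and scale down the weights by multiplying them by a factor of p at test time. Another way to achieve the same effect is to scale up the retained activations by multiplying by 1/p at training time and not modifying the weights at test time. These methods are equivalent with appropriate scaling of the learning rate and weight initializations at each layer."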
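One way to see why scaling by 1/p at training time removes the need to rescale at test time is that it keeps the expected value of each activation unchanged. The following is a small simulation sketch under that reading; `train_time_output` is a hypothetical helper, and the activation value is simply one entry taken from y1.

```python
import random

def train_time_output(activation, p, rng):
    """Illustrative helper: keep the activation with probability p and
    scale it by 1/p, otherwise output zero."""
    return activation / p if rng.random() < p else 0.0

rng = random.Random(0)
activation = 0.62801009   # one entry of y1 from the question
p = 0.5                   # keep probability
trials = 100_000

# Average over many independent dropout masks
mean_output = sum(train_time_output(activation, p, rng) for _ in range(trials)) / trials

# The average stays close to the original activation, so the unscaled
# activation can be used directly when dropout is switched off
print(abs(mean_output - activation) < 0.02)
```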

Reference: Dropout: A Simple Way to Prevent Neural Networks from Overfitting

TensorFlow: Understanding LSTM output with and without DropoutWrapper
