TensorFlow dynamic_rnn propagates nans for batch size greater than 1

Time: 2018-06-05 23:19:11

Tags: tensorflow lstm recurrent-neural-network

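I am hoping someone can help me understand an issue I have been having when using an LSTM with dynamic_rnn in TensorFlow. As per this MWE, when I have a batch size of 1 and the sequences are incomplete (I pad the short tensors with nans rather than zeros to make the effect visible), everything operates as normal: the nans in the short sequences are ignored, as expected...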

import tensorflow as tf
import numpy as np

batch_1 = np.random.randn(1, 10, 8)
batch_2 = np.random.randn(1, 10, 8)

batch_1[6:] = np.nan # lets make a short batch in batch 1 second sample of length 6 by padding with nans

seq_lengths_batch_1 = [6]
seq_lengths_batch_2 = [10]

tf.reset_default_graph()

input_vals = tf.placeholder(shape=[1, 10, 8], dtype=tf.float32)
lengths = tf.placeholder(shape=[1], dtype=tf.int32)

cell = tf.nn.rnn_cell.LSTMCell(num_units=5)
outputs, states  = tf.nn.dynamic_rnn(cell=cell, dtype=tf.float32, sequence_length=lengths, inputs=input_vals)
last_relevant_value = states.h
fake_loss = tf.reduce_mean(last_relevant_value)
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(fake_loss)

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
_, fl, lrv = sess.run([optimizer, fake_loss, last_relevant_value], feed_dict={input_vals: batch_1, lengths: seq_lengths_batch_1})
print(fl, lrv)
_, fl, lrv = sess.run([optimizer, fake_loss, last_relevant_value], feed_dict={input_vals: batch_2, lengths: seq_lengths_batch_2})
print(fl, lrv)

sess.close()

This outputs properly filled values:

0.00659429 [[ 0.11608966  0.08498846 -0.02892204 -0.01945034 -0.1197343 ]]
-0.080244 [[-0.03018401 -0.18946587 -0.19128899 -0.10388547  0.11360413]]
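
One detail worth checking in the batch-size-1 snippet above: for an array of shape (1, 10, 8), `batch_1[6:]` slices the first (batch) axis rather than the time axis, so it selects nothing and the assignment appears to leave the array untouched. A minimal NumPy check, independent of TensorFlow (the names `no_op` and `padded` are illustrative only and not from the original post):

```python
import numpy as np

no_op = np.random.randn(1, 10, 8)
no_op[6:] = np.nan             # slices axis 0, which has length 1: empty selection
print(no_op[6:].shape)         # shape of the (empty) region that was assigned
print(np.isnan(no_op).any())   # whether any nan actually landed in the array

padded = np.random.randn(1, 10, 8)
padded[0, 6:] = np.nan         # sample 0, time steps 6..9
print(np.isnan(padded[0, 6:]).all())
print(np.isnan(padded[0, :6]).any())
```

If this holds for the snippet as posted, the batch-size-1 run never received any nans, which would be consistent with it behaving differently from the batch-size-3 run below, where `batch_1[1, 6:]` does index the time axis.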

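However, when I increase the batch size to, for example, 3, the first batch executes correctly, but then somehow the second batch causes nans to start propagating: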

import tensorflow as tf
import numpy as np

batch_1 = np.random.randn(3, 10, 8)
batch_2 = np.random.randn(3, 10, 8)

batch_1[1, 6:] = np.nan 
batch_2[0, 8:] = np.nan 

seq_lengths_batch_1 = [10, 6, 10]
seq_lengths_batch_2 = [8, 10, 10]

tf.reset_default_graph()

input_vals = tf.placeholder(shape=[3, 10, 8], dtype=tf.float32)
lengths = tf.placeholder(shape=[3], dtype=tf.int32)

cell = tf.nn.rnn_cell.LSTMCell(num_units=5)
outputs, states  = tf.nn.dynamic_rnn(cell=cell, dtype=tf.float32, sequence_length=lengths, inputs=input_vals)
last_relevant_value = states.h
fake_loss = tf.reduce_mean(last_relevant_value)
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(fake_loss)

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
_, fl, lrv = sess.run([optimizer, fake_loss, last_relevant_value], feed_dict={input_vals: batch_1, lengths: seq_lengths_batch_1})
print(fl, lrv)
_, fl, lrv = sess.run([optimizer, fake_loss, last_relevant_value], feed_dict={input_vals: batch_2, lengths: seq_lengths_batch_2})
print(fl, lrv)

sess.close()

0.0533635 [[ 0.33622459 -0.0284576   0.11914439  0.14402215 -0.20783389]
 [ 0.20805927  0.17591488 -0.24977767 -0.03432769  0.2944448 ]
 [-0.04508523  0.11878576  0.07287208  0.14114542 -0.24467923]]
nan [[ nan  nan  nan  nan  nan]
 [ nan  nan  nan  nan  nan]
 [ nan  nan  nan  nan  nan]]

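I find this behaviour quite strange, as I expected all values after the sequence lengths to be ignored. That is what happens with a batch size of 1, but not with a batch size of 2 or more.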

Obviously, nans are not propagated if I use 0 as my filler value, but that does not give me any confidence that dynamic_rnn is functioning the way I expect it to.

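I should also mention that if I remove the optimisation step, the issue does not occur. So now I am properly confused: after a day of trying many different permutations, I cannot see why the batch size would make any difference here.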

1 answer:

Answer 0: (score: 1)

I have not traced this down to the exact operation, but here is what I believe is happening.

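Why aren't values beyond sequence_length ignored? They are ignored in the sense that they get multiplied by 0 (they are masked out) during some operations. Mathematically, the result is always zero, so they should have no effect. Unfortunately, nan * 0 = nan. So if you feed nan values in your examples, they propagate. You might wonder why TensorFlow does not skip them completely and only masks them. The reason is performance on modern hardware: operating on one large regular shape with a bunch of zeros is much easier than operating on several small shapes (the ones you get from decomposing an irregular shape).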
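
The nan * 0 = nan point can be checked outside TensorFlow. The following is a minimal NumPy sketch, not TensorFlow's actual masking code; the array names are illustrative. It contrasts masking by multiplication with masking by selection:

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, np.nan])  # last two entries play the role of padding
mask = np.array([1.0, 1.0, 0.0, 0.0])          # 1 = real step, 0 = padded step

# Masking by multiplication: nan * 0 is still nan under IEEE 754 arithmetic.
masked_by_multiply = values * mask

# Masking by selection: the padded entries are replaced, never multiplied.
masked_by_select = np.where(mask > 0, values, 0.0)

print(masked_by_multiply)  # [ 1.  2. nan nan]
print(masked_by_select)    # [1. 2. 0. 0.]
print(np.sum(masked_by_multiply), np.sum(masked_by_select))  # nan 3.0
```

Any reduction over the multiplied version (a sum, a mean, a matrix product) inherits the nan, which is why a single padded nan can contaminate a whole result.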

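Why does it happen only on the second batch? In the first batch, the loss and the last hidden state are computed using the original variable values, and those are fine. But because you also run the optimizer update in the same sess.run(), the variables get updated during the first call and become nan. In the second call, the nans from the variables spread to the loss and the hidden state.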
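
The sequence of events described above can be mimicked with a toy example. This is only a sketch under stated assumptions: a single scalar "variable" `w`, a forward pass that selects valid steps, and a gradient that masks by multiplication. None of it is TensorFlow's real LSTM or Adam code, and all names are hypothetical.

```python
import numpy as np

x = np.array([1.0, 2.0, np.nan])   # last step is nan padding
mask = np.array([1.0, 1.0, 0.0])   # sequence length 2
w = 0.5                            # a single toy "variable"
lr = 0.1                           # plain gradient-descent step size

def forward(w):
    # The forward pass selects valid steps, so the padding never enters the result.
    return np.sum(np.where(mask > 0, w * x, 0.0))

def grad(w):
    # Assumed for illustration: the gradient masks by multiplication,
    # so the padded step contributes 0 * nan = nan.
    return np.sum(mask * x)

loss_1 = forward(w)      # computed with the original w: finite
w = w - lr * grad(w)     # the update turns w into nan
loss_2 = forward(w)      # every valid term is now nan * x

print(loss_1, w, loss_2)  # 1.5 nan nan
```

The first loss is clean because it was computed before the update took effect. Everything after the update is nan, which matches the "first batch fine, second batch nan" pattern in the question and also fits the observation that removing the optimisation step makes the problem disappear.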

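How can I be confident that the values beyond sequence_length really are masked? I modified your example so that it reproduces the issue but is also deterministic.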

import tensorflow as tf
import numpy as np

batch_1 = np.ones((3, 10, 2))

batch_1[1, 7:] = np.nan

seq_lengths_batch_1 = [10, 7, 10]

tf.reset_default_graph()

input_vals = tf.placeholder(shape=[3, 10, 2], dtype=tf.float32)
lengths = tf.placeholder(shape=[3], dtype=tf.int32)

cell = tf.nn.rnn_cell.LSTMCell(num_units=3, initializer=tf.constant_initializer(1.0))
init_state = tf.nn.rnn_cell.LSTMStateTuple(*[tf.ones([3, c]) for c in cell.state_size])
outputs, states  = tf.nn.dynamic_rnn(cell=cell, dtype=tf.float32, sequence_length=lengths, inputs=input_vals,
        initial_state=init_state)
last_relevant_value = states.h
fake_loss = tf.reduce_mean(last_relevant_value)
optimizer = tf.train.AdamOptimizer(learning_rate=0.1).minimize(fake_loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1):
        _, fl, lrv = sess.run([optimizer, fake_loss, last_relevant_value],
                feed_dict={input_vals: batch_1, lengths: seq_lengths_batch_1})
        print("VARIABLES:", sess.run(tf.trainable_variables()))
        print("LOSS and LAST HIDDEN:", fl, lrv)

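If you replace the np.nan in batch_1[1, 7:] = np.nan with any number (try, for example, -1M, 1M, or 0), you will see that the values you get are the same. You can also run the loop for more iterations. As a further sanity check, if you set seq_lengths_batch_1 to something "wrong", e.g. [10, 8, 10], you can see that the value used in batch_1[1, 7:] = np.nan now does affect the output.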
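
Since the question notes that padding with 0 avoids the propagation, one practical option is to zero out everything past each sequence length before feeding a batch. The helper below is a hypothetical sketch, not part of TensorFlow or of the original post; `zero_out_padding` is an illustrative name, and it assumes batch-major input of shape (batch, time, features).

```python
import numpy as np

def zero_out_padding(batch, seq_lengths):
    """Return a copy of `batch` with all time steps past each length set to 0."""
    batch = np.array(batch, dtype=np.float64, copy=True)
    time_steps = batch.shape[1]
    # valid[i, t] is True while t < seq_lengths[i]
    valid = np.arange(time_steps)[None, :] < np.asarray(seq_lengths)[:, None]
    batch[~valid] = 0.0  # overwrite the padded steps, whatever they contained
    return batch

batch = np.ones((3, 10, 2))
batch[1, 7:] = np.nan                       # same padding as in the example above
clean = zero_out_padding(batch, [10, 7, 10])

print(np.isnan(clean).any())   # False
print(clean[1, 6], clean[1, 7])  # [1. 1.] [0. 0.]
```

Because the padded entries are overwritten rather than multiplied by a mask, nans in the padding cannot leak into the result, and the valid steps are left unchanged.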