TensorFlow loss in my RNN is diverging

Asked: 2016-08-04 08:22:09

Tags: python neural-network tensorflow sequence deep-learning

I am trying to get my head around TensorFlow by solving this challenge: https://www.kaggle.com/c/integer-sequence-learning

My work is based on these blog posts:

A complete working example - including my data - can be found here: https://github.com/bottiger/Integer-Sequence-Learning. Running the example will print out a lot of debug information. Execute rnn-lstm-my.py to run it (it requires TensorFlow and pandas).

The approach is simple. I load all the training sequences, store their lengths in a vector, and store the longest length in a variable I call "max_length".

In my training data I remove the last element of every sequence and store those elements in a vector called "train_solutions".

I then store all the sequences, padded with zeros, in a matrix of shape [n_seq, max_length].
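In NumPy terms the preparation boils down to something like the sketch below (a minimal illustration, not the exact code from the repo; `sequences` is assumed to be the list of parsed training sequences):

    import numpy as np

    # sequences: a Python list of integer lists parsed from the Kaggle CSV
    seq_lengths = np.array([len(s) for s in sequences])
    max_length = seq_lengths.max()

    # The last element of every sequence becomes the training target ...
    train_solutions = np.array([s[-1] for s in sequences], dtype=np.float32)

    # ... and the remaining prefix is zero-padded into a [n_seq, max_length] matrix
    train_input = np.zeros((len(sequences), max_length), dtype=np.float32)
    for i, s in enumerate(sequences):
        prefix = s[:-1]
        train_input[i, :len(prefix)] = prefix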

Since I want to predict the next number in a sequence, my output should be a single number and my input should be a sequence.

I use an RNN (tf.nn.rnn) with a BasicLSTMCell as the cell and 24 hidden units. The outputs are fed into a basic linear model (xW + b), which should produce my prediction.

My cost function is simply the difference between the model's prediction and the true result, and I calculate it like this:

    cost = tf.nn.l2_loss(tf_result - prediction)

The basic dimensions seem to be correct, since the code actually runs. However, after one or two iterations some NaNs start to show up, and they quickly spread until everything becomes NaN.
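As a side note: tf.add_check_numerics_ops() can make the session fail at the first op that produces a NaN or Inf, which helps to locate where things go wrong. This is not part of my script, just a sketch of how it could be wired in:

    # Build the graph first, then add a check_numerics op for every float tensor
    check_op = tf.add_check_numerics_ops()

    # Running the check together with the training step raises an
    # InvalidArgumentError naming the first tensor that contains NaN/Inf
    sess.run([minimize, check_op], feed_dict=get_input_dict(t_i, t_o, t_l, inputs, batch_size))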

Below are the important parts of the code where I define and run the graph. I have omitted the loading/preparation of the data from this post; please look at the git repo for those details - but I am pretty sure that part is correct.

cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden, state_is_tuple=True)

num_inputs = tf.placeholder(tf.int32, name='NumInputs')
seq_length = tf.placeholder(tf.int32, shape=[batch_size], name='NumInputs')

# Define the input as a list (num elements = batch_size) of sequences
inputs = [tf.placeholder(tf.float32,shape=[1, max_length], name='InputData') for _ in range(batch_size)]

# Result should be a [batch_size, 1] vector
result = tf.placeholder(tf.float32, shape=[batch_size, 1], name='OutputData')

tf_seq_length = tf.Print(seq_length, [seq_length, seq_length.get_shape()], 'SequenceLength: ')

outputs, states = tf.nn.rnn(cell, inputs, dtype=tf.float32) 

# Print the output. The NaN first shows up here
outputs2 = tf.Print(outputs, [outputs], 'Last: ', name="Last", summarize=800)

# Define the model
tf_weight = tf.Variable(tf.truncated_normal([batch_size, num_hidden, frame_size]), name='Weight')
tf_bias   = tf.Variable(tf.constant(0.1, shape=[batch_size]), name='Bias')

# Debug the model parameters
weight = tf.Print(tf_weight, [tf_weight, tf_weight.get_shape()], "Weight: ")
bias = tf.Print(tf_bias, [tf_bias, tf_bias.get_shape()], "bias: ")

# More debug info
print('bias: ', bias.get_shape())
print('weight: ', weight.get_shape())
print('targets ', result.get_shape())
print('RNN input ', type(inputs))
print('RNN input len()', len(inputs))
print('RNN input[0] ', inputs[0].get_shape())

# Calculate the prediction
tf_prediction = tf.batch_matmul(outputs2, weight) + bias
prediction = tf.Print(tf_prediction, [tf_prediction, tf_prediction.get_shape()], 'prediction: ')

tf_result = result

# Calculate the cost
cost = tf.nn.l2_loss(tf_result - prediction)

#optimizer = tf.train.AdamOptimizer()
learning_rate  = 0.05
optimizer = tf.train.GradientDescentOptimizer(learning_rate)


minimize = optimizer.minimize(cost)

mistakes = tf.not_equal(tf.argmax(result, 1), tf.argmax(prediction, 1))
error = tf.reduce_mean(tf.cast(mistakes, tf.float32))

init_op = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init_op)

no_of_batches = int(len(train_input) / batch_size)
epoch = 1

val_dict = get_input_dict(val_input, val_output, train_length, inputs, batch_size)

for i in range(epoch):
    ptr = 0
    for j in range(no_of_batches):

        print('eval w: ', weight.eval(session=sess))

        # inputs batch
        t_i = train_input[ptr:ptr+batch_size]

        # output batch
        t_o = train_output[ptr:ptr+batch_size]

        # sequence lengths
        t_l = train_length[ptr:ptr+batch_size]

        sess.run(minimize,feed_dict=get_input_dict(t_i, t_o, t_l, inputs, batch_size))

        ptr += batch_size

        print("result: ", tf_result)
        print("result len: ", tf_result.get_shape())
        print("prediction: ", prediction)
        print("prediction len: ", prediction.get_shape())


    c_val = sess.run(error, feed_dict = val_dict )
    print "Validation cost: {}, on Epoch {}".format(c_val,i)


    print "Epoch ",str(i)

print('test input: ', type(test_input))
print('test output: ', type(test_output))

incorrect = sess.run(error,get_input_dict(test_input, test_output, test_length, inputs, batch_size))

sess.close()

Here are (the first lines of) the output it produces. You can see everything turning into NaN: http://pastebin.com/TnFFNFrr (I could not post it here because of the body size limit).

The first time I see a NaN is here:

  

    I tensorflow/core/kernels/logging_ops.cc:79] Last: [0 0.76159418 0 0 0 0 0 -0.76159418 0 -0.76159418 0 0 0 0.76159418 0.76159418 0 -0.76159418 0.76159418 0 0 0 0.76159418 0 0 0 nan nan nan nan 0 0 nan nan 1 0 nan 0 0.76159418 nan nan nan 1 0 nan 0 0.76159418 nan nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan nan nan nan nan ... (the remainder of the tensor is all nan / -nan)]

I hope I have made my problem clear. Thanks in advance.

2 Answers:

Answer 0 (score: 3):

RNNs suffer from exploding gradients, so you should clip the gradients of the RNN parameters. Have a look at this post:

How to effectively apply gradient clipping in tensor flow?
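Applied to the posted code, that means replacing optimizer.minimize(cost) with a clipped update. A minimal sketch, following the pattern used in the TensorFlow PTB tutorial (the clip norm of 5.0 is only a common starting point, not a tuned value):

    learning_rate = 0.05
    max_grad_norm = 5.0  # common default; needs tuning for your problem

    # Clip the global norm of all gradients before applying them
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), max_grad_norm)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    minimize = optimizer.apply_gradients(zip(grads, tvars))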

Answer 1 (score: 0):

Use the AdamOptimizer instead:

optimizer = tf.train.AdamOptimizer()
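Note that AdamOptimizer defaults to a learning rate of 0.001, much smaller than the 0.05 used with GradientDescentOptimizer above, and it adapts the step size per parameter; in practice this often keeps the loss from blowing up, and it can still be combined with gradient clipping as described in the other answer.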