Dropout during testing causes LSTM performance to fail

Time: 2016-09-20 15:08:23

Tags: tensorflow

I'm having a problem with TensorFlow: when I use input dropout, the performance of my LSTM drops dramatically (from 70% to <10%).

As I understand it, I should set input_keep_probability to (for example) 0.5 during training and then set it to 1 during testing. That makes sense, but I cannot get it to work as expected. If I set the dropout during testing to the same value as during training, my performance improves (that is not what the example below does, but that version requires less code and makes the same point).
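To make the pattern I mean concrete, here is a minimal sketch of how I understand it is supposed to work (written against the TF 0.x-style API used in my code below; keep_prob_ph and basic_cell are illustrative names only, not my actual code):

# Minimal sketch of the intended train/test dropout pattern (illustrative names only).
import tensorflow as tf

keep_prob_ph = tf.placeholder(tf.float32, name="input_keep_prob")
basic_cell = tf.nn.rnn_cell.LSTMCell(512, state_is_tuple=True)
dropout_cell = tf.nn.rnn_cell.DropoutWrapper(basic_cell, input_keep_prob=keep_prob_ph)

# ... build the rest of the graph with dropout_cell ...

# During training: drop roughly half of the inputs.
#   sess.run(train_op, feed_dict={..., keep_prob_ph: 0.5})
# During testing: keep everything, i.e. dropout disabled.
#   sess.run([cost, accuracy], feed_dict={..., keep_prob_ph: 1.0})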

(Image: the accuracy and cost of 3 runs)

The 'best' line is without dropout, the worst line is [keep_prob @ train: 0.5, keep_prob @ test: 1], and the middle line is [keep_prob @ train: 0.5, keep_prob @ test: 0.5]. These are the cost and accuracy on the test set. They all behave as expected on the training set.

Following is the code I consider essential. Unfortunately I cannot post the full code or a data sample due to its sensitive nature, but please comment if you need more information.

# TF 0.x-era API; placeholders such as input_data, targets, input_keep_prob and
# batch_size_ph are defined elsewhere (omitted along with the sensitive parts).
import tensorflow as tf
from tensorflow.python.ops import rnn

lstm_size = 512
numLayers = 4
numSteps = 15

# Dropout is applied to the cell inputs; input_keep_prob is a placeholder so it
# can be fed 0.5 during training and 1.0 during testing.
lstm_cell = tf.nn.rnn_cell.LSTMCell(lstm_size, state_is_tuple=True, forget_bias=1)
lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, input_keep_prob=input_keep_prob)
cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * numLayers, state_is_tuple=True)

# Split the [batch, numSteps, features] input into a list of per-timestep tensors
_inputs = [tf.squeeze(s, [1]) for s in tf.split(1, numSteps, input_data)]
(outputs, state) = rnn.rnn(cell, _inputs, dtype=tf.float32)
outputs = tf.pack(outputs)

# reshape so I can put all timesteps through the softmax at once
outputsTranspose = tf.reshape(outputs, [-1, lstm_size])

softmax_w = tf.get_variable("softmax_w", [lstm_size, nof_classes])                         
softmax_b = tf.get_variable("softmax_b", [nof_classes]) 

logits = tf.matmul(outputsTranspose, softmax_w) + softmax_b

loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, targets)
cost = tf.reduce_mean(loss)

targetPrediction = tf.argmax(logits, 1)
accuracy = tf.reduce_mean(tf.cast(tf.equal(targetPrediction, targets), "float"))

"""Optimizer"""
with tf.name_scope("Optimizer") as scope:
    tvars = tf.trainable_variables()

    #We clip the gradients to prevent explosion
    grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),maxGradNorm)
    optimizer = tf.train.AdamOptimizer(learning_rate)
    gradients = zip(grads, tvars)
    train_op = optimizer.apply_gradients(gradients)


with tf.Session() as sess:
    sess.run(init_op)

    for i in range(nofBatches * nofEpochs):
        example_batch, label_batch = sess.run(readTrainDataOp)

        result = sess.run([train_op, accuracy, trainSummaries], feed_dict = {input_data: example_batch, targets: label_batch, input_keep_prob:trainInputKeepProbability, batch_size_ph:batch_size})
        #logging

        if i % 50 == 0:
            runTestSet()
            #relevant part of runTestSet(): 
            #result = sess.run([cost, accuracy], feed_dict = {input_data: testData, targets: testLabels, input_keep_prob:testInputKeepProbability, batch_size_ph:testBatchSize})
            #logging

What am I doing wrong that causes this unexpected behaviour?

Edit: below is an image of an input sample, see the next link.

This problem also occurs with only 1 layer.

Edit: I made an example that reproduces the problem. Just run the Python script with the path to test_samples.npy as the first argument and the path to the checkpoint as the second argument.

1 Answer:

Answer 0 (score: 0)

Try input_keep_prob = output_keep_prob = 0.7 for training, and a keep probability of 1.0 for testing.

A keep probability of 0.5 did not work for my LSTM.
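A sketch of what this suggests, using the same TF 0.x DropoutWrapper as in the question (keep_prob is an illustrative placeholder name, not taken from the question's code):

# Suggested settings, sketched with the TF 0.x API; `keep_prob` is illustrative.
keep_prob = tf.placeholder(tf.float32)
cell = tf.nn.rnn_cell.LSTMCell(lstm_size, state_is_tuple=True, forget_bias=1)
cell = tf.nn.rnn_cell.DropoutWrapper(cell,
                                     input_keep_prob=keep_prob,
                                     output_keep_prob=keep_prob)

# Training: feed_dict={..., keep_prob: 0.7}
# Testing:  feed_dict={..., keep_prob: 1.0}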