My first LSTM RNN loss is not reducing as expected

Time: 2017-07-14 03:23:23

Tags: python tensorflow artificial-intelligence recurrent-neural-network

I've been trying to look at the RNN example documentation and roll my own simple RNN sequence-to-sequence model, using the tiny Shakespeare corpus with the output shifted by one character. I'm using sherjilozair's wonderful utils.py to load the data (https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/utils.py), but my training run looks like this...

loading preprocessed files
('epoch', 0, 'loss', 930.27938270568848)
('epoch', 1, 'loss', 912.94828796386719)
('epoch', 2, 'loss', 902.99976110458374)
('epoch', 3, 'loss', 902.90720677375793)
('epoch', 4, 'loss', 902.87029957771301)
('epoch', 5, 'loss', 902.84992623329163)
('epoch', 6, 'loss', 902.83739829063416)
('epoch', 7, 'loss', 902.82908940315247)
('epoch', 8, 'loss', 902.82331037521362)
('epoch', 9, 'loss', 902.81916546821594)
('epoch', 10, 'loss', 902.81605243682861)
('epoch', 11, 'loss', 902.81366014480591)

I was expecting a much sharper drop, and even after 1000 epochs it's still roughly the same. I think there's something wrong with my code, but I can't see what. I've pasted the code below; if anyone could take a quick look and see if anything stands out, I'd be very grateful, thanks.

#
# rays second predictor
#
# take basic example and convert to rnn
#

from tensorflow.examples.tutorials.mnist import input_data

import sys
import argparse
import pdb
import tensorflow as tf

from utils import TextLoader

def main(_):
    # break

    # number of hidden units
    lstm_size = 24

    # embedding of dimensionality 15 should be ok for characters, 300 for words
    embedding_dimension_size = 15

    # load data and get vocab size
    num_steps = FLAGS.seq_length
    data_loader = TextLoader(FLAGS.data_dir, FLAGS.batch_size, FLAGS.seq_length)
    FLAGS.vocab_size = data_loader.vocab_size

    # placeholder for batches of characters
    input_characters = tf.placeholder(tf.int32, [FLAGS.batch_size, FLAGS.seq_length])
    target_characters = tf.placeholder(tf.int32, [FLAGS.batch_size, FLAGS.seq_length])

    # create cell
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size, state_is_tuple=True)

    # initialize with zeros
    initial_state = state = lstm.zero_state(FLAGS.batch_size, tf.float32)

    # use embedding to convert ints to float array
    embedding = tf.get_variable("embedding", [FLAGS.vocab_size, embedding_dimension_size])
    inputs = tf.nn.embedding_lookup(embedding, input_characters)

    # flatten back to 2-d because rnn cells only deal with 2d
    inputs = tf.contrib.layers.flatten(inputs)

    # get output and (final) state
    outputs, final_state = lstm(inputs, state)

    # create softmax layer to classify outputs into characters
    softmax_w = tf.get_variable("softmax_w", [lstm_size, FLAGS.vocab_size])
    softmax_b = tf.get_variable("softmax_b", [FLAGS.vocab_size])
    logits = tf.nn.softmax(tf.matmul(outputs, softmax_w) + softmax_b)
    probs = tf.nn.softmax(logits)

    # expected labels will be 1-hot representation of last character of target_characters
    last_characters = target_characters[:,-1]
    last_one_hot = tf.one_hot(last_characters, FLAGS.vocab_size)

    # calculate loss
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=last_one_hot, logits=logits)

    # calculate total loss as mean across all batches
    batch_loss = tf.reduce_mean(cross_entropy)

    # train using adagrad optimizer
    train_step = tf.train.AdagradOptimizer(0.3).minimize(batch_loss)

    # start session
    sess = tf.InteractiveSession()

    # initialize variables
    sess.run(tf.global_variables_initializer())

    # train!
    num_epochs = 1000
    # loop through epocs
    for e in range(num_epochs):
        # look through batches
        numpy_state = sess.run(initial_state)
        total_loss = 0.0
        data_loader.reset_batch_pointer()
        for i in range(data_loader.num_batches):
            this_batch = data_loader.next_batch()
            # Initialize the LSTM state from the previous iteration.
            numpy_state, current_loss, _ = sess.run(
                [final_state, batch_loss, train_step],
                feed_dict={initial_state: numpy_state,
                           input_characters: this_batch[0],
                           target_characters: this_batch[1]})
            total_loss += current_loss
        # output total loss
        print("epoch ", e, "loss ", total_loss)

    # break into debug
    pdb.set_trace()

    # calculate accuracy using training set

if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument('--data_dir', type=str, default='data/tinyshakespeare',
                      help='Directory for storing input data')
  parser.add_argument('--batch_size', type=int, default=100,
                      help='minibatch size')
  parser.add_argument('--seq_length', type=int, default=50,
                      help='RNN sequence length')
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

Update, July 20th.

Thanks for the replies. I updated this to use the dynamic RNN call, which looks like this...

outputs, final_state = tf.nn.dynamic_rnn(initial_state=initial_state, cell=lstm, inputs=inputs, dtype=tf.float32)

This raised some interesting questions... The batching seems to pick up blocks of 50 characters at a time from the dataset and then step forward 50 characters to get the next sequence in the batch. If that's used for training, and you compute the loss only from the predicted final character against the final character + 1, then in every sequence there are 49 characters' worth of predictions whose loss is never tested. That seems a little odd.
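If I wanted the loss to cover all 50 positions, I think I could flatten time into the batch dimension and score every step, something like this (a sketch, assuming outputs from dynamic_rnn has shape [batch_size, seq_length, lstm_size] and target_characters holds the input shifted by one at every position, as utils.py produces):

# Sketch: compute the loss at every time step instead of only the last.
# Flatten time into the batch dimension, project to vocab-sized logits,
# and compare against every target character.
flat_outputs = tf.reshape(outputs, [-1, lstm_size])           # [batch*steps, lstm_size]
flat_logits = tf.matmul(flat_outputs, softmax_w) + softmax_b  # raw logits, no softmax here
flat_targets = tf.reshape(target_characters, [-1])            # [batch*steps]
step_losses = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=flat_targets, logits=flat_logits)
batch_loss = tf.reduce_mean(step_losses)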

Also, when testing the output, I feed it a single character rather than 50, get the prediction, and feed that single character back in. Should I be appending to that single character at every step? So the first seed is 1 character, then I add the predicted character so the next call is a sequence of 2 characters, and so on up to the maximum of my training sequence length? Or does that not matter if I pass back the updated state? That is, does the updated state also represent all the preceding characters?
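My sampling loop is roughly this (a sketch: it assumes a separate graph built with batch_size = seq_length = 1, so probs has shape [1, vocab_size]; the 'T' seed is just an example):

# Rough shape of the sampling loop: feed one character per step,
# carrying the LSTM state forward between sess.run calls.
import numpy as np

state = sess.run(initial_state)
char_id = data_loader.vocab['T']       # example seed character
generated = [char_id]
for _ in range(200):
    p, state = sess.run([probs, final_state],
                        feed_dict={input_characters: [[char_id]],
                                   initial_state: state})
    char_id = np.random.choice(len(p[0]), p=p[0])   # sample the next character
    generated.append(char_id)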

On another note, I found what I think is the main reason the loss wasn't decreasing... I was mistakenly calling softmax twice...

logits = tf.nn.softmax(tf.matmul(final_output, softmax_w) + softmax_b)
probs = tf.nn.softmax(logits)
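With that spotted, the fix is just to keep the logits raw, so softmax is applied exactly once, inside the loss (using the same variable names as above):

# Corrected: pass raw logits to the loss; softmax_cross_entropy_with_logits
# applies the softmax internally. probs is only needed for sampling.
logits = tf.matmul(final_output, softmax_w) + softmax_b
probs = tf.nn.softmax(logits)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=last_one_hot, logits=logits)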

1 Answer:

Answer 0 (score: 2)

Your lstm() is just a single cell, not a sequence of cells. For a sequence, you create a sequence of lstms and then pass the sequence as input. Concatenating the embedded inputs and passing them through a single cell won't work correctly; instead, use the dynamic_rnn method for sequences.

Also, softmax is applied twice: in the logits as well as in cross_entropy; that needs to be fixed.
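Putting both fixes together, something along these lines (a sketch against your variable names; note the embedded inputs stay 3-D and are not flattened):

# Unroll the cell over the sequence with dynamic_rnn; inputs keep their
# 3-D shape [batch_size, seq_length, embedding_dimension_size].
inputs = tf.nn.embedding_lookup(embedding, input_characters)   # no flatten
outputs, final_state = tf.nn.dynamic_rnn(cell=lstm, inputs=inputs,
                                         initial_state=initial_state,
                                         dtype=tf.float32)

# Keep the logits raw; softmax_cross_entropy_with_logits applies softmax itself.
last_output = outputs[:, -1, :]                        # [batch_size, lstm_size]
logits = tf.matmul(last_output, softmax_w) + softmax_b
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=last_one_hot,
                                                        logits=logits)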