I've been working through the RNN example documentation and trying to roll my own simple RNN sequence-to-sequence model, using the tiny Shakespeare corpus with the output shifted by one character. I'm using sherjilozair's wonderful utils.py to load the data (https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/utils.py), but my training run looks like this...
loading preprocessed files
('epoch', 0, 'loss', 930.27938270568848)
('epoch', 1, 'loss', 912.94828796386719)
('epoch', 2, 'loss', 902.99976110458374)
('epoch', 3, 'loss', 902.90720677375793)
('epoch', 4, 'loss', 902.87029957771301)
('epoch', 5, 'loss', 902.84992623329163)
('epoch', 6, 'loss', 902.83739829063416)
('epoch', 7, 'loss', 902.82908940315247)
('epoch', 8, 'loss', 902.82331037521362)
('epoch', 9, 'loss', 902.81916546821594)
('epoch', 10, 'loss', 902.81605243682861)
('epoch', 11, 'loss', 902.81366014480591)
I was expecting a much sharper fall-off, and even after 1000 epochs it's still roughly the same. I think there's something wrong with my code, but I can't see what. I've pasted the code below; if anyone could take a quick look and see if anything stands out, I'd be very grateful, thanks.
#
# rays second predictor
#
# take basic example and convert to rnn
#
from tensorflow.examples.tutorials.mnist import input_data
import sys
import argparse
import pdb
import tensorflow as tf
from utils import TextLoader
def main(_):
    # break
    # number of hidden units
    lstm_size = 24
    # embedding of dimensionality 15 should be ok for characters, 300 for words
    embedding_dimension_size = 15
    # load data and get vocab size
    num_steps = FLAGS.seq_length
    data_loader = TextLoader(FLAGS.data_dir, FLAGS.batch_size, FLAGS.seq_length)
    FLAGS.vocab_size = data_loader.vocab_size
    # placeholder for batches of characters
    input_characters = tf.placeholder(tf.int32, [FLAGS.batch_size, FLAGS.seq_length])
    target_characters = tf.placeholder(tf.int32, [FLAGS.batch_size, FLAGS.seq_length])
    # create cell
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size, state_is_tuple=True)
    # initialize with zeros
    initial_state = state = lstm.zero_state(FLAGS.batch_size, tf.float32)
    # use embedding to convert ints to float array
    embedding = tf.get_variable("embedding", [FLAGS.vocab_size, embedding_dimension_size])
    inputs = tf.nn.embedding_lookup(embedding, input_characters)
    # flatten back to 2-d because rnn cells only deal with 2d
    inputs = tf.contrib.layers.flatten(inputs)
    # get output and (final) state
    outputs, final_state = lstm(inputs, state)
    # create softmax layer to classify outputs into characters
    softmax_w = tf.get_variable("softmax_w", [lstm_size, FLAGS.vocab_size])
    softmax_b = tf.get_variable("softmax_b", [FLAGS.vocab_size])
    logits = tf.nn.softmax(tf.matmul(outputs, softmax_w) + softmax_b)
    probs = tf.nn.softmax(logits)
    # expected labels will be 1-hot representation of last character of target_characters
    last_characters = target_characters[:, -1]
    last_one_hot = tf.one_hot(last_characters, FLAGS.vocab_size)
    # calculate loss
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=last_one_hot, logits=logits)
    # calculate total loss as mean across all batches
    batch_loss = tf.reduce_mean(cross_entropy)
    # train using adagrad optimizer
    train_step = tf.train.AdagradOptimizer(0.3).minimize(batch_loss)
    # start session
    sess = tf.InteractiveSession()
    # initialize variables
    sess.run(tf.global_variables_initializer())
    # train!
    num_epochs = 1000
    # loop through epochs
    for e in range(num_epochs):
        # loop through batches
        numpy_state = sess.run(initial_state)
        total_loss = 0.0
        data_loader.reset_batch_pointer()
        for i in range(data_loader.num_batches):
            this_batch = data_loader.next_batch()
            # initialize the LSTM state from the previous iteration
            numpy_state, current_loss, _ = sess.run(
                [final_state, batch_loss, train_step],
                feed_dict={initial_state: numpy_state,
                           input_characters: this_batch[0],
                           target_characters: this_batch[1]})
            total_loss += current_loss
        # output total loss
        print("epoch ", e, "loss ", total_loss)
    # break into debug
    pdb.set_trace()
    # calculate accuracy using training set

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', type=str, default='data/tinyshakespeare',
                        help='Directory for storing input data')
    parser.add_argument('--batch_size', type=int, default=100,
                        help='minibatch size')
    parser.add_argument('--seq_length', type=int, default=50,
                        help='RNN sequence length')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
Update, July 20th.
Thanks for the replies. I updated it to use the dynamic RNN call, which looks like this...
outputs, final_state = tf.nn.dynamic_rnn(initial_state=initial_state, cell=lstm, inputs=inputs, dtype=tf.float32)
This raises some interesting questions... The batching seems to pick up blocks of 50 characters at a time from the dataset, then step forward 50 characters to get the next sequence in the batch. If this is used for training, and you calculate the loss from the predicted final character against the final character + 1, then there are a full 49 characters of prediction in every sequence that the loss is never tested against. That seems a little odd.
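One way to avoid wasting those 49 predictions would be to score every time step against its shifted target instead of only the last one. A minimal sketch of that idea, reusing the names from the code above (lstm, embedding, initial_state, softmax_w, softmax_b, input_characters and target_characters are assumed to be defined as before):

# keep the embedded inputs 3-D: [batch_size, seq_length, embedding_dim] -- no flatten
inputs = tf.nn.embedding_lookup(embedding, input_characters)
outputs, final_state = tf.nn.dynamic_rnn(cell=lstm, inputs=inputs,
                                         initial_state=initial_state,
                                         dtype=tf.float32)
# outputs is [batch_size, seq_length, lstm_size]; fold batch and time together
flat_outputs = tf.reshape(outputs, [-1, lstm_size])
# raw logits, no softmax here -- softmax_cross_entropy_with_logits applies it once
flat_logits = tf.matmul(flat_outputs, softmax_w) + softmax_b
# one-hot targets for every position in the sequence, not just the last character
flat_targets = tf.one_hot(tf.reshape(target_characters, [-1]), FLAGS.vocab_size)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=flat_targets,
                                                        logits=flat_logits)
batch_loss = tf.reduce_mean(cross_entropy)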
Also, when testing the output, I feed it a single character, not 50, then get the prediction and feed that single character back in. Should I be adding to that single character at every step? So the first seed is 1 character, then I add the predicted character so the next call is a sequence of 2 characters, and so on, up to the maximum of my training sequence length? Or does that not matter if I'm feeding the updated state back in? I.e., does the updated state represent all the preceding characters too?
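For what it's worth, the LSTM state does summarize everything the cell has seen, so feeding one character at a time while carrying the state forward should be equivalent to re-feeding the growing sequence. A rough sketch of that sampling loop, assuming a separate copy of the graph built with batch_size=1 and seq_length=1, and the chars/vocab mappings from TextLoader (the function name and arguments here are my own, for illustration):

import numpy as np

def sample(sess, seed_char, num_chars, chars, vocab):
    # start from the zero state and push the seed character through
    state = sess.run(initial_state)
    x = np.array([[vocab[seed_char]]])  # shape [1, 1]: one batch, one step
    result = seed_char
    for _ in range(num_chars):
        # probs and final_state come from the batch_size=1, seq_length=1 graph
        p, state = sess.run([probs, final_state],
                            feed_dict={input_characters: x,
                                       initial_state: state})
        # greedy decode: take the most likely next character
        next_id = int(np.argmax(p))
        result += chars[next_id]
        # feed back only the new character; the state carries the history
        x = np.array([[next_id]])
    return result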
On a separate note, I found what I think was the main reason it wasn't decreasing... I was mistakenly calling softmax twice...
logits = tf.nn.softmax(tf.matmul(final_output, softmax_w) + softmax_b)
probs = tf.nn.softmax(logits)
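The fix is just to leave the logits raw and let the loss apply softmax once, keeping probs only for sampling:

# raw logits: softmax_cross_entropy_with_logits applies the softmax itself
logits = tf.matmul(final_output, softmax_w) + softmax_b
# probs is only needed when sampling characters, not for the loss
probs = tf.nn.softmax(logits)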
Answer (score: 2)
Your function lstm() is only a single cell, not a sequence of cells. For a sequence, you create a sequence of lstms and then pass that sequence as the input. Concatenating the embedding inputs and passing them through a single cell won't work; instead, use the dynamic_rnn method for a sequence.

Also, softmax is applied twice: in the logits as well as in the cross_entropy, which needs to be fixed.
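For example, something along these lines (a sketch against the question's variable names, keeping the original last-character loss) addresses both points:

# inputs stays [batch_size, seq_length, embedding_dim]; dynamic_rnn unrolls the cell
outputs, final_state = tf.nn.dynamic_rnn(lstm, inputs,
                                         initial_state=initial_state,
                                         dtype=tf.float32)
# predict from the last time step only, as in the original loss
last_output = outputs[:, -1, :]
logits = tf.matmul(last_output, softmax_w) + softmax_b  # raw logits, no softmax here
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=last_one_hot,
                                                        logits=logits)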