Unable to train a simple Char RNN

Date: 2018-05-07 15:10:57

Tags: python tensorflow

I have been experimenting with a vanilla char RNN in TensorFlow. Even after several hours of training I cannot produce anything sensible. The code is a port of the Keras code from Chollet's Deep Learning with Python GitHub repository. I have tried playing with the hyperparameters without much success. Chollet mentions in the book that the model produced good output after 80 epochs; I have not been able to get anything reasonable after 50K+ epochs :( I am wondering whether I missed something when converting this code to TensorFlow.

import random

import numpy as np
import tensorflow as tf

n_layers = 1
num_units = 128
batch_size = 150

# One-hot encoded input windows and one-hot next-character targets
X = tf.placeholder(tf.float32, [None, maxlen, len(unique_chars)], name="Placeholder_X")
y = tf.placeholder(tf.int64, [None, len(unique_chars)], name="Placeholder_Y")

lstm_cells = [tf.contrib.rnn.BasicLSTMCell(num_units=num_units) for layer in range(n_layers)]
multi_cell = tf.contrib.rnn.MultiRNNCell(lstm_cells)
outputs, current_state = tf.nn.dynamic_rnn(multi_cell, X, dtype=tf.float32)

# Hidden state h of the top LSTM layer after the final timestep
top_layer_h_state = current_state[-1][1]
logits = tf.layers.dense(top_layer_h_state, len(unique_chars), name="softmax")
xentropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y)
loss = tf.reduce_mean(xentropy, name="loss")
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001)
training_op = optimizer.minimize(loss)
pred = tf.nn.softmax(logits)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

Sampling code:

with tf.Session() as sess:
    init.run()
    saver.restore(sess, model_name)
    # Output some data
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print("Seed: ", generated_text)
    final_string = ""
    sampled = np.zeros((1, maxlen, len(unique_chars)))

    for i in range(50):
        # One-hot encode the current seed window
        for t, char in enumerate(generated_text):
            sampled[0, t, char_to_idx[char]] = 1.
        preds_eval = sess.run([pred], feed_dict={X: sampled})
        preds = preds_eval[0][0]
        next_index = sample(preds, 0.5)
        next_char = unique_chars[next_index]
        generated_text += next_char
        final_string += next_char
        generated_text = generated_text[1:]
    print("New String: " , final_string)

Sample input seed: is, as is now generally admitted, no better performance.

Input generation:

maxlen = 60
step = 3 
sentences = []
next_chars = []

# Slice the corpus into overlapping maxlen-character windows, each paired
# with the character that follows it
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i:i + maxlen])
    next_chars.append(text[i + maxlen])

unique_chars = sorted(list(set(text)))
char_to_idx = {char: i for i, char in enumerate(unique_chars)}


# One-hot encode inputs (the windows) and targets (the next characters)
data_X = np.zeros((len(sentences), maxlen, len(unique_chars)), dtype=np.float32)
data_Y = np.zeros((len(sentences), len(unique_chars)), dtype=np.int64)
for idx, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        data_X[idx, t, char_to_idx[char]] = 1
    data_Y[idx, char_to_idx[next_chars[idx]]] = 1
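
The training loop is not shown in the question; a minimal sketch of how data_X and data_Y might be fed (n_epochs and the simple sequential batching are assumptions, not the asker's actual loop):

n_epochs = 80  # hypothetical; the question reports training far longer
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        # Feed the one-hot batches built above
        for start in range(0, len(data_X), batch_size):
            feed = {X: data_X[start:start + batch_size],
                    y: data_Y[start:start + batch_size]}
            _, loss_val = sess.run([training_op, loss], feed_dict=feed)
        print("Epoch", epoch, "loss:", loss_val)
    saver.save(sess, model_name)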

Output from the model: vatsoéätlæéättire

1 Answer:

Answer 0: (score: 0)

It looks like you are trying to build a language model. I have not read through your entire code carefully, but I noticed a few things just from the first part. Why is the placeholder for x of type tf.float32 rather than integers? More importantly, why does y have a shape equal to the vocabulary size? It should be batch_size by max_len - 1 by vocab_size. In a language model you are always trying to predict the next character at every step; training it to read an entire sequence of characters and then predict only a single character at the end is not a good way to do it.
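
For reference, a minimal sketch of the per-timestep setup described above, reusing outputs from the question's tf.nn.dynamic_rnn call; the names Y_seq, seq_logits, and seq_loss are hypothetical, and the targets are assumed to be the input window shifted by one character:

# Targets: the next character at every timestep, one-hot encoded
Y_seq = tf.placeholder(tf.float32, [None, maxlen, len(unique_chars)], name="Y_seq")

# tf.layers.dense acts on the last axis, so this yields per-timestep logits
# of shape [batch_size, maxlen, len(unique_chars)]
seq_logits = tf.layers.dense(outputs, len(unique_chars))
seq_xentropy = tf.nn.softmax_cross_entropy_with_logits(logits=seq_logits, labels=Y_seq)
seq_loss = tf.reduce_mean(seq_xentropy, name="seq_loss")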