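I have been working with a vanilla char-rnn in TensorFlow. Even after several hours of training I cannot get it to produce anything reasonable. The code is a TensorFlow version of the Keras code from the GitHub repository for Chollet's Deep Learning with Python. I have tried tuning the hyperparameters without much success. Chollet mentions in the book that the model produces good output after 80 epochs. Even after 50K+ epochs I cannot get anything reasonable :( I am wondering whether I missed something when converting this code to TensorFlow.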
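import random

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.contrib)

n_layers = 1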
num_units = 128
batch_size = 150
X = tf.placeholder(tf.float32, [None, maxlen, len(unique_chars)], name="Placeholder_X")
y = tf.placeholder(tf.int64, [None, len(unique_chars)], name="Placeholder_Y")
lstm_cells = [tf.contrib.rnn.BasicLSTMCell(num_units=num_units) for layer in range(n_layers)]
multi_cell = tf.contrib.rnn.MultiRNNCell(lstm_cells)
outputs, current_state = tf.nn.dynamic_rnn(multi_cell, X, dtype=tf.float32)
top_layer_h_state = current_state[-1][1]
logits = tf.layers.dense(top_layer_h_state, len(unique_chars), name="softmax")
xentropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y)
loss = tf.reduce_mean(xentropy, name="loss")
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001)
training_op = optimizer.minimize(loss)
pred = tf.nn.softmax(logits)
init = tf.global_variables_initializer()
saver = tf.train.Saver()
Sampling code:
with tf.Session() as sess:
    init.run()
    saver.restore(sess, model_name)
    # Output some data
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index:start_index + maxlen]
    print("Seed: ", generated_text)
    final_string = ""
    sampled = np.zeros((1, maxlen, len(unique_chars)))
    for i in range(50):
        for t, char in enumerate(generated_text):
            sampled[0, t, char_to_idx[char]] = 1.
        preds_eval = sess.run([pred], feed_dict={X: sampled})
        preds = preds_eval[0][0]
        next_index = sample(preds, 0.5)
        next_char = unique_chars[next_index]
        generated_text += next_char
        final_string += next_char
        generated_text = generated_text[1:]
    print("New String: ", final_string)
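The `sample` helper called in the loop above is not shown in the post. A minimal sketch, assuming it is the usual temperature-based sampling used in the Keras example this code was ported from (the body below is an assumption, not the poster's actual code):

```python
import numpy as np


def sample(preds, temperature=1.0):
    """Draw one index from a probability vector, reweighted by temperature.

    Assumed helper: `preds` is a 1-D array of strictly positive
    probabilities (for example a softmax output).
    """
    preds = np.asarray(preds).astype("float64")
    # Lower temperature sharpens the distribution, higher flattens it.
    scaled = np.log(preds) / temperature
    exp_scaled = np.exp(scaled)
    probs = exp_scaled / np.sum(exp_scaled)
    # One multinomial draw gives a one-hot vector; argmax recovers the index.
    draw = np.random.multinomial(1, probs, 1)
    return int(np.argmax(draw))
```

With a temperature of 0.5, as in the call above, the draw leans more strongly toward the characters the model already considers most likely.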
Example input seed: "is, as people now generally acknowledge, there is no better performance."
Input generation:
maxlen = 60
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i:i + maxlen])
    next_chars.append(text[i + maxlen])
unique_chars = sorted(list(set(text)))
char_to_idx = dict((char, unique_chars.index(char)) for char in unique_chars)
data_X = np.zeros((len(sentences), maxlen, len(unique_chars)), dtype=np.float32)
data_Y = np.zeros((len(sentences), len(unique_chars)), dtype=np.int64)
for idx, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        data_X[idx, t, char_to_idx[char]] = 1
    data_Y[idx, char_to_idx[next_chars[idx]]] = 1
Model output: vatsoéätlæéättire
Answer 0 (score: 0)
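It looks like you are trying to build a language model. I have not read your whole code carefully, but from the first part I noticed a few things. Why is X a placeholder of type tf.float32 rather than of integers? More importantly, why does y have a shape equal to the vocabulary size? It should be batch_size by max_len - 1 by vocab_size. In a language model you always try to predict the next character at every step. Training it to read the whole character sequence and then predict only one character at the end is not a good approach.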
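To make the suggested shapes concrete, here is a minimal NumPy sketch of per-step targets: each window of `maxlen` characters is split into `maxlen - 1` integer inputs and the same window shifted by one as targets. The function names `make_per_step_examples` and `one_hot` are hypothetical and only illustrate the idea; they do not come from the question's code.

```python
import numpy as np


def make_per_step_examples(text, maxlen, step):
    """Build integer inputs and next-character targets for every time step."""
    unique_chars = sorted(set(text))
    char_to_idx = {ch: i for i, ch in enumerate(unique_chars)}
    ids = np.array([char_to_idx[ch] for ch in text], dtype=np.int64)

    # Same windowing as the question: start every `step` characters.
    starts = range(0, len(ids) - maxlen, step)
    windows = np.stack([ids[s:s + maxlen] for s in starts])

    inputs = windows[:, :-1]   # shape (num_windows, maxlen - 1)
    targets = windows[:, 1:]   # inputs shifted by one character
    return inputs, targets, unique_chars


def one_hot(indices, vocab_size):
    """Expand integer ids to one-hot vectors along a new last axis."""
    return np.eye(vocab_size, dtype=np.float32)[indices]
```

Calling `one_hot(targets, len(unique_chars))` gives an array of shape (num_windows, maxlen - 1, vocab_size), which matches the batch_size by max_len - 1 by vocab_size layout described above. On the model side this would mean computing logits from the per-step `outputs` of `tf.nn.dynamic_rnn` rather than only from the final state; that part is not sketched here.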