Keras noob here,
I'm trying to build an LSTM network to generate text from the works of Shakespeare (a bit like what's done in this tutorial).
This is the method that generates my model:
def generate_model(seq_len=100, stateful=True):
    # Initialize model
    source = tf.keras.Input(
        name='seed', shape=(seq_len,), dtype=tf.int32)

    # Embed ascii character (0 to 255) into one hot encoding (0, 1, 0...)
    embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=EMBEDDING_DIM, input_length=seq_len)(source)

    # Good old LSTM's
    lstm_1 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(embedding)
    lstm_2 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_1)

    # I honestly don't understand what the TimeDistributed method does
    predicted_char = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(256, activation='softmax'))(lstm_2)

    model = tf.keras.Model(inputs=[source], outputs=[predicted_char])
    model.compile(
        optimizer=tf.train.RMSPropOptimizer(learning_rate=0.01),
        loss='categorical_crossentropy',
        metrics=['categorical_accuracy'])
    return model
I think the embedding layer is responsible for one-hot encoding all of the characters into vectors. If that's the case, is that structure not preserved when passing through the LSTM layers? I'm a bit confused.
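To check my mental model of the shapes, here is roughly what I believe each layer produces, based on my reading of the docs (so possibly wrong): the Embedding layer maps each integer character code to a learned dense vector rather than a literal one-hot, and return_sequences=True keeps the time axis all the way through. This just calls the generate_model function above with EMBEDDING_DIM = 512:

model = generate_model(seq_len=100, stateful=False)
model.summary()
# What I expect each layer to produce (None = batch dimension):
#   Input 'seed':             (None, 100)        integer character codes
#   Embedding:                (None, 100, 512)   one learned dense vector per character
#   LSTM (x2):                (None, 100, 512)   return_sequences=True keeps the time axis
#   TimeDistributed(Dense):   (None, 100, 256)   softmax over 256 characters at every timestep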
For reference, here is an example of the input (after encoding):
[65 76 76 83 32 87 69 76 76 32]
which, before encoding, is:
['A', 'L', 'L', 'S', ' ', 'W', 'E', 'L', 'L', ' ']
The corresponding label is the next character in the sequence:
[84]
i.e. ['T'].
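Concretely, each (input, label) pair comes from a sliding window over the character codes. This is a trimmed-down, illustrative version of the get_training_data function in the full code further down, using a made-up snippet of the text:

import numpy as np

text = "ALLS WELL THAT ENDS WELL"
seq_len = 10
codes = np.asarray([ord(c) for c in text], dtype=np.int32)

x = codes[:seq_len]              # [65 76 76 83 32 87 69 76 76 32] -> 'ALLS WELL '
y = codes[seq_len:seq_len + 1]   # [84] -> 'T', the next character in the sequence
print([chr(c) for c in x], [chr(c) for c in y])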
I'm struggling with what seems to be a common newbie error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [100,256] and labels shape [1]
[[{{node loss/time_distributed_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
[[{{node training/TFOptimizer/gradients/embedding/embedding_lookup_grad/Reshape}}]]
From the research I've done, this seems to be related to my use of sparse_categorical_crossentropy; however, if I use categorical_crossentropy instead, I get the following error during training:
ValueError: You are passing a target array of shape (5524624, 1) while using as loss `categorical_crossentropy`. `categorical_crossentropy` expects targets to be binary matrices (1s and 0s) of shape (samples, classes). If your targets are integer classes, you can convert them to the expected format via:
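My current reading of these two errors (which may well be wrong) is that the model emits a (100, 256) block of predictions per sample, i.e. one softmax over 256 characters at every timestep, while each of my labels is just a single next character of shape (1,). One thing I'm considering, but haven't actually verified, is to predict only the final character instead: set return_sequences=False on the second LSTM, drop the TimeDistributed wrapper, and go back to sparse_categorical_crossentropy so the integer labels can be used directly (this reuses tf and EMBEDDING_DIM from above):

def generate_last_char_model(seq_len=100):
    # Untested sketch: same layers as my model above, but the second LSTM only
    # returns its final output, so the model predicts one character per sequence.
    source = tf.keras.Input(name='seed', shape=(seq_len,), dtype=tf.int32)
    embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=EMBEDDING_DIM)(source)
    lstm_1 = tf.keras.layers.LSTM(EMBEDDING_DIM, return_sequences=True)(embedding)
    lstm_2 = tf.keras.layers.LSTM(EMBEDDING_DIM, return_sequences=False)(lstm_1)  # (batch, EMBEDDING_DIM)
    predicted_char = tf.keras.layers.Dense(256, activation='softmax')(lstm_2)     # (batch, 256)
    model = tf.keras.Model(inputs=[source], outputs=[predicted_char])
    model.compile(
        optimizer=tf.train.RMSPropOptimizer(learning_rate=0.01),
        loss='sparse_categorical_crossentropy',  # integer labels of shape (samples, 1)
        metrics=['sparse_categorical_accuracy'])
    return model

The other option I've read about is to keep the per-timestep output and train against targets that are the input sequence shifted by one character, but I haven't tried that either.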
I must be missing something. How do I get my model to train?
Thanks, any help would be greatly appreciated :))
Edit: in case it's useful, here is my full code:
import numpy as np
import tensorflow as tf
SHAKESPEARE_TXT = 'shakespeare.txt'
with open(SHAKESPEARE_TXT, 'r', encoding="utf8") as f:
    raw = f.read()

def encode(txt):
    # drop any non-ascii characters
    output = np.asarray([ord(c) for c in txt if ord(c) < 255 and c != '\r'], dtype=np.int32)
    return output

def decode(txt):
    return [chr(c) for c in txt]

def get_training_data(seq_len, txt=raw):
    source = encode(txt)
    x, y = [], []
    n = len(source) - seq_len
    #n=100
    for i in range(n):
        sequence = source[i: i + seq_len]
        x.append(sequence)
        y.append([source[i + seq_len]])
    return np.asarray(x), np.asarray(y)
# txt = encode(raw)
# print(decode(txt[0:100]))
'''
training_data = get_training_data(seq_len=10)
for i in range(10):
    print(decode(training_data[0][i]), decode(training_data[1][i]))
'''
EMBEDDING_DIM = 512
def generate_model(seq_len=100, stateful=True):
    # Initialize model
    source = tf.keras.Input(
        name='seed', shape=(seq_len,), dtype=tf.int32)

    # Embed ascii character (0 to 255) into one hot encoding (0, 1, 0...)
    embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=EMBEDDING_DIM, input_length=seq_len)(source)

    # Good old LSTM's
    lstm_1 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(embedding)
    lstm_2 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_1)

    # I honestly don't understand what the TimeDistributed method does
    predicted_char = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(256, activation='softmax'))(lstm_2)

    model = tf.keras.Model(inputs=[source], outputs=[predicted_char])
    model.compile(
        optimizer=tf.train.RMSPropOptimizer(learning_rate=0.01),
        loss='categorical_crossentropy',
        metrics=['categorical_accuracy'])
    return model
def train():
    tf.keras.backend.clear_session()

    print("Creating model")
    training_model = generate_model(seq_len=100, stateful=False)
    '''
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(
        training_model,
        strategy=tf.contrib.tpu.TPUDistributionStrategy(
            tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
    '''
    print("Training")
    data = get_training_data(seq_len=100)
    '''
    print(data[0].shape)
    print(data[0][0])
    print(data[1].shape)
    print(data[1][0])
    '''
    # Start training
    training_model.fit(
        x=data[0],
        y=data[1],
        batch_size=1,
        # steps_per_epoch=100,
        epochs=2
    )

    print("Saving")
    training_model.save_weights('/tmp/bard.h5', overwrite=True)
train()