I am trying to adapt the example at https://www.tensorflow.org/tutorials/sequences/text_generation to generate character-based text.
The code in the example uses TensorFlow eager execution (via tensorflow.enable_eager_execution) and runs fine, but if I disable eager execution I start getting this error:

Error when checking target: expected dense to have 3 dimensions, but got array with shape (32, 200)

Why does this happen? Shouldn't the code behave exactly the same with and without eager execution?
I tried flattening the output of the LSTM layer, but I get a similar error:

ValueError: Error when checking target: expected dense to have shape (1,) but got array with shape (200,)
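Roughly, that attempt looked like the following sketch (a reconstruction of the idea, not the exact code; names and sizes match the full listing below):

# Sketch of the flatten attempt: Flatten collapses the LSTM output from
# (batch, 200, 128) to (batch, 200 * 128), so the final Dense layer produces
# a single prediction per sequence and expects one label of shape (1,),
# while the dataset still supplies 200 labels per sequence.
model.add(tf.keras.layers.LSTM(units=rnn_units, return_sequences=True))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(units=vocab_size, activation='softmax'))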
The simplest code I could reduce this to is the following:
import tensorflow as tf
import numpy as np

# tf.enable_eager_execution()


def get_input():
    path_to_file = tf.keras.utils.get_file(
        'shakespeare.txt',
        'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt'
    )
    with open(path_to_file) as f:
        text = f.read()
    return text


def get_dataset(text_as_indexes, sequence_size, sequences_per_batch):
    def split_input(sequence):
        return sequence[:-1], sequence[1:]

    data_set = tf.data.Dataset.from_tensor_slices(text_as_indexes)
    data_set = data_set.batch(sequence_size + 1, drop_remainder=True)
    data_set = data_set.map(split_input)
    data_set = data_set.shuffle(10000).batch(sequences_per_batch, drop_remainder=True)
    return data_set


if __name__ == '__main__':
    sequences_len = 200
    batch_size = 32
    embeddings_size = 64
    rnn_units = 128

    text = get_input()
    vocab = sorted(set(text))
    vocab_size = len(vocab)
    char2int = {c: i for i, c in enumerate(vocab)}
    int2char = np.array(vocab)
    text_as_int = np.array([char2int[c] for c in text])

    dataset = get_dataset(text_as_int, sequences_len, batch_size)
    steps_per_epoch = len(text_as_int) // sequences_len // batch_size

    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Embedding(
        input_dim=vocab_size,
        output_dim=embeddings_size,
        input_length=sequences_len))
    model.add(tf.keras.layers.LSTM(
        units=rnn_units,
        return_sequences=True))
    model.add(tf.keras.layers.Dense(units=vocab_size, activation='softmax'))

    model.compile(optimizer=tf.train.AdamOptimizer(),
                  loss='sparse_categorical_crossentropy')
    model.summary()

    model.fit(
        x=dataset.repeat(),
        batch_size=batch_size,
        steps_per_epoch=steps_per_epoch)
Answer 0 (score: 1)
When using sparse_categorical_crossentropy, the labels should have shape (batch_size, sequence_length, 1) rather than (batch_size, sequence_length). You can fix this by reshaping the labels in the split_input() function, like this:
def split_input(sequence):
    return sequence[:-1], tf.reshape(sequence[1:], (-1, 1))
The code above works with both eager execution and graph execution.
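A quick way to confirm the fix is to inspect the dataset's static shapes (a sketch assuming TF 1.x, where output_shapes is a property of tf.data.Dataset):

dataset = get_dataset(text_as_int, sequences_len, batch_size)
# With the reshape in place, the static element shapes should be
# (32, 200) for the inputs and (32, 200, 1) for the labels.
print(dataset.output_shapes)

tf.expand_dims(sequence[1:], -1) would be an equivalent way to add the trailing axis.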