TensorFlow / Keras, Embedding and sparse_categorical_crossentropy issue

Time: 2019-05-28 20:18:53

Tags: tensorflow keras

Keras noob here,

I'm trying to build an LSTM network to generate text from the works of Shakespeare (a bit like the one in this tutorial).

Here is the method that generates my model:

def generate_model(seq_len=100, stateful=True):
    # Initialize model
    source = tf.keras.Input(
        name='seed', shape=(seq_len,), dtype=tf.int32)

    # Embed ascii character (0 to 255) into one hot encoding (0, 1, 0...)
    embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=EMBEDDING_DIM, input_length=seq_len)(source)

    # Good old LSTM's
    lstm_1 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(embedding)
    lstm_2 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_1)

    # I honestly don't understand what the TimeDistributed method does
    predicted_char = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(256, activation='softmax'))(lstm_2)

    model = tf.keras.Model(inputs=[source], outputs=[predicted_char])

    model.compile(
        optimizer=tf.train.RMSPropOptimizer(learning_rate=0.01),
        loss='categorical_crossentropy',
        metrics=['categorical_accuracy'])
    return model
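For reference, on the TimeDistributed comment above: the wrapper applies the wrapped layer to every timestep independently, turning (batch, timesteps, features) into (batch, timesteps, units). A minimal shape sketch (note that Dense already operates on the last axis of a 3-D input, so the wrapper is arguably redundant here):

import numpy as np
import tensorflow as tf

# TimeDistributed applies the wrapped layer to each timestep independently:
# (batch, timesteps, features) -> (batch, timesteps, units)
x = tf.constant(np.random.rand(2, 100, 512), dtype=tf.float32)
y = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(256))(x)
print(y.shape)  # (2, 100, 256)

# Dense(256)(x) yields the same (2, 100, 256) shape, since Dense acts on
# the last axis when its input has more than two dimensions.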

My assumption was that the embedding layer one-hot encodes each character into a vector. If that's the case, is that structure not preserved while passing through the LSTM layers? I'm a bit confused.
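(For reference, Embedding is a learned lookup table rather than a one-hot encoder: each integer id selects a trainable output_dim-dimensional vector, so its output has shape (batch, seq_len, output_dim). A minimal sketch to check this:)

import numpy as np
import tensorflow as tf

# Each id in [0, 256) selects a learned 8-dimensional vector here.
emb = tf.keras.layers.Embedding(input_dim=256, output_dim=8)
ids = np.array([[65, 76, 76, 83, 32]])  # "ALLS " as byte ids, shape (1, 5)
vectors = emb(tf.constant(ids))
print(vectors.shape)  # (1, 5, 8)

The LSTMs then consume that (batch, timesteps, features) tensor, and with return_sequences=True they emit one output vector per timestep.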

For reference, here is a sample of the input: [65 76 76 83 32 87 69 76 76 32]

(before encoding) ['A', 'L', 'L', 'S', ' ', 'W', 'E', 'L', 'L', ' ']

The corresponding label is the next character in the sequence: [84], i.e. ['T']
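(The ids are just the characters' ASCII code points:)

print([ord(c) for c in 'ALLS WELL '])  # [65, 76, 76, 83, 32, 87, 69, 76, 76, 32]
print(chr(84))                         # T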

I'm struggling with what is apparently a common error among newcomers:

tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [100,256] and labels shape [1]
     [[{{node loss/time_distributed_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
     [[{{node training/TFOptimizer/gradients/embedding/embedding_lookup_grad/Reshape}}]]
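For what it's worth, the shapes in the message line up with the model above: with return_sequences=True on the last LSTM plus TimeDistributed(Dense(256)), the model emits one 256-way prediction per timestep, i.e. logits of shape (100, 256) per sample, while each target is a single next-character id of shape (1,). A sketch of one way to make the shapes agree, assuming the (samples, 1) integer targets shown above, is to have the last LSTM return only its final output:

def generate_model_single_char(seq_len=100):
    # Sketch only: predicts a single next character per input sequence,
    # matching integer targets of shape (samples, 1).
    source = tf.keras.Input(name='seed', shape=(seq_len,), dtype=tf.int32)
    embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=EMBEDDING_DIM)(source)
    lstm_1 = tf.keras.layers.LSTM(EMBEDDING_DIM, return_sequences=True)(embedding)
    lstm_2 = tf.keras.layers.LSTM(EMBEDDING_DIM)(lstm_1)  # final timestep only
    predicted_char = tf.keras.layers.Dense(256, activation='softmax')(lstm_2)
    model = tf.keras.Model(inputs=[source], outputs=[predicted_char])
    model.compile(optimizer='rmsprop',
                  loss='sparse_categorical_crossentropy',  # integer class ids
                  metrics=['sparse_categorical_accuracy'])
    return model

The alternative is to keep the per-timestep outputs and train against targets that are the input shifted by one character (shape (samples, seq_len)), which is the usual setup for character-level language models.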

From the research I've done, this seems related to my use of sparse_categorical_crossentropy; however, if I use categorical_crossentropy instead, I get the following error during training:

ValueError: You are passing a target array of shape (5524624, 1) while using as loss `categorical_crossentropy`. `categorical_crossentropy` expects targets to be binary matrices (1s and 0s) of shape (samples, classes). If your targets are integer classes, you can convert them to the expected format via:
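That second message is about target format: categorical_crossentropy expects one-hot targets of shape (samples, classes), while sparse_categorical_crossentropy takes raw integer class ids. If one-hot targets are wanted, integer labels can be converted with tf.keras.utils.to_categorical, e.g.:

import numpy as np
import tensorflow as tf

y_int = np.array([84, 32])  # integer class ids
y_onehot = tf.keras.utils.to_categorical(y_int, num_classes=256)
print(y_onehot.shape)   # (2, 256)
print(y_onehot[0, 84])  # 1.0

(Though with roughly 5.5 million samples a dense one-hot array gets large, which is one reason the sparse variant exists.)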

I must be missing something. How do I get my model to train?

Thanks, any help is greatly appreciated :))

Edit: In case it's useful, here is my whole code:

import numpy as np
import tensorflow as tf


SHAKESPEARE_TXT = 'shakespeare.txt'

with open(SHAKESPEARE_TXT, 'r', encoding="utf8") as f:
    raw = f.read()


def encode(txt):
    # drop any non-ascii characters
    output = np.asarray([ord(c) for c in txt if ord(c) < 255 and c != '\r'], dtype=np.int32)

    return output

def decode(txt):
    return [chr(c) for c in txt]

def get_training_data(seq_len, txt=raw):
    source = encode(txt)

    x, y = [], []

    n = len(source) - seq_len
    #n=100

    for i in range(n):
        sequence = source[i: i + seq_len]

        x.append(sequence)

        y.append([source[i + seq_len]])

    return np.asarray(x), np.asarray(y)
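
# Note: for an encoded corpus of length N, this returns x with shape
# (N - seq_len, seq_len) and y with shape (N - seq_len, 1), i.e. one
# integer next-character id per input sequence.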





# txt = encode(raw)
# print(decode(txt[0:100]))
'''
training_data = get_training_data(seq_len=10)

for i in range(10):
    print(decode(training_data[0][i]), decode(training_data[1][i]))
'''


EMBEDDING_DIM = 512

def generate_model(seq_len=100, stateful=True):
    # Initialize model
    source = tf.keras.Input(
        name='seed', shape=(seq_len,), dtype=tf.int32)

    # Embed ascii character (0 to 255) into one hot encoding (0, 1, 0...)
    embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=EMBEDDING_DIM, input_length=seq_len)(source)

    # Good old LSTM's
    lstm_1 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(embedding)
    lstm_2 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_1)

    # I honestly don't understand what the TimeDistributed method does
    predicted_char = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(256, activation='softmax'))(lstm_2)

    model = tf.keras.Model(inputs=[source], outputs=[predicted_char])

    model.compile(
        optimizer=tf.train.RMSPropOptimizer(learning_rate=0.01),
        loss='categorical_crossentropy',
        metrics=['categorical_accuracy'])
    return model


def train():
    tf.keras.backend.clear_session()

    print("Creating model")

    training_model = generate_model(seq_len=100, stateful=False)

    '''
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(
        training_model,
        strategy=tf.contrib.tpu.TPUDistributionStrategy(
            tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
    '''

    print("Training")

    data = get_training_data(seq_len=100)
    '''
    print(data[0].shape)
    print(data[0][0])
    print(data[1].shape)
    print(data[1][0])
    '''

    # Start training
    training_model.fit(
        x=data[0],
        y=data[1],
        batch_size=1,
        # steps_per_epoch=100,
        epochs=2
    )

    print("Saving")

    training_model.save_weights('/tmp/bard.h5', overwrite=True)


train()

0 Answers:

There are no answers.