Question

I am currently implementing the neural image captioning model shown here:

https://www.tensorflow.org/tutorials/text/image_captioning

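The training loop feeds a batch of sentences into the decoder one word at a time. While implementing the model, I noticed that the hidden state from the previous time step is not fed into the GRU. To confirm that the GRU does not store the previous time step's hidden state on its own, I wrote a toy example and found that it indeed does not store it: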

    import numpy as np
    import tensorflow as tf

    # Toy model: h and c are declared as inputs but deliberately NOT wired into the LSTM.
    input_word = tf.keras.layers.Input(shape=(1, 5))
    h = tf.keras.layers.Input(shape=(3,))
    c = tf.keras.layers.Input(shape=(3,))
    out, new_h, new_c = tf.keras.layers.LSTM(3, return_state=True, name='lstm')(input_word)  # h and c not passed into LSTM
    out = tf.keras.layers.Dense(4)(out)
    out = tf.keras.layers.Dropout(0.1)(out)
    out = tf.keras.layers.Dense(2)(out)
    model = tf.keras.Model(inputs=[input_word, h, c], outputs=[out, new_h, new_c])

    # One sequence of 3 time steps with 5 features each, plus initial states.
    sequence = tf.convert_to_tensor(np.random.random([1, 3, 5]), dtype=tf.float32)
    c = tf.convert_to_tensor(np.array([[0.1, 0.2, 0.3]]), dtype=tf.float32)
    h = tf.convert_to_tensor(np.array([[0.1, 0.2, 0.3]]), dtype=tf.float32)
    current_input = tf.expand_dims(sequence[:, 0], 1)  # shape (1, 1, 5)
    for i in range(1, sequence.shape[1]):
        print(model.get_layer('lstm').states)  # layer state before the call
        # Feed one time step; h and c are passed to the model but never reach the LSTM.
        predictions, h, c = model([current_input, h, c])
        print(model.get_layer('lstm').states)  # layer state after the call
        current_input = tf.expand_dims(sequence[:, i], 1)

The statements that print the states return [None, None].

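I'm fairly sure this is a gap in my understanding rather than a mistake on Google's part. Still, I would be grateful if someone could explain how the GRU knows the hidden state from the previous time step.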
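To make the question concrete, here is a minimal pure-Python sketch of how I understand a single GRU step: a function of the current input and the previous hidden state. It is only an illustration under assumptions, not the tutorial's code or the Keras implementation. The scalar weights are made-up values, `gru_step` is a hypothetical helper, and the gate convention used is one of several in circulation.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, w):
    """One step of a scalar GRU cell; returns the new hidden state."""
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev)               # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev)               # reset gate
    h_cand = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev))  # candidate state
    return (1.0 - z) * h_prev + z * h_cand


# Hypothetical fixed weights, chosen only for illustration.
weights = {"wz": 0.5, "uz": -0.3, "wr": 0.8, "ur": 0.2, "wh": 1.0, "uh": 0.7}
sequence = [0.9, -0.4, 0.6]

# Case 1: the hidden state is carried from one step to the next.
h = 0.0
carried = []
for x in sequence:
    h = gru_step(x, h, weights)
    carried.append(h)

# Case 2: every step starts from a zero state, as if nothing were passed in.
reset = [gru_step(x, 0.0, weights) for x in sequence]

print("carried:", carried)
print("reset:  ", reset)
```

In this sketch the two cases agree on the first step and diverge afterwards, which is the behaviour I would expect if the previous hidden state never reaches the recurrent layer.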

Thank you very much!

Custom training loop for LSTM (TensorFlow 2)
