Question

I am trying to implement a stacked GRU with TensorFlow and Keras, similar to PyTorch's torch.nn.GRU().

The PyTorch implementation looks like this:

import torch
import torch.nn as nn

class stacked_GRU(nn.Module):
  def __init__(self, inp_dimension, gru_layers, gru_neurons):
    super(stacked_GRU, self).__init__()
    self.inp_dimension = inp_dimension

    # num_layers stacks gru_layers GRUs inside a single module
    self.rnn = nn.GRU(
      input_size=inp_dimension,
      hidden_size=gru_neurons,
      num_layers=gru_layers
    )

  def forward(self, x, hidden):
    # x: (seq_len, batch, inp_dimension); hidden: (gru_layers, batch, gru_neurons)
    output, hidden = self.rnn(x, hidden)
    return output, hidden

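Here we stack gru_layers GRUs on top of each other. If the input has seq_len time steps and a batch size of batch, the output has shape (seq_len, batch, gru_neurons). The hidden state holds the final hidden output of each stacked layer, so its shape is (gru_layers, batch, gru_neurons).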
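To make the shape contract above concrete, here is a framework-free NumPy sketch of what a stacked GRU computes. It is only an illustration under assumptions: the helper names (gru_step, stacked_gru), the example sizes, the random weights, and the omission of bias terms are all mine, and it is not how torch.nn.GRU is implemented internally.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W, U):
    """One GRU time step (biases omitted for brevity).
    W: (3, in_dim, hid) input weights, U: (3, hid, hid) recurrent weights."""
    r = sigmoid(x_t @ W[0] + h_prev @ U[0])        # reset gate
    z = sigmoid(x_t @ W[1] + h_prev @ U[1])        # update gate
    n = np.tanh(x_t @ W[2] + r * (h_prev @ U[2]))  # candidate state
    return (1.0 - z) * n + z * h_prev

def stacked_gru(x, h0, params):
    """x: (seq_len, batch, in_dim); h0: (layers, batch, hid);
    params: one (W, U) pair per layer."""
    layer_input = x
    final_states = []
    for (W, U), h in zip(params, h0):
        outputs = []
        for x_t in layer_input:          # iterate over time steps
            h = gru_step(x_t, h, W, U)
            outputs.append(h)
        layer_input = np.stack(outputs)  # this layer's sequence feeds the next layer
        final_states.append(h)           # last hidden state of this layer
    return layer_input, np.stack(final_states)

# Example sizes (hypothetical values chosen for illustration).
seq_len, batch, inp_dimension, gru_layers, gru_neurons = 5, 3, 4, 2, 8
rng = np.random.default_rng(0)
params = []
for layer in range(gru_layers):
    in_dim = inp_dimension if layer == 0 else gru_neurons
    params.append((0.1 * rng.normal(size=(3, in_dim, gru_neurons)),
                   0.1 * rng.normal(size=(3, gru_neurons, gru_neurons))))

x = rng.normal(size=(seq_len, batch, inp_dimension))
h0 = np.zeros((gru_layers, batch, gru_neurons))
output, hidden = stacked_gru(x, h0, params)

print(output.shape)  # (5, 3, 8) = (seq_len, batch, gru_neurons)
print(hidden.shape)  # (2, 3, 8) = (gru_layers, batch, gru_neurons)
```

Note that output only contains the top layer's sequence, while hidden collects the last step of every layer, so output[-1] and hidden[-1] hold the same values.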

Now, to do the same thing in Keras, we can use return_state=True and return_sequences=True. This is what I have so far:

import tensorflow as tf

# inp_dimension, gru_layers and gru_neurons are defined as in the PyTorch example.
# Keras is batch-major: the input is (batch, seq_len, inp_dimension).
inputs = tf.keras.Input((None, inp_dimension))
hidden_states = [None for _ in range(gru_layers)]

x = inputs
for i in range(gru_layers):
  # return_state=True makes each layer return (sequence_output, final_state)
  x, hidden_states[i] = tf.keras.layers.GRU(
    gru_neurons, return_sequences=True, return_state=True)(x, initial_state=hidden_states[i])

# Flat list of outputs: the top layer's sequence, then one final state per layer
model = tf.keras.Model(inputs, [x] + hidden_states)

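I am able to get a stacked GRU, but return_state=True only returns the hidden state of a single GRU layer, so I collect the hidden states of all layers in a list. I also want to feed the hidden states back into the model, as in PyTorch. For some reason, though, this model is extremely inefficient: the forward and backward passes take almost twice as long as in PyTorch.

Is there a more efficient way to implement a stacked GRU in Keras, in particular one that handles hidden states the way PyTorch does? What am I doing wrong here? Perhaps I am not using the functional API correctly?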
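For reference, one way the "feed the hidden states back in" part could be expressed with the functional API is to give every layer's initial state its own tf.keras.Input. This is a minimal sketch assuming TensorFlow 2.x; the example sizes and the names seq_in, state_ins and state_outs are hypothetical, and it only illustrates the wiring of the states, not a fix for the speed difference.

```python
import tensorflow as tf

# Example sizes (assumptions for illustration only).
batch, seq_len, inp_dimension, gru_layers, gru_neurons = 3, 5, 4, 2, 8

# One input for the sequence plus one input per layer for its initial state.
seq_in = tf.keras.Input(shape=(None, inp_dimension))
state_ins = [tf.keras.Input(shape=(gru_neurons,)) for _ in range(gru_layers)]

x = seq_in
state_outs = []
for h0 in state_ins:
    # Each layer returns its full output sequence and its final hidden state.
    x, h = tf.keras.layers.GRU(
        gru_neurons, return_sequences=True, return_state=True)(x, initial_state=h0)
    state_outs.append(h)

model = tf.keras.Model([seq_in] + state_ins, [x] + state_outs)

inputs = tf.random.normal((batch, seq_len, inp_dimension))
states = [tf.zeros((batch, gru_neurons)) for _ in range(gru_layers)]

# First chunk starts from zero states; the second call reuses the returned states.
output, *states = model([inputs] + states)
output, *states = model([inputs] + list(states))

# Stack per-layer states to mimic PyTorch's (gru_layers, batch, gru_neurons) layout.
hidden = tf.stack(states)
print(output.shape, hidden.shape)
```

Unlike PyTorch's default layout, the Keras output here is batch-major, i.e. (batch, seq_len, gru_neurons).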

Hidden states of stacked RNNs in TensorFlow/Keras

0 answers: