Question

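I am trying to set the initial state to zeros in an encoder built from a Bidirectional LSTM layer. If I pass in a single matrix of zeros, I get an error saying that a Bidirectional layer must be initialized with a list of tensors (which makes sense). When I instead duplicate that zero matrix into a list of two matrices (one to initialize each RNN), I get an error saying the input shape is wrong. What am I missing here?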

class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = keras.layers.Embedding(vocab_size, embedding_dim)
        self.lstmb = keras.layers.Bidirectional(lstm(self.enc_units, dropout=0.1))

    def call(self, x, hidden):
        x = self.embedding(x)
        output, forward_h, forward_c, backward_h, backward_c = self.lstmb(x, initial_state=[hidden, hidden])
        return output, forward_h, forward_c, backward_h, backward_c


def initialize_hidden_state(batch_sz, enc_units):
    return tf.zeros((batch_sz, enc_units))

The error I get is:

ValueError: An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(128, 512), ndim=2)]; however `cell.state_size` is [512, 512]

Note: the output of the function initialize_hidden_state is fed to the hidden parameter of the call function.

Answer 1

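Having read all the comments and answers, I think I managed to put together a working example.

But first, a few notes:

I think the call to self.lstmb only returns all five values (the output plus the four states) if you ask for them in the LSTM's constructor with return_state=True.
I don't think you need to wrap the hidden state in another list such as [hidden, hidden]. initialize_hidden_state already returns a list of states, so pass it as initial_state as is.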

class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        # tell LSTM you want to get the states, and sequences returned
        self.lstmb = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(self.enc_units,
                                                                        return_sequences=True,
                                                                        return_state=True,
                                                                        dropout=0.1))

    def call(self, x, hidden):
        x = self.embedding(x)
        # no need to pass [hidden, hidden], just pass it as is
        output, forward_h, forward_c, backward_h, backward_c = self.lstmb(x, initial_state=hidden)
        return output, forward_h, forward_c, backward_h, backward_c


    def initialize_hidden_state(self):
        # I stole this idea from iamlcc, so the credit is not mine.
        return [tf.zeros((self.batch_sz, self.enc_units)) for i in range(4)]


encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)

# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, forward_h, forward_c, backward_h, backward_c = encoder(example_input_batch, sample_hidden)
print('Encoder output shape: (batch size, sequence length, 2 * units) {}'.format(sample_output.shape))
print('Encoder forward_h shape: (batch size, units) {}'.format(forward_h.shape))
print('Encoder forward_c shape: (batch size, units) {}'.format(forward_c.shape))
print('Encoder backward_h shape: (batch size, units) {}'.format(backward_h.shape))
print('Encoder backward_c shape: (batch size, units) {}'.format(backward_c.shape))
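
For anyone who wants to check these shapes without the rest of the training script, here is a minimal, self-contained sketch of the same idea. The sizes (a batch of 4, 7 time steps, 8 input features, 16 units) are arbitrary values chosen for illustration, and random input stands in for the embedding output. With the default merge_mode of the Bidirectional wrapper (concatenation), the sequence output should have 2 * units features.

```python
import tensorflow as tf

# Arbitrary illustration sizes, not taken from the question.
batch_sz, seq_len, features, enc_units = 4, 7, 8, 16

bilstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(enc_units, return_sequences=True, return_state=True)
)

# Random input standing in for the output of the embedding layer.
x = tf.random.normal((batch_sz, seq_len, features))

# Four zero tensors: forward h, forward c, backward h, backward c.
initial_state = [tf.zeros((batch_sz, enc_units)) for _ in range(4)]

output, forward_h, forward_c, backward_h, backward_c = bilstm(
    x, initial_state=initial_state
)

print(output.shape)     # (4, 7, 32): forward and backward outputs concatenated
print(forward_h.shape)  # (4, 16)
```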

Answer 2

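The state you are passing has shape (batch_size, hidden_units), whereas you should pass a state of shape (hidden_units, hidden_units). It also needs 4 initial states: 2 for the two LSTM states (h and c), and 2 more because the bidirectional wrapper runs one forward and one backward pass.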

Try changing this:

def initialize_hidden_state(batch_sz, enc_units):
    return tf.zeros((batch_sz, enc_units))

to:

import numpy as np

def initialize_hidden_state(enc_units):
    init_state = [np.zeros((enc_units, enc_units)) for i in range(4)]
    return init_state

Hope this helps.
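
To see where the count of four comes from: as far as I understand the wrapper, Bidirectional hands the first half of the initial_state list to the forward layer and the second half to the backward layer, and each LSTM needs two states (h and c). The sketch below only models that bookkeeping in plain Python. The names split_initial_state, is_compatible and STATES_PER_LSTM are hypothetical and used for illustration only; they are not part of the Keras API.

```python
STATES_PER_LSTM = 2  # an LSTM carries a hidden state h and a cell state c


def split_initial_state(initial_state):
    """Split a flat list of states into (forward, backward) halves."""
    half = len(initial_state) // 2
    return initial_state[:half], initial_state[half:]


def is_compatible(initial_state):
    """True if each direction ends up with one h and one c."""
    forward, backward = split_initial_state(initial_state)
    return len(forward) == STATES_PER_LSTM and len(backward) == STATES_PER_LSTM


# Strings stand in for tensors; only the number of entries matters here.
two = ["hidden", "hidden"]
four = ["forward_h", "forward_c", "backward_h", "backward_c"]

print(is_compatible(two))   # False
print(is_compatible(four))  # True
```

With a list of two, each direction is left with a single tensor, which is consistent with the error in the question: one InputSpec was received while cell.state_size is [512, 512].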

Answer 3

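I ended up not using the Bidirectional wrapper. I just created 2 LSTM layers, one of which takes the parameter go_backwards=True, and concatenated their outputs, in case that helps anyone. I think the Keras Bidirectional wrapper cannot handle this kind of thing at the moment.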
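
A rough sketch of that workaround, under my own assumptions rather than the exact code I used: the sizes are arbitrary, random input stands in for the embedding output, and zero_state is a hypothetical helper. Because a layer built with go_backwards=True returns its sequence in reversed time order, the sketch flips it back before concatenating so that both halves line up step by step.

```python
import tensorflow as tf

# Arbitrary illustration sizes.
batch_sz, seq_len, features, enc_units = 4, 7, 8, 16
x = tf.random.normal((batch_sz, seq_len, features))

forward_lstm = tf.keras.layers.LSTM(
    enc_units, return_sequences=True, return_state=True
)
backward_lstm = tf.keras.layers.LSTM(
    enc_units, return_sequences=True, return_state=True, go_backwards=True
)


def zero_state():
    # Hypothetical helper: one zero tensor each for h and c.
    return [tf.zeros((batch_sz, enc_units)), tf.zeros((batch_sz, enc_units))]


fw_out, fw_h, fw_c = forward_lstm(x, initial_state=zero_state())
bw_out, bw_h, bw_c = backward_lstm(x, initial_state=zero_state())

# Undo the time reversal of the backward layer, then join on the feature axis.
bw_out = tf.reverse(bw_out, axis=[1])
output = tf.concat([fw_out, bw_out], axis=-1)

print(output.shape)  # (4, 7, 32)
```

Each layer takes its own plain [h, c] list here, so there is no question of how a single list is divided between the two directions.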

Answer 4

If it's not too late, I think your initialize_hidden_state function should be:

def initialize_hidden_state(self):
    init_state = [tf.zeros((self.batch_sz, self.enc_units)) for i in range(4)]
    return init_state

Answer 5

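I built my encoder with tf.keras.Model and ran into the same error. This PR may help you. In the end I built my model with tf.keras.layers.Layer instead, and I am still working on it. I will update once it works!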

Answer 6

@BCJuan's answer is right, but I had to make some changes to get it working:

def initialize_hidden_state(batch_sz, enc_units):
    init_state = [tf.zeros((batch_sz, enc_units)) for i in range(2)]
    return init_state

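Very important: use tf.zeros rather than np.zeros, since a tf.Tensor type is expected.

If you are using a single LSTM layer inside the Bidirectional wrapper, you need to return a list of 2 tf.Tensors to initialize each RNN: one for the forward pass and one for the backward pass.

Also, if you look at an example in TF's documentation, they use batch_sz and enc_units to specify the size of the hidden state.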

How to set the initial state for a bidirectional LSTM layer in Keras?
