TensorFlow: stacking bidirectional LSTMs

Posted: 2016-10-17 21:39:59

Tags: tensorflow

I want to stack two LSTMs without using the MultiRNNCell wrapper. However, the line inputs=states_fw_1 in the second LSTM fails with ValueError: Shapes (3,) and (2,) are not compatible. How can I pass the hidden state of the first LSTM as input to the second?

LSTM 1

    with tf.name_scope("BiLSTM_1"):
        with tf.variable_scope('forward_1'):
            cell_fw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)

        with tf.variable_scope('backward_srl'):
            cell_bw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)

        outputs_1, states_1 = tf.nn.bidirectional_dynamic_rnn(
            cell_fw=cell_fw_1,
            cell_bw=cell_bw_1,
            dtype=tf.float64,
            sequence_length=self.input_seq_len,
            inputs=self.embedded_input_layer,
            scope='BiLSTM_1')

states is a tuple:

    states_fw_1, states_bw_1 = states
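
For reference, each element of this tuple is an LSTMStateTuple(c, h), where c and h both have shape [batch_size, hidden_size], i.e. rank 2, while bidirectional_dynamic_rnn expects rank-3 inputs of shape [batch_size, max_time, input_size]. A quick sketch to check this (shapes assume the setup above):

    print(states_fw_1.c.get_shape())  # (batch_size, hidden_size)
    print(states_fw_1.h.get_shape())  # (batch_size, hidden_size)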

LSTM 2

    with tf.name_scope("BiLSTM_2"):
        with tf.variable_scope('forward'):
            cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)

        with tf.variable_scope('backward'):
            cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)

        outputs, states = tf.nn.bidirectional_dynamic_rnn(
            cell_fw=cell_fw,
            cell_bw=cell_bw,
            dtype=tf.float64,
            sequence_length=self.input_seq_len,
            inputs=states_fw_1,
            scope="BiLSTM_extraction")

2 Answers:

Answer 0 (score: 0)

I have only been learning TF for 2 days (so I'm no expert), but I found this question interesting to work through.

Here is what I found: you are trying to do something that the LSTMCell implementation cannot give you directly. Here is why:

  1. You want to feed states_fw_1 into the next Bi-LSTM. So the first question should be: what are the dimensions of states_fw_1? For any RNN implementation you need an input of shape [batch_size, seq_len, input_size], but states_fw_1 is [batch_size, hidden_size] (I just checked its size by running the code below). So you can see that this output does not meet the RNN input requirement. That is because the model only returns the last state of the LSTM cell, not the whole history (see the documentation). And you are not interested in the last state here, because you want to feed the state at every time step into the layer above. states_fw_1 is useful when you want to classify the whole sequence (rather than each element in the sequence). Edit: states_fw_1 contains the last hidden_state and the last memory_cell; for classification, only the hidden_state would be useful, I think.

  2. So you just need to use the merged output (from the forward and backward passes). The concatenated LSTM cell outputs have size [batch_size, seq_len, hidden_size * 2] (* 2 for forward and backward), so they fit the next stacked RNN; the output comes from every time step, not just the final state.

  3. Here is the code I tested:

    import tensorflow as tf
    import numpy as np
    
    hidden_size = 21
    seq_len = tf.placeholder(tf.int32, [None])
    inputs = tf.placeholder(tf.float32, [None, None, 32])
    with tf.variable_scope('BiLSTM_1'):
      with tf.variable_scope('forward_1'):
        cell_fw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
      with tf.variable_scope('backward_srl'):
        cell_bw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
      outputs_1, states_1 = tf.nn.bidirectional_dynamic_rnn(
          cell_fw=cell_fw_1,
          cell_bw=cell_bw_1,
          dtype=tf.float32,
          sequence_length=seq_len,
          inputs=inputs,
          scope='BiLSTM_1')
    
    # Merge the output tensors from the forward and backward passes.
    # The result has shape [batch_size, seq_len, 2 * hidden_size].
    outputs_1 = tf.concat(outputs_1, 2)
    
    with tf.name_scope("BiLSTM_2"):
      with tf.variable_scope('forward'):
        cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
      with tf.variable_scope('backward'):
        cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
      outputs, states = tf.nn.bidirectional_dynamic_rnn(
          cell_fw=cell_fw,
          cell_bw=cell_bw,
          dtype=tf.float32,
          sequence_length=seq_len,
          inputs=outputs_1,
          scope="BiLSTM_2")
    
    # Initialize all variables (weights and biases)
    init = tf.global_variables_initializer()
    batch_size = 5
    seq_len_val = 10
    train_inputs = np.zeros((batch_size, seq_len_val, 32))
    train_seq_len = np.ones(batch_size) * seq_len_val
    with tf.Session() as session:
      session.run(init)
      feed = {inputs: train_inputs, seq_len: train_seq_len}
      out, state, state_1 = session.run([outputs, states, states_1], feed)
    
    print ("State size: ", state_1[0].c.shape, " Out Size: ", out[0][0].shape)
    print ("Batch_size: ", batch_size, " Sequence Len: ", seq_len_val, " Hidden Size: ", hidden_size)
    

Answer 1 (score: 0)

The 'outputs_1' returned by LSTM 1 is a tuple containing 'outputs_fw' and 'outputs_bw'.

'outputs_fw' and 'outputs_bw' both have shape [batch_size, sequence_length, hidden_size].

Instead of passing 'states_fw_1' as the input to LSTM 2, you must concatenate the 'outputs_fw' and 'outputs_bw' hidden states (using tf.concat with axis=2) and pass the result as the input to LSTM 2.
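
In code, that concatenation is a short sketch (assuming outputs_1 is the tuple returned by LSTM 1; lstm2_inputs is an illustrative name):

    # Unpack the per-direction outputs and join them along the feature axis.
    outputs_fw, outputs_bw = outputs_1
    # Resulting shape: [batch_size, sequence_length, 2 * hidden_size]
    lstm2_inputs = tf.concat([outputs_fw, outputs_bw], axis=2)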