TensorFlow: stacking bidirectional LSTMs

Posted: 2016-10-17 21:39:59

Tags: tensorflow

I want to stack two LSTMs without using the MultiRNNCell wrapper. However, the line inputs=states_fw_1 in the second LSTM fails with ValueError: Shapes (3,) and (2,) are not compatible. How can I pass the hidden state of the first LSTM as input to the second?

LSTM 1

    with tf.name_scope("BiLSTM_1"):
        with tf.variable_scope('forward_1'):
            cell_fw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)

        with tf.variable_scope('backward_srl'):
            cell_bw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)

        outputs_1, states_1 = tf.nn.bidirectional_dynamic_rnn(
            cell_fw=cell_fw_1,
            cell_bw=cell_bw_1,
            dtype=tf.float64,
            sequence_length=self.input_seq_len,
            inputs=self.embedded_input_layer,
            scope='BiLSTM_1')

states is a tuple:

    states_fw_1, states_bw_1 = states
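
For reference, each element of this tuple is an LSTMStateTuple(c, h), where c and h both have shape [batch_size, hidden_size], i.e. rank 2, while bidirectional_dynamic_rnn expects rank-3 inputs of shape [batch_size, max_time, input_size]. A quick sketch to check this (shapes assume the setup above):

    print(states_fw_1.c.get_shape())  # (batch_size, hidden_size)
    print(states_fw_1.h.get_shape())  # (batch_size, hidden_size)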

LSTM 2

    with tf.name_scope("BiLSTM_2"):
        with tf.variable_scope('forward'):
            cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)

        with tf.variable_scope('backward'):
            cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)

        outputs, states = tf.nn.bidirectional_dynamic_rnn(
            cell_fw=cell_fw,
            cell_bw=cell_bw,
            dtype=tf.float64,
            sequence_length=self.input_seq_len,
            inputs=states_fw_1,
            scope="BiLSTM_extraction")

2 Answers:

Answer 0 (score: 0)

I have only been learning TF for 2 days (so I'm no expert), but I found this question interesting to work through.

Here is what I found: you are trying to do something that the LSTMCell implementation cannot give you directly. Here is why:

  1. You want to feed states_fw_1 into the next Bi-LSTM. So the first question should be: what are the dimensions of states_fw_1? For any RNN implementation you need an input of shape [batch_size, seq_len, input_size], but states_fw_1 is [batch_size, hidden_size] (I just checked its size by running the code below). So you can see that this output does not meet the RNN input requirement. That is because the model only returns the last state of the LSTM cell, not the whole history (see the documentation). And you are not interested in the last state here, because you want to feed the state at every time step into the layer above. states_fw_1 is useful when you want to classify the whole sequence (rather than each element in the sequence). Edit: states_fw_1 contains the last hidden_state and the last memory_cell; for classification, only the hidden_state would be useful, I think.

  2. So you just need to use the merged output (from the forward and backward passes). The concatenated LSTM cell outputs have size [batch_size, seq_len, hidden_size * 2] (* 2 for forward and backward), so they fit the next stacked RNN; the output comes from every time step, not just the final state.

  3. Here is the code I tested:

    import tensorflow as tf
    import numpy as np
    
    hidden_size = 21
    seq_len = tf.placeholder(tf.int32, [None])
    inputs = tf.placeholder(tf.float32, [None, None, 32])
    with tf.variable_scope('BiLSTM_1'):
      with tf.variable_scope('forward_1'):
        cell_fw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
      with tf.variable_scope('backward_srl'):
        cell_bw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
      outputs_1, states_1 = tf.nn.bidirectional_dynamic_rnn(
          cell_fw=cell_fw_1,
          cell_bw=cell_bw_1,
          dtype=tf.float32,
          sequence_length=seq_len,
          inputs=inputs,
          scope='BiLSTM_1')
    
    # Merge the output tensors from the forward and backward passes.
    # The result has shape [batch_size, seq_len, 2 * hidden_size].
    outputs_1 = tf.concat(outputs_1, 2)
    
    with tf.name_scope("BiLSTM_2"):
      with tf.variable_scope('forward'):
        cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
      with tf.variable_scope('backward'):
        cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
      outputs, states = tf.nn.bidirectional_dynamic_rnn(
          cell_fw=cell_fw,
          cell_bw=cell_bw,
          dtype=tf.float32,
          sequence_length=seq_len,
          inputs=outputs_1,
          scope="BiLSTM_2")
    
    # Initialize all variables (weights and biases)
    init = tf.global_variables_initializer()
    batch_size = 5
    seq_len_val = 10
    train_inputs = np.zeros((batch_size, seq_len_val, 32))
    train_seq_len = np.ones(batch_size) * seq_len_val
    with tf.Session() as session:
      session.run(init)
      feed = {inputs: train_inputs, seq_len: train_seq_len}
      out, state, state_1 = session.run([outputs, states, states_1], feed)
    
    print ("State size: ", state_1[0].c.shape, " Out Size: ", out[0][0].shape)
    print ("Batch_size: ", batch_size, " Sequence Len: ", seq_len_val, " Hidden Size: ", hidden_size)
    

Answer 1 (score: 0)

The 'outputs_1' returned by LSTM 1 is a tuple containing 'outputs_fw' and 'outputs_bw'.

'outputs_fw' and 'outputs_bw' both have shape [batch_size, sequence_length, hidden_size].

Instead of passing 'states_fw_1' as the input to LSTM 2, you must concatenate the 'outputs_fw' and 'outputs_bw' hidden states (using tf.concat with axis=2) and pass the result as the input to LSTM 2.
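
In code, that concatenation is a short sketch (assuming outputs_1 is the tuple returned by LSTM 1; lstm2_inputs is an illustrative name):

    # Unpack the per-direction outputs and join them along the feature axis.
    outputs_fw, outputs_bw = outputs_1
    # Resulting shape: [batch_size, sequence_length, 2 * hidden_size]
    lstm2_inputs = tf.concat([outputs_fw, outputs_bw], axis=2)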