I want to stack two LSTMs without using the MultiRNN wrapper. However, the second LSTM fails with ValueError: Shapes (3,) and (2,) are not compatible because of inputs=states_fw_1. How can I pass the hidden state of the first LSTM as the input to the second?
LSTM 1
with tf.name_scope("BiLSTM_1"):
    with tf.variable_scope('forward_1'):
        cell_fw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward_srl'):
        cell_bw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs_1, states_1 = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw_1,
        cell_bw=cell_bw_1,
        dtype=tf.float64,
        sequence_length=self.input_seq_len,
        inputs=self.embedded_input_layer,
        scope='BiLSTM_1')
The states are tuples:
states_fw_1, states_bw_1 = states_1
LSTM 2
with tf.name_scope("BiLSTM_2"):
    with tf.variable_scope('forward'):
        cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward'):
        cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs, states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw,
        cell_bw=cell_bw,
        dtype=tf.float64,
        sequence_length=self.input_seq_len,
        inputs=states_fw_1,
        scope="BiLSTM_extraction")
Answer 0 (score: 0)
I have only been learning TF for 2 days (so I'm no pro), but I found this question interesting to solve.
Here are my findings: you are trying to do something that cannot be achieved with the 'LSTMCell' implementation. Here is why.

You want to feed 'states_fw_1' to the next Bi-LSTM. So the first question should be: what are the dimensions of 'states_fw_1'? For any RNN implementation you need an input of shape [batch_size, seq_len, input_size]. For 'states_fw_1' the shape is [batch_size, hidden_size] (I just checked the size of 'states_fw_1' by running the code below). So you can see that this output does not fit the RNN requirement. That is because the model only outputs the last state of the LSTM cell, not the whole history (see the documentation). And the last state is not what you are interested in, because you would need to feed state[t-step] to the layer above. 'states_fw_1' is useful when you want to classify the sequence as a whole (not every element in the sequence). Edit: 'states_fw_1' contains the last 'hidden_state' and the last 'memory_cell'; for classification, I think only the 'hidden_state' would be useful.

So you just have to use the merged outputs (from the forward and backward passes). The 'LSTMCell' output has size [batch_size, seq_len, hidden_size * 2] (*2 for forward and backward), so it fits the next stacked RNN (the output comes from every time step, not just the last state).
Here is the code I was testing:
import tensorflow as tf
import numpy as np

hidden_size = 21
seq_len = tf.placeholder(tf.int32, [None])
inputs = tf.placeholder(tf.float32, [None, None, 32])

with tf.variable_scope('BiLSTM_1'):
    with tf.variable_scope('forward_1'):
        cell_fw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward_srl'):
        cell_bw_1 = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs_1, states_1 = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw_1,
        cell_bw=cell_bw_1,
        dtype=tf.float32,
        sequence_length=seq_len,
        inputs=inputs,
        scope='BiLSTM_1')

# Merge the output tensors from the forward and backward passes.
# The result has size [batch_size, seq_len, 2 * hidden_size].
outputs_1 = tf.concat(outputs_1, 2)

with tf.name_scope("BiLSTM_2"):
    with tf.variable_scope('forward'):
        cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    with tf.variable_scope('backward'):
        cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs, states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=cell_fw,
        cell_bw=cell_bw,
        dtype=tf.float32,
        sequence_length=seq_len,
        inputs=outputs_1,
        scope="BiLSTM_2")

# Initialize the weights and biases
init = tf.initialize_all_variables()

batch_size = 5
seq_len_val = 10
train_inputs = np.zeros((batch_size, seq_len_val, 32))
train_seq_len = np.ones(batch_size) * seq_len_val

with tf.Session() as session:
    session.run(init)
    feed = {inputs: train_inputs, seq_len: train_seq_len}
    out, state, state_1 = session.run([outputs, states, states_1], feed)
    print("State size: ", state_1[0].c.shape, " Out Size: ", out[0][0].shape)
    print("Batch_size: ", batch_size, " Sequence Len: ", seq_len_val, " Hidden Size: ", hidden_size)
Answer 1 (score: 0)
'outputs_1' is a tuple containing 'outputs_fw' and 'outputs_bw'.
The dimensions of 'outputs_fw' and 'outputs_bw' are [batch_size, sequence_length, hidden_size].
You have to concatenate the 'outputs_fw' and 'outputs_bw' hidden states (using tf.concat with axis=2) and pass that as the input to LSTM 2, instead of passing 'states_fw_1' as the input to LSTM 2.
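For example, here is a minimal sketch of that fix applied to the code in the question. It assumes 'outputs_1', 'self.input_seq_len' and the LSTM 2 cells 'cell_fw'/'cell_bw' defined above are in scope; 'merged_outputs_1' is only an illustrative name:

# 'outputs_1' is the (outputs_fw, outputs_bw) tuple returned by the first
# bidirectional_dynamic_rnn; concatenating along axis 2 gives a tensor of
# shape [batch_size, seq_len, 2 * hidden_size].
merged_outputs_1 = tf.concat(outputs_1, axis=2)

outputs, states = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=cell_fw,
    cell_bw=cell_bw,
    dtype=tf.float64,
    sequence_length=self.input_seq_len,
    inputs=merged_outputs_1,  # the merged outputs, not states_fw_1
    scope="BiLSTM_extraction")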