I am implementing a seq2seq model for text summarization with TensorFlow. For the encoder I am using a bidirectional RNN layer. Encoding layer:
def encoding_layer(self, rnn_inputs, rnn_size, num_layers, keep_prob,
                   source_vocab_size,
                   encoding_embedding_size,
                   source_sequence_length,
                   emb_matrix):

    embed = tf.nn.embedding_lookup(emb_matrix, rnn_inputs)

    stacked_cells = tf.contrib.rnn.MultiRNNCell(
        [tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(rnn_size), keep_prob)
         for _ in range(num_layers)])

    outputs, state = tf.nn.bidirectional_dynamic_rnn(cell_fw=stacked_cells,
                                                     cell_bw=stacked_cells,
                                                     inputs=embed,
                                                     sequence_length=source_sequence_length,
                                                     dtype=tf.float32)

    concat_outputs = tf.concat(outputs, 2)

    return concat_outputs, state[0]
For the decoder I am using an attention mechanism. Decoding layer:
def decoding_layer_train(self, encoder_outputs, encoder_state, dec_cell, dec_embed_input,
                         target_sequence_length, max_summary_length,
                         output_layer, keep_prob, rnn_size, batch_size):
    """
    Create a training process in decoding layer
    :return: BasicDecoderOutput containing training logits and sample_id
    """
    dec_cell = tf.contrib.rnn.DropoutWrapper(dec_cell,
                                             output_keep_prob=keep_prob)

    train_helper = tf.contrib.seq2seq.TrainingHelper(dec_embed_input, target_sequence_length)

    attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(rnn_size, encoder_outputs,
                                                               memory_sequence_length=target_sequence_length)

    attention_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell, attention_mechanism,
                                                         attention_layer_size=rnn_size / 2)

    state = attention_cell.zero_state(dtype=tf.float32, batch_size=batch_size)
    state = state.clone(cell_state=encoder_state)

    decoder = tf.contrib.seq2seq.BasicDecoder(cell=attention_cell, helper=train_helper,
                                              initial_state=state,
                                              output_layer=output_layer)

    outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, impute_finished=True,
                                                      maximum_iterations=max_summary_length)
    return outputs
Now, BasicDecoder expects its initial state to have shape (batch_size, rnn_size). My encoder produces two states (forward and backward), each of shape (batch_size, rnn_size).
To make this work, I am currently using only one of the encoder's states (the forward one). So I would like to know how to use both the forward and the backward encodings from the encoding layer. Should I add the forward and backward states together?
P.S. - The decoder does not use a bidirectional layer.
Answer 0 (score: 0)
If you only want to use the backward encoding:
# Get only the last cell state of the backward cell
(_, _), (_, cell_state_bw) = tf.nn.bidirectional_dynamic_rnn(...)
# Pass the cell_state_bw as the initial state of the decoder cell
decoder = tf.contrib.seq2seq.BasicDecoder(..., initial_state=cell_state_bw, ...)
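One caveat: since your encoder wraps the cells in a MultiRNNCell, each direction's returned state is a tuple with one LSTMStateTuple per layer, so you may need to index into it first. A minimal sketch under that assumption, reusing the variable names from your encoding_layer:
# With a MultiRNNCell encoder, each direction's state is a tuple of
# LSTMStateTuples (one per layer); take the last layer's backward state.
_, (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=stacked_cells,
    cell_bw=stacked_cells,
    inputs=embed,
    sequence_length=source_sequence_length,
    dtype=tf.float32)
top_state_bw = state_bw[-1]  # LSTMStateTuple(c, h) of the top backward layer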
What I would suggest you do instead:
# Get both last states
(_, _), (cell_state_fw, cell_state_bw) = tf.nn.bidirectional_dynamic_rnn(...)
# Concatenate the cell states together
cell_state_final = tf.concat([cell_state_fw.c, cell_state_bw.c], 1)
# Concatenate the hidden states together
hidden_state_final = tf.concat([cell_state_fw.h, cell_state_bw.h], 1)
# Create the actual final state
encoder_final_state = tf.nn.rnn_cell.LSTMStateTuple(c=cell_state_final, h=hidden_state_final)
# Now you can pass this as the initial state of the decoder
Note, however, that the decoder cell must be twice the size of the encoder cell for this second approach to work.
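For completeness, here is a minimal sketch of how the merged state could be wired into the attention-based training decoder from the question. It assumes a single-layer encoder, a decoder LSTMCell of size 2 * rnn_size, and the names defined above (encoder_outputs, encoder_final_state, train_helper, batch_size); also note that the attention memory lengths should be the source lengths, since the memory is the encoder output:
# Decoder cell twice the encoder size, so it can accept the concatenated state.
dec_cell = tf.contrib.rnn.LSTMCell(2 * rnn_size)
dec_cell = tf.contrib.rnn.DropoutWrapper(dec_cell, output_keep_prob=keep_prob)

# Attention over the (already concatenated) bidirectional encoder outputs.
attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(
    2 * rnn_size, encoder_outputs,
    memory_sequence_length=source_sequence_length)
attention_cell = tf.contrib.seq2seq.AttentionWrapper(
    dec_cell, attention_mechanism, attention_layer_size=rnn_size)

# Seed the AttentionWrapper state with the merged encoder state.
initial_state = attention_cell.zero_state(batch_size=batch_size, dtype=tf.float32)
initial_state = initial_state.clone(cell_state=encoder_final_state)

decoder = tf.contrib.seq2seq.BasicDecoder(cell=attention_cell,
                                          helper=train_helper,
                                          initial_state=initial_state,
                                          output_layer=output_layer)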
Answer 1 (score: 0)
Most of this has already been covered in the previous answer.
Regarding your question, "Should I add the forward and backward states together?": in my opinion we should use both of the encoder's states; otherwise we are not making use of the trained backward encoder state. Also, bidirectional_dynamic_rnn should be given two different stacks of LSTM cells: one for the FW pass and another for the BW pass.
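As an illustration of that last point, a minimal sketch of building two separate cell stacks inside encoding_layer instead of reusing one stacked_cells object for both directions (the make_cells helper is just a name introduced here):
def make_cells(rnn_size, num_layers, keep_prob):
    # One dropout-wrapped LSTM stack; call this once per direction so the
    # forward and backward passes each get their own cells.
    return tf.contrib.rnn.MultiRNNCell(
        [tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(rnn_size), keep_prob)
         for _ in range(num_layers)])

cells_fw = make_cells(rnn_size, num_layers, keep_prob)
cells_bw = make_cells(rnn_size, num_layers, keep_prob)

outputs, (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=cells_fw,
    cell_bw=cells_bw,
    inputs=embed,
    sequence_length=source_sequence_length,
    dtype=tf.float32)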