I am trying to write my own seq2seq NMT model based on the tutorial at https://github.com/tensorflow/nmt/. Here is my implementation of the decoder, without any attention mechanism:
import tensorflow as tf

decoder_cells = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.LSTMCell(num_units) for _ in range(num_layers)])
with tf.variable_scope("training_decoder"):
    # Projection layer and training helper
    projection_layer = tf.layers.Dense(vi_vocab_size)
    helper = tf.contrib.seq2seq.TrainingHelper(
        decoder_embed_input, sequence_length=decoder_input_lengths)
# Decoder
decoder = tf.contrib.seq2seq.BasicDecoder(
decoder_cells, helper, encoder_state,
output_layer=projection_layer)
# Dynamic decoding
training_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder,
impute_finished=True,
maximum_iterations=None)
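    # BasicDecoderOutput has two fields: rnn_output (already the projected
    # logits here, since output_layer was supplied) and sample_id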
    training_logits = training_outputs.rnn_output
with tf.variable_scope("inference_decoder", reuse=True):
# Helper
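    # start_tokens: one <sos> id per batch element; GreedyEmbeddingHelper then
    # feeds the embedding of each step's argmax token back in as the next input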
    start_tokens = tf.fill([num_sentences], vi_sos_id)
helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embedding_decoder,
start_tokens,
vi_eos_id)
# Decoder
decoder = tf.contrib.seq2seq.BasicDecoder(decoder_cells,
helper,
encoder_state,
output_layer=projection_layer)
    # Dynamic decoding
prediction_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder,
impute_finished=True,
maximum_iterations=maximum_iterations)
When training this model, after 2000 batches (with a batch_size of 50) I get a BLEU score of about 0.005 on the dev set. However, when I implement the attention mechanism like this:
decoder_cells = tf.contrib.rnn.MultiRNNCell(
[tf.contrib.rnn.LSTMCell(num_units) for _ in range(num_layers)])
with tf.variable_scope("training_decoder"):
    # Projection layer and training helper
    projection_layer = tf.layers.Dense(vi_vocab_size)
    helper = tf.contrib.seq2seq.TrainingHelper(
        decoder_embed_input, sequence_length=decoder_input_lengths)
# Create an attention mechanism
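    # Luong (multiplicative) attention over the encoder's per-step outputs;
    # memory_sequence_length is not passed, so padded positions are not masked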
attention_mechanism = tf.contrib.seq2seq.LuongAttention(
num_units, encoder_outputs)
# Attention wrapper
attention_decoder_cells = tf.contrib.seq2seq.AttentionWrapper(
cell=decoder_cells,
attention_mechanism=attention_mechanism,
attention_layer_size=num_units)
# Attention initial state
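    # zero_state builds a fresh AttentionWrapperState; clone() swaps in the
    # encoder's final state as its cell_state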
    attention_initial_state = attention_decoder_cells.zero_state(
        dtype=tf.float32, batch_size=num_sentences).clone(
            cell_state=encoder_state)
# Attention Decoder
decoder = tf.contrib.seq2seq.BasicDecoder(
attention_decoder_cells, helper,
initial_state=attention_initial_state,
output_layer=projection_layer)
# Dynamic decoding
training_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder,
impute_finished=True,
maximum_iterations=None)
    training_logits = training_outputs.rnn_output
with tf.variable_scope("inference_decoder", reuse=tf.AUTO_REUSE):
# Helper
    start_tokens = tf.fill([num_sentences], vi_sos_id)
helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embedding_decoder,
start_tokens,
vi_eos_id)
# Create an attention mechanism
attention_mechanism = tf.contrib.seq2seq.LuongAttention(
num_units, encoder_outputs)
# Attention wrapper
attention_decoder_cells = tf.contrib.seq2seq.AttentionWrapper(
cell=decoder_cells,
attention_mechanism=attention_mechanism,
attention_layer_size=num_units)
# Attention initial state
    attention_initial_state = attention_decoder_cells.zero_state(
        dtype=tf.float32, batch_size=num_sentences).clone(
            cell_state=encoder_state)
# Decoder
decoder = tf.contrib.seq2seq.BasicDecoder(attention_decoder_cells,
helper,
attention_initial_state,
output_layer=projection_layer)
    # Dynamic decoding
prediction_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder,
impute_finished=True,
maximum_iterations=maximum_iterations)
this time, even after 6000 batches, my BLEU score on the dev set is always 0. Can anyone tell me where the problem is?
EDIT: After setting output_attention=False on the AttentionWrapper, I get a BLEU score of 0.001 on the dev set after 6000 batches. But this does not seem like the right solution, because the TensorFlow documentation says that output_attention=True is for Luong-style attention mechanisms, and I am using LuongAttention in my code.
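For reference, this is how I read that flag from the tf.contrib.seq2seq.AttentionWrapper docs, as a minimal sketch reusing the decoder_cells, attention_mechanism, and num_units names from my code above:

# output_attention=True (the default): the wrapper's per-step output is the
# attention layer's output (the attention vector), i.e. Luong-style behaviour.
luong_style_cells = tf.contrib.seq2seq.AttentionWrapper(
    cell=decoder_cells,
    attention_mechanism=attention_mechanism,
    attention_layer_size=num_units,
    output_attention=True)

# output_attention=False: the wrapper's per-step output is the underlying
# cell's output, and the attention vector only feeds into the next step's
# input, i.e. Bahdanau-style behaviour.
bahdanau_style_cells = tf.contrib.seq2seq.AttentionWrapper(
    cell=decoder_cells,
    attention_mechanism=attention_mechanism,
    attention_layer_size=num_units,
    output_attention=False)

So with LuongAttention the first variant should be the appropriate one, which is why the small improvement from output_attention=False confuses me.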