Seq2Seq NMT BLEU score is always zero after implementing an attention mechanism

Time: 2019-04-16 01:10:32

Tags: python tensorflow seq2seq

I am trying to write my own seq2seq NMT based on the tutorial at https://github.com/tensorflow/nmt/. Here is my implementation of the decoder, without any attention mechanism:

decoder_cells = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.LSTMCell(num_units) for _ in range(num_layers)])

with tf.variable_scope("training_decoder"):
    # Helper
    projection_layer = tf.layers.Dense(vi_vocab_size)
    helper = tf.contrib.seq2seq.TrainingHelper(
        decoder_embed_input, sequence_length=decoder_input_lengths)

    # Decoder
    decoder = tf.contrib.seq2seq.BasicDecoder(
        decoder_cells, helper, encoder_state,
        output_layer=projection_layer)


    # Dynamic decoding
    training_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder,
                                                  impute_finished=True,
                                                  maximum_iterations=None)
    training_logits = training_outputs

with tf.variable_scope("inference_decoder", reuse=True):
    # Helper
    start_tokens = tf.map_fn(lambda x: vi_sos_id, decoder_input_lengths)

    helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embedding_decoder,
                                                      start_tokens,
                                                      vi_eos_id)

    # Decoder
    decoder = tf.contrib.seq2seq.BasicDecoder(decoder_cells,
                                              helper,
                                              encoder_state,
                                              output_layer=projection_layer)

    # Dynamic decoding
    prediction_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder,
                                                      impute_finished=True,
                                                      maximum_iterations=maximum_iterations)

When training this model, after 2000 batches (with a batch_size of 50) I got a BLEU score of about 0.005 on the dev set. However, when I implemented the attention mechanism like this:

decoder_cells = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.LSTMCell(num_units) for _ in range(num_layers)])

with tf.variable_scope("training_decoder"):
    # Helper
    projection_layer = tf.layers.Dense(vi_vocab_size)
    helper = tf.contrib.seq2seq.TrainingHelper(
        decoder_embed_input, sequence_length=decoder_input_lengths)

    # Create an attention mechanism
    attention_mechanism = tf.contrib.seq2seq.LuongAttention(
        num_units, encoder_outputs)     

    # Attention wrapper
    attention_decoder_cells = tf.contrib.seq2seq.AttentionWrapper(
        cell=decoder_cells,
        attention_mechanism=attention_mechanism,
        attention_layer_size=num_units)

    # Attention initial state
    attention_initial_state = attention_decoder_cells.zero_state(dtype=tf.float32, batch_size=num_sentences).clone(
        cell_state=encoder_state)

    # Attention Decoder
    decoder = tf.contrib.seq2seq.BasicDecoder(
        attention_decoder_cells, helper,
        initial_state=attention_initial_state,
        output_layer=projection_layer)   

    # Dynamic decoding
    training_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder,
                                                  impute_finished=True,
                                                  maximum_iterations=None)
    training_logits = training_outputs

with tf.variable_scope("inference_decoder", reuse=tf.AUTO_REUSE):
    # Helper
    start_tokens = tf.map_fn(lambda x: vi_sos_id, decoder_input_lengths)

    helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embedding_decoder,
                                                      start_tokens,
                                                      vi_eos_id)

    # Create an attention mechanism
    attention_mechanism = tf.contrib.seq2seq.LuongAttention(
        num_units, encoder_outputs)

    # Attention wrapper         
    attention_decoder_cells = tf.contrib.seq2seq.AttentionWrapper(
        cell=decoder_cells,
        attention_mechanism=attention_mechanism,
        attention_layer_size=num_units)

    # Attention initial state           
    attention_initial_state = attention_decoder_cells.zero_state(dtype=tf.float32, batch_size=num_sentences).clone(
        cell_state=encoder_state)           

    # Decoder
    decoder = tf.contrib.seq2seq.BasicDecoder(attention_decoder_cells,
                                              helper,
                                              attention_initial_state,
                                              output_layer=projection_layer)                                                  

    # Dynamic decoding
    prediction_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder,
                                                      impute_finished=True,
                                                      maximum_iterations=maximum_iterations) 

even after 6000 batches my BLEU score on the dev set stays at 0. Can anyone tell me where the problem is?
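
For reference, my reading of the attention setup in the tutorial I linked is roughly the following (paraphrased, so take it as a sketch; source_sequence_lengths here stands for the vector of encoder lengths). One difference I notice is that it passes memory_sequence_length so that padded encoder positions are masked, whereas my LuongAttention above only receives encoder_outputs:

# Tutorial-style attention mechanism (sketch): padded source steps are
# masked out via memory_sequence_length.
attention_mechanism = tf.contrib.seq2seq.LuongAttention(
    num_units, encoder_outputs,
    memory_sequence_length=source_sequence_lengths)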

Edit: After setting output_attention=False on the AttentionWrapper, I got a BLEU score of 0.001 on the dev set after 6000 batches. However, this does not seem like the right solution, because the TensorFlow documentation says output_attention=True is intended for Luong-style attention, and I am using LuongAttention in my code.
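
Concretely, the only change in that experiment is the output_attention flag on the wrapper; everything else stays as in the code above:

attention_decoder_cells = tf.contrib.seq2seq.AttentionWrapper(
    cell=decoder_cells,
    attention_mechanism=attention_mechanism,
    attention_layer_size=num_units,
    output_attention=False)  # emit the raw cell output instead of the attention vector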

0 Answers:

No answers