Tensorflow Seq2Seq: starting the decoder sequence does not work properly

Date: 2019-02-23 22:27:32

Tags: tensorflow neural-network machine-translation seq2seq

I am implementing a basic seq2seq model for neural machine translation, with word embeddings for both the encoder and the decoder. My decoder uses a TrainingHelper during training and a GreedyEmbeddingHelper during inference. I tried to train my network with two preprocessing procedures that look equivalent to me, but one of them does not work and I don't really understand why.

1) Preprocess the sentences: append <EOS> to every sentence.

target_ids = []
for sentence in text:
    temp = []
    # add <GO> indicator of the target sentence (not used in this variant)
    #temp.append(self.word2idx['<GO>'])
    # convert the words of each sentence into ids
    temp = temp + self.ConvertSentenceToIndex(sentence)
    # add <EOS> indicator of the target sentence
    temp.append(self.word2idx['<EOS>'])

    target_ids.append(temp)
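
For illustration, here is a tiny standalone sketch of what this preprocessing produces. The toy vocabulary, ids and the convert_sentence_to_index helper below are made up, not part of my real code; in this variant only <EOS> is appended:

word2idx = {'<GO>': 0, '<EOS>': 1, 'i': 2, 'am': 3, 'hungry': 4}  # hypothetical toy vocabulary

def convert_sentence_to_index(sentence):
    # stand-in for self.ConvertSentenceToIndex: map each word to its id
    return [word2idx[word] for word in sentence.split()]

text = ['i am hungry']
target_ids = []
for sentence in text:
    temp = convert_sentence_to_index(sentence)
    # add <EOS> indicator of the target sentence
    temp.append(word2idx['<EOS>'])
    target_ids.append(temp)

print(target_ids)  # [[2, 3, 4, 1]] -> ids of "i am hungry" followed by <EOS>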

Then, during training, we add <GO> to the target input data just before the decoder embedding lookup, as shown below:

def build_decoder(target_data, encoder_state,
                   target_sequence_length, max_target_sequence_length,
                   rnn_size, num_layers, target_vocab_to_int, target_vocab_size,
                   batch_size, keep_prob, decoding_embedding_size):
    """
    Create decoding layer
    :input:
    @target_data : target data 
    @encoder_state : feature map of the encoder
    @target_sequence_length : tf.placeholder
    @max_target_sequence_length : max value of target_sequence_length
    @rnn_size : number of neurons for each rnn in the decoder
    @num_layers : number of layers of RNN 
    @target_vocab_to_int : dic of the vocabulary to int
    @target_vocab_size : size of the vocabulary target
    @batch_size : size of the batch
    @keep_prob : probability of the dropout
    @decoding_embedding_size : size of the embedding target 

    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    """              
    # --- drop the last token of each target row, then prepend <GO> --- #
    go_id = target_vocab_to_int['<GO>']

    after_slice = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    after_concat = tf.concat( [tf.fill([batch_size, 1], go_id), after_slice], 1)

    # --- Embedding Decoder --- #
    target_vocab_size = len(target_vocab_to_int)
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, after_concat)

    cells = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.LSTMCell(rnn_size) for _ in range(num_layers)])
    output_layer = tf.layers.Dense(target_vocab_size)

    with tf.variable_scope("decode"):
        # --- Training decoding --- #
        train_cell = tf.contrib.rnn.DropoutWrapper(cells, 
                                             output_keep_prob=keep_prob)

        # for only input layer
        train_helper = tf.contrib.seq2seq.TrainingHelper(dec_embed_input, 
                                                         target_sequence_length)

        train_decoder = tf.contrib.seq2seq.BasicDecoder(train_cell, 
                                                  train_helper, 
                                                  encoder_state, 
                                                  output_layer)

        # unrolling the decoder layer
        train_output, _, _ = tf.contrib.seq2seq.dynamic_decode(train_decoder, 
                                                          impute_finished=True, 
                                                          maximum_iterations=max_target_sequence_length)



        # --- Inference decoding --- #
        infer_cell = tf.contrib.rnn.DropoutWrapper(cells, 
                                             output_keep_prob=keep_prob)

        start_of_sequence_id = target_vocab_to_int['<GO>']
        end_of_sequence_id = target_vocab_to_int['<EOS>']

        infer_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings, 
                                                          #start_of_sequence_id,
                                                          tf.fill([batch_size], start_of_sequence_id), 
                                                          end_of_sequence_id)

        infer_decoder = tf.contrib.seq2seq.BasicDecoder(infer_cell, 
                                                  infer_helper, 
                                                  encoder_state, 
                                                  output_layer)

        infer_output, _, _ = tf.contrib.seq2seq.dynamic_decode(infer_decoder, 
                                                          impute_finished=True, 
                                                          maximum_iterations=max_target_sequence_length)

    return (train_output, infer_output)

That works, and the accuracy during inference is good.
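
For reference, here is a minimal sketch, with made-up token ids, of what the strided_slice/concat step inside build_decoder does: it drops the last token of every row of the batch and prepends <GO>, so the decoder input is the target sequence shifted one step to the right.

# Illustrative only: a made-up batch of target ids where <GO> = 0 and <EOS> = 1.
go_id = 0
target_data = [[2, 3, 4, 1],
               [5, 6, 7, 1]]

# tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1]) keeps every
# row but drops its last column:
after_slice = [row[:-1] for row in target_data]        # [[2, 3, 4], [5, 6, 7]]

# tf.concat([tf.fill([batch_size, 1], go_id), after_slice], 1) prepends <GO>:
after_concat = [[go_id] + row for row in after_slice]  # [[0, 2, 3, 4], [0, 5, 6, 7]]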

2) Preprocess the sentences: append both <GO> and <EOS> to every sentence. Accordingly, I removed the first three lines of the build_decoder function and changed:

dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, after_concat)

to:

dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, target_data)
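
For completeness, the variant-2 preprocessing presumably just un-comments the <GO> line in the loop from variant 1, i.e. something like:

target_ids = []
for sentence in text:
    temp = []
    # add <GO> indicator of the target sentence
    temp.append(self.word2idx['<GO>'])
    # convert the words of each sentence into ids
    temp = temp + self.ConvertSentenceToIndex(sentence)
    # add <EOS> indicator of the target sentence
    temp.append(self.word2idx['<EOS>'])

    target_ids.append(temp)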

But this does not work during training. The loss decreases (so the model seems to be learning something), yet the accuracy decreases as well, which is abnormal. Moreover, when I look at the predictions during inference, the model mostly predicts the same word over and over. I don't understand why adding <GO> inside the build_decoder function works properly while adding it outside does not.

Does anyone have an idea about this? Is the problem related to TrainingHelper or something else?

Here is my full code in a Jupyter notebook: https://github.com/Shiro-LK/TF_exemple/blob/master/NTM_Seq2Seq.ipynb

0 Answers:

No answers yet.