I am implementing a basic seq2seq model for neural machine translation. I have word embeddings for both the encoder and the decoder. The training decoder uses TrainingHelper and the inference decoder uses GreedyEmbeddingHelper. I tried training my network with two similar preprocessing approaches, but one of them does not work and I don't really understand why.
1) Preprocess the sentences: add <EOS> to every target sentence.
target_ids = []
for sentence in text:
    temp = []
    # add <GO> indicator of the target sentence
    #temp.append(self.word2idx['<GO>'])
    # convert each word of the sentence to its id
    temp = temp + self.ConvertSentenceToIndex(sentence)
    # add the <EOS> indicator to the target sentence
    temp.append(self.word2idx['<EOS>'])
    target_ids.append(temp)
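For concreteness, here is a minimal sketch of what this preprocessing produces; the vocabulary and the stand-in for ConvertSentenceToIndex are made up purely for illustration:

# toy vocabulary, made up for illustration
word2idx = {'<PAD>': 0, '<GO>': 1, '<EOS>': 2, 'hello': 3, 'world': 4}

def convert_sentence_to_index(sentence):
    # toy stand-in for self.ConvertSentenceToIndex
    return [word2idx[w] for w in sentence.split()]

text = ["hello world"]
target_ids = []
for sentence in text:
    temp = convert_sentence_to_index(sentence)
    # approach 1: only <EOS> is appended
    temp.append(word2idx['<EOS>'])
    target_ids.append(temp)

print(target_ids)  # [[3, 4, 2]]  i.e. hello world <EOS>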
Then, during training, <GO> is added to the target input data right before the decoder embedding lookup, as shown below:
def build_decoder(target_data, encoder_state,
                  target_sequence_length, max_target_sequence_length,
                  rnn_size, num_layers, target_vocab_to_int, target_vocab_size,
                  batch_size, keep_prob, decoding_embedding_size):
    """
    Create decoding layer
    :input:
        @target_data : target data
        @encoder_state : feature map of the encoder
        @target_sequence_length : tf.placeholder
        @max_target_sequence_length : max value of target_sequence_length
        @rnn_size : number of neurons of each RNN cell in the decoder
        @num_layers : number of RNN layers
        @target_vocab_to_int : dict mapping the target vocabulary to ints
        @target_vocab_size : size of the target vocabulary
        @batch_size : size of the batch
        @keep_prob : keep probability of the dropout
        @decoding_embedding_size : size of the target embedding
    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    """
    # --- add <GO> to target data --- #
    go_id = target_vocab_to_int['<GO>']
    after_slice = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    after_concat = tf.concat([tf.fill([batch_size, 1], go_id), after_slice], 1)

    # --- Embedding Decoder --- #
    target_vocab_size = len(target_vocab_to_int)
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, after_concat)

    cells = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.LSTMCell(rnn_size) for _ in range(num_layers)])
    output_layer = tf.layers.Dense(target_vocab_size)

    with tf.variable_scope("decode"):
        # --- Training decoding --- #
        train_cell = tf.contrib.rnn.DropoutWrapper(cells,
                                                   output_keep_prob=keep_prob)
        # dropout on the input layer only
        train_helper = tf.contrib.seq2seq.TrainingHelper(dec_embed_input,
                                                         target_sequence_length)
        train_decoder = tf.contrib.seq2seq.BasicDecoder(train_cell,
                                                        train_helper,
                                                        encoder_state,
                                                        output_layer)
        # unrolling the decoder layer
        train_output, _, _ = tf.contrib.seq2seq.dynamic_decode(train_decoder,
                                                               impute_finished=True,
                                                               maximum_iterations=max_target_sequence_length)

        # --- Inference decoding --- #
        infer_cell = tf.contrib.rnn.DropoutWrapper(cells,
                                                   output_keep_prob=keep_prob)
        start_of_sequence_id = target_vocab_to_int['<GO>']
        end_of_sequence_id = target_vocab_to_int['<EOS>']
        infer_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings,
                                                                #start_of_sequence_id,
                                                                tf.fill([batch_size], start_of_sequence_id),
                                                                end_of_sequence_id)
        infer_decoder = tf.contrib.seq2seq.BasicDecoder(infer_cell,
                                                        infer_helper,
                                                        encoder_state,
                                                        output_layer)
        infer_output, _, _ = tf.contrib.seq2seq.dynamic_decode(infer_decoder,
                                                               impute_finished=True,
                                                               maximum_iterations=max_target_sequence_length)
    return (train_output, infer_output)
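For reference, this is roughly how I wire the function into the graph (assuming the usual import tensorflow as tf under TF 1.x); the placeholder names and hyper-parameter values below are illustrative, not the exact ones from the notebook, and encoder_state / target_vocab_to_int are assumed to already exist:

# hypothetical placeholders and values, just to show how build_decoder is called
target_data = tf.placeholder(tf.int32, [None, None], name='targets')
target_sequence_length = tf.placeholder(tf.int32, [None], name='target_sequence_length')
max_target_sequence_length = tf.reduce_max(target_sequence_length)
keep_prob = tf.placeholder(tf.float32, name='keep_prob')

train_output, infer_output = build_decoder(target_data, encoder_state,
                                           target_sequence_length, max_target_sequence_length,
                                           rnn_size=128, num_layers=2,
                                           target_vocab_to_int=target_vocab_to_int,
                                           target_vocab_size=len(target_vocab_to_int),
                                           batch_size=64, keep_prob=keep_prob,
                                           decoding_embedding_size=128)

# sequence loss on the training logits, masking the padding
masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length, dtype=tf.float32)
cost = tf.contrib.seq2seq.sequence_loss(train_output.rnn_output, target_data, masks)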
This works, and the accuracy during inference is good.
2) Preprocess the sentences: add <GO> and <EOS> to every sentence. Accordingly, I removed the first three lines of the build_decoder function (the ones that prepend <GO>) and changed:
dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, after_concat)
to:
dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, target_data)
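To make the difference between the two setups concrete, here is a toy, single-sentence example (the ids are made up and padding is ignored) of which ids reach the embedding lookup in each case:

import tensorflow as tf

# toy ids, made up for illustration: <GO>=1, <EOS>=2, hello=3, world=4
batch_size = 1
go_id = 1

# setup 1: the target carries only <EOS>; the last column is dropped and <GO> is prepended
target_data_1 = tf.constant([[3, 4, 2]])               # hello world <EOS>
after_slice = tf.strided_slice(target_data_1, [0, 0], [batch_size, -1], [1, 1])
after_concat = tf.concat([tf.fill([batch_size, 1], go_id), after_slice], 1)

# setup 2: the target already carries <GO> and <EOS> and is fed as-is
target_data_2 = tf.constant([[1, 3, 4, 2]])            # <GO> hello world <EOS>

with tf.Session() as sess:
    print(sess.run(after_concat))    # [[1 3 4]]   -> <GO> hello world
    print(sess.run(target_data_2))   # [[1 3 4 2]] -> <GO> hello world <EOS>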
But this does not work during training. The loss decreases (so the model seems to be learning), but the accuracy decreases as well, which is abnormal. Moreover, when I look at the predictions during inference, the model predicts the same words repeated most of the time. I don't understand why adding <GO> inside the build_decoder function works fine while adding it outside does not.
Does anyone have an idea about this? Is it an issue related to TrainingHelper or something else?
Here is my full code in a Jupyter notebook: https://github.com/Shiro-LK/TF_exemple/blob/master/NTM_Seq2Seq.ipynb