Seq2Seq: decoder predictions do not depend on the encoder input

Asked: 2018-03-08 01:11:06

Tags: python tensorflow machine-learning neural-network lstm

I am training a Seq2Seq model on OpenSubtitles dialogs - Cornell-Movie-Dialogs-Corpus

My work is based on the following paper:

Training converges nicely, but when I inspect what the network actually predicts, I find that the encoder input does not matter at all for the decoder's predictions. All that matters are the words fed to the decoder.

For example, after the <s> token, the decoder always predicts the word "i".
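To show what I mean, this is roughly how I probe the trained model (a minimal sketch: `sess` is the trained session, `seq1`, `seq2` and `logit` are the tensors from the pseudo-code below, and `encode` is a hypothetical helper that maps a sentence to token ids):

import numpy as np

# Two unrelated encoder inputs, identical decoder prefix <s>.
# encode() is a hypothetical helper that maps a sentence to token ids.
input_a = encode("how are you ?")
input_b = encode("where do you live ?")
prefix = encode("<s>")

logits_a = sess.run(logit, {seq1: [input_a], seq2: [prefix]})
logits_b = sess.run(logit, {seq1: [input_b], seq2: [prefix]})

# If the decoder used the encoder state, these next-token predictions
# should differ; in my case both argmax to the same word, "i".
print(np.argmax(logits_a[0, -1]), np.argmax(logits_b[0, -1]))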

Pseudo-code of my architecture looks like this (I am using TensorFlow 1.5):

import tensorflow as tf
from tensorflow.contrib.rnn import BasicLSTMCell

seq1 = tf.placeholder(tf.int32, [None, None])    # encoder token ids
seq2 = tf.placeholder(tf.int32, [None, None])    # decoder input token ids
target = tf.placeholder(tf.int32, [None, None])  # decoder target token ids

embeddings = tf.Variable(tf.zeros([vocab_size, 300]))

seq1_emb = tf.nn.embedding_lookup(embeddings, seq1)
seq2_emb = tf.nn.embedding_lookup(embeddings, seq2)

# Encoder init_state is a random_uniform of: -0.08, 0.08 (according to
# paper [1]); its construction is omitted here.
encoder_out, state1 = tf.nn.dynamic_rnn(BasicLSTMCell(num_units), seq1_emb,
                                        dtype=tf.float32, scope="encoder")
# The decoder sees the encoder only through its initial state.
decoder_out, state2 = tf.nn.dynamic_rnn(BasicLSTMCell(num_units), seq2_emb,
                                        initial_state=state1, scope="decoder")
logit = tf.layers.dense(decoder_out, vocab_size, use_bias=False)

crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logit,
                                                          labels=target)
crossent = mask_padded_zeros(crossent)  # zero the loss at padded positions
loss = tf.reduce_sum(crossent) / number_of_words_in_batch

# Gradient is Clipped-By-Norm with 5 (according to paper [1])

train = tf.train.GradientDescentOptimizer(learning_rate=0.7).minimize(loss)
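`mask_padded_zeros` is my own helper; a minimal sketch of what it does, assuming a `seq2_len` placeholder (not shown above) that holds the true, unpadded length of each decoder sequence:

def mask_padded_zeros(crossent):
    # crossent has shape [batch, time]; zero the entries that fall on
    # padding, using the assumed seq2_len length placeholder.
    mask = tf.sequence_mask(seq2_len, maxlen=tf.shape(crossent)[1],
                            dtype=crossent.dtype)
    return crossent * mask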
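The `minimize()` call above is a simplification; as the comment says, the gradients are clipped to norm 5 before the update, roughly like this (a sketch using tf.clip_by_global_norm):

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.7)
# Split the (gradient, variable) pairs, clip by global norm, re-apply.
grads, tvars = zip(*optimizer.compute_gradients(loss))
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train = optimizer.apply_gradients(zip(clipped_grads, tvars))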

I would appreciate your help!

0 Answers:

No answers yet.