I'm training a Seq2Seq model on OpenSubtitles dialogs - the Cornell-Movie-Dialogs-Corpus.
My work is based on the following paper:
Training converges nicely, but when I inspect what the network actually predicts, I find that the encoder's input makes no difference at all to the decoder's predictions - the only thing that matters is the words fed to the decoder. For example, after the token <s> the decoder always predicts the word "i".
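This is roughly how I run that check (simplified; seq1, seq2 and logit are the tensors from the pseudocode below, and the saver/checkpoint and token ids are only illustrative):

prediction = tf.argmax(logit, axis=-1)  # greedy pick of the most likely next word

with tf.Session() as sess:
    saver.restore(sess, checkpoint_path)  # illustrative: restore the trained weights
    # two completely different encoder inputs, same decoder prefix "<s>"
    out_a = sess.run(prediction, {seq1: encoder_ids_a, seq2: start_token_ids})
    out_b = sess.run(prediction, {seq1: encoder_ids_b, seq2: start_token_ids})
    # out_a and out_b come out identical - both begin with the id of "i"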
Pseudo-code of my architecture looks like this (I'm using TensorFlow 1.5):
seq1 = tf.placeholder(...)
seq2 = tf.placeholder(...)
embeddings = tf.Variable(tf.zeros([vocab_size, 300]))
seq1_emb = tf.nn.embedding_lookup(embeddings, seq1)
seq2_emb = tf.nn.embedding_lookup(embeddings, seq2)
# Encoder init_state is a random_uniform of: -0.08, 0.08 (according to paper [1])
encoder_out, state1 = tf.nn.dynamic_rnn(BasicLSTMCell(), seq1_emb)
decoder_out, state2 = tf.nn.dynamic_rnn(BasicLSTMCell(), seq2_emb,
                                        initial_state=state1)
logit = tf.layers.dense(decoder_out, vocab_size, use_bias=False)  # projection to vocabulary logits
crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logit,
                                                          labels=target)
crossent = mask_padded_zeros(crossent)
loss = tf.reduce_sum(crossent) / number_of_words_in_batch
# Gradient is Clipped-By-Norm with 5 (according to paper [1])
train = tf.train.GradientDescentOptimizer(learning_rate=0.7).minimize(loss)
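The clip-by-norm from the comment above is hidden inside the pseudocode's minimize call; written out with plain TF 1.x ops, the training step looks roughly like this (a sketch, the intermediate names are just illustrative):

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.7)
grads, variables = zip(*optimizer.compute_gradients(loss))
clipped_grads, _ = tf.clip_by_global_norm(grads, 5.0)  # clip gradients to norm 5, as in paper [1]
train = optimizer.apply_gradients(zip(clipped_grads, variables))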
I'd be glad for your help!