Question

I'm having trouble implementing the Bahdanau attention mechanism in a custom seq2seq (encoder-decoder) architecture. This is what I have so far:

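import tensorflow as tf  # TensorFlow 1.x (tf.layers)

# all variables are of shape (unrollings, batch_size, hidden_size)
# Pair every encoder state with the previous decoder state.
merged_states = []
for i in range(len(encoder_hidden_states)):
    merged_states.append(tf.concat([encoder_hidden_states[i], prev_decoder_hidden_state], axis=1))

# One dense layer with tanh; this yields hidden_size values per position, not a scalar score.
eij = tf.layers.dense(tf.stack(merged_states), encoder_hidden_states[0].shape[1], activation=tf.nn.tanh)
# Normalize over the time axis, then take the weighted sum of the encoder states.
softmax = tf.nn.softmax(eij, axis=0)
context = tf.reduce_sum(tf.multiply(softmax, encoder_hidden_states), axis=0)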

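The paper mentions that the score function is a one-layer network, but I have seen other people use this formula. As I understand it, the code above implements the last part of the equation, without the multiplication by v. I have a few questions: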

What is the size of the vector v, and why is it needed?
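
To make the question about v concrete, here is a minimal NumPy sketch (NumPy rather than TensorFlow, only so the shapes are easy to check). It assumes the linked formula is the usual additive score e_ij = v^T tanh(W_s s_{i-1} + W_h h_j); the names W_h, W_s, attn_size and the helper functions are hypothetical and not taken from the code above. Under that assumption v has attn_size entries, and its dot product reduces each position's tanh output to a single scalar score.

```python
import numpy as np

def additive_scores(encoder_states, prev_decoder_state, W_h, W_s, v):
    """Additive scores: one scalar per source position and batch item.

    encoder_states:     (src_len, batch, enc_hidden)
    prev_decoder_state: (batch, dec_hidden)
    W_h: (enc_hidden, attn_size), W_s: (dec_hidden, attn_size), v: (attn_size,)
    """
    # Project both inputs into a shared attn_size-dimensional space.
    projected = np.tanh(encoder_states @ W_h + prev_decoder_state @ W_s)
    # The dot product with v collapses the last axis to a single score.
    return projected @ v  # (src_len, batch)

def softmax(x, axis=0):
    # Subtract the maximum for numerical stability.
    exp = np.exp(x - x.max(axis=axis, keepdims=True))
    return exp / exp.sum(axis=axis, keepdims=True)

def attention_context(encoder_states, prev_decoder_state, W_h, W_s, v):
    scores = additive_scores(encoder_states, prev_decoder_state, W_h, W_s, v)
    weights = softmax(scores, axis=0)  # normalized over source positions
    # Weighted sum of the encoder states over the source axis.
    context = (weights[:, :, None] * encoder_states).sum(axis=0)
    return weights, context

# Example with made-up sizes (assumptions for illustration only).
rng = np.random.default_rng(0)
src_len, batch, enc_hidden, dec_hidden, attn_size = 5, 3, 4, 6, 7
weights, context = attention_context(
    rng.normal(size=(src_len, batch, enc_hidden)),
    rng.normal(size=(batch, dec_hidden)),
    rng.normal(size=(enc_hidden, attn_size)),
    rng.normal(size=(dec_hidden, attn_size)),
    rng.normal(size=(attn_size,)),
)
print(weights.shape, context.shape)
```

With v in place the weights hold one value per source position and batch item, whereas a dense-plus-tanh layer alone yields a whole vector of values per position.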

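After computing the new context, how should it be fed to the decoder? From what I have seen, people concatenate the current input (the previous decoder output) with the context, and the result becomes the new decoder input.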
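
The concatenation described here can be sketched as follows, in NumPy for easy shape checking. The plain tanh cell is only a stand-in for whatever RNN cell the decoder actually uses, and decoder_step, W_x, W_s and b are hypothetical names. The paper writes the decoder state as s_i = f(s_{i-1}, y_{i-1}, c_i), so placing the context next to the previous output at each step appears consistent with it.

```python
import numpy as np

def decoder_step(prev_output_embedding, context, prev_state, W_x, W_s, b):
    """One decoder step whose input is [previous output ; context].

    prev_output_embedding: (batch, emb_size)
    context:               (batch, enc_hidden)
    prev_state:            (batch, dec_hidden)
    W_x: (emb_size + enc_hidden, dec_hidden), W_s: (dec_hidden, dec_hidden), b: (dec_hidden,)
    """
    # Concatenate along the feature axis to form the new decoder input.
    step_input = np.concatenate([prev_output_embedding, context], axis=-1)
    # Stand-in recurrent update (assumption: a simple tanh RNN cell).
    new_state = np.tanh(step_input @ W_x + prev_state @ W_s + b)
    return step_input, new_state

# Example with made-up sizes (assumptions for illustration only).
rng = np.random.default_rng(0)
batch, emb_size, enc_hidden, dec_hidden = 3, 8, 4, 6
step_input, new_state = decoder_step(
    rng.normal(size=(batch, emb_size)),
    rng.normal(size=(batch, enc_hidden)),
    rng.normal(size=(batch, dec_hidden)),
    rng.normal(size=(emb_size + enc_hidden, dec_hidden)),
    rng.normal(size=(dec_hidden, dec_hidden)),
    np.zeros(dec_hidden),
)
print(step_input.shape, new_state.shape)
```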

Thanks.

Seq2Seq attention mechanism

0 answers: