scatter_nd is used to project the attention distribution onto another distribution, in essence creating a distribution that references the vocabulary.
# One (batch, vocab_id) coordinate per encoder position.
indices = tf.stack((batch_nums, encoder_batch), axis=2)
shape = [batch_size, vocab_size]
# Scatter the attention mass onto the corresponding vocabulary ids.
attn_dists_projected = [tf.scatter_nd(indices, copy_distribution, shape)
                        for copy_distribution in attn_dists]
When I tried to run this with placeholders of undefined size, I ran into the following error:
ValueError: The inner 0 dimensions of output.shape=[?,?] must match the inner 1
dimensions of updates.shape=[128,128,?]: Shapes must be equal rank, but are 0 and 1
for 'final_distribution/ScatterNd' (op: 'ScatterNd') with input shapes:
[128,?,2], [128,128,?], [2].
This is in a seq2seq context, so the model's placeholders need partially undefined shapes. On top of that, my data does not come in uniform batches, so a variable batch size is required as well.