Question

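I am trying to implement a hierarchical softmax, and I am running into performance problems with gather in TensorFlow, in terms of both memory and speed. Here is an example: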

# weights_word: [nClusters, hiddenSize, max_cluster_size]
weights_word = tf.Variable(tf.truncated_normal([self.textData.nClusters, self.args.hiddenSize, self.textData.max_cluster_size], stddev=0.5, dtype=self.dtype), name='weights_word', dtype=self.dtype)

# target_cluster: [batchSize, maxSteps]
# weights_gathered: [batchSize, maxSteps, hiddenSize, max_cluster_size]
weights_gathered = tf.gather(weights_word, self.target_cluster, name='weights_gathered')
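The shape comment on weights_gathered can be checked with a small NumPy sketch. tf.gather with its default axis=0 follows the same rule as NumPy integer-array indexing on the first axis: the result has shape indices.shape + params.shape[1:]. The sizes below are hypothetical toy values, not taken from the question.

```python
import numpy as np

# Hypothetical toy sizes, chosen only for illustration.
n_clusters, hidden_size, max_cluster_size = 10, 64, 100
batch_size, max_steps = 32, 20

rng = np.random.default_rng(0)
# weights_word: [nClusters, hiddenSize, max_cluster_size]
weights_word = rng.standard_normal(
    (n_clusters, hidden_size, max_cluster_size)).astype(np.float32)
# target_cluster: [batchSize, maxSteps], one cluster id per target token
target_cluster = rng.integers(0, n_clusters, size=(batch_size, max_steps))

# Same semantics as tf.gather(weights_word, target_cluster) on axis 0.
weights_gathered = weights_word[target_cluster]
print(weights_gathered.shape)  # (32, 20, 64, 100)

# Every token gets its own copy of a [hiddenSize, max_cluster_size] slice,
# so the result is (batch_size * max_steps) / n_clusters times larger.
print(weights_gathered.nbytes / weights_word.nbytes)  # 64.0
```

Whenever batchSize * maxSteps exceeds nClusters, the gathered tensor is larger than the weight matrix it was gathered from, regardless of which clusters actually occur.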

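At the second level of the hierarchical softmax, I need to gather the weights of the words in each target's class. However, the gather operation seems very expensive. It creates a temporary tensor, weights_gathered, that consumes even more memory than the original weights_word. In addition, when I profile my program with a timeline, I find that gather is very time-consuming. Is there any way to optimize it for both speed and memory?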
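One possible way to avoid the per-token copies is to group tokens by their target cluster and run one matmul per distinct cluster, so each cluster's weight slice is read once instead of being duplicated for every token. The NumPy sketch below assumes, since the question does not say, that the second-level logits are computed as hidden[b, t] @ weights_word[target_cluster[b, t]]; the function names are hypothetical. In TensorFlow the same grouping could be expressed with tf.dynamic_partition and tf.dynamic_stitch, but that translation is not shown here.

```python
import numpy as np

def second_level_logits_naive(hidden, weights_word, target_cluster):
    """Per-token gather, then batched matvec (materialises [B, T, H, C])."""
    gathered = weights_word[target_cluster]              # [B, T, H, C]
    return np.einsum('bth,bthc->btc', hidden, gathered)  # [B, T, C]

def second_level_logits_grouped(hidden, weights_word, target_cluster):
    """Group tokens by cluster; one matmul per distinct cluster in the batch."""
    b, t, h = hidden.shape
    c = weights_word.shape[2]
    flat_hidden = hidden.reshape(-1, h)                  # [B*T, H]
    flat_cluster = target_cluster.reshape(-1)            # [B*T]
    out = np.empty((flat_hidden.shape[0], c), dtype=hidden.dtype)
    for k in np.unique(flat_cluster):
        rows = np.nonzero(flat_cluster == k)[0]          # tokens whose target is in cluster k
        # weights_word[k] is used directly, never copied per token.
        out[rows] = flat_hidden[rows] @ weights_word[k]
    return out.reshape(b, t, c)

# Usage with hypothetical toy sizes: B=4, T=5, H=8, nClusters=3, C=6.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 5, 8)).astype(np.float32)
weights_word = rng.standard_normal((3, 8, 6)).astype(np.float32)
target_cluster = rng.integers(0, 3, size=(4, 5))

fast = second_level_logits_grouped(hidden, weights_word, target_cluster)
slow = second_level_logits_naive(hidden, weights_word, target_cluster)
print(fast.shape)                           # (4, 5, 6)
print(np.allclose(fast, slow, atol=1e-4))   # True
```

The grouped version never holds more than the hidden states, the output logits, and one [H, C] weight slice at a time. The price is a loop over the distinct clusters present in the batch, which is cheap when nClusters is small relative to batchSize * maxSteps.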

Memory and speed performance of gather in TensorFlow
