Question

I am trying to understand how TensorFlow propagates gradients through functions (specifically tf.tile). What I want to do is simple:

import tensorflow as tf

""" Considering batch_size = 3: """

# CASE 1:

a = tf.constant([[[1., 1.5], [2., 2.5], [3., 3.5]], [[4, 4.5], [5, 5.5], [6, 6.5]], [[7, 7.5], [8., 8.5], [9., 9.5]]])  # shape [3, 3, 2]

b = tf.constant([[[0.1], [0.2], [0.3]], [[0.4], [0.5], [0.6]], [[0.7], [0.8], [0.9]]])  # shape [3, 3, 1]

""" What I want is c = a + b, so I tile b to match last dimension of a: """

b_tile = tf.tile(b, [1, 1, tf.shape(a)[-1]])  # shape [3, 3, 2]
c = a + b_tile  # shape [3, 3, 2]

Now let's compute the gradients of c with respect to a and b_tile:

dc_da, dc_dbtile = tf.gradients(c, [a, b_tile])
# dc_da: tensor with same shape as a and all elements 1.
# dc_dbtile: tensor with same shape as b_tile and all elements 1.

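This makes complete sense, since the relationship is linear: c = a + b_tile. However, when I compute the gradient of b_tile with respect to b, I expect it to be 1. as well, because I am not changing the values of b, only expanding a dimension. Instead, the result is 2.: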

dbtile_db = tf.gradients(b_tile, [b])[0]  # tf.gradients returns a list, one entry per tensor in xs
# dbtile_db: tensor with same shape as b and all elements equal 2.
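
As a sanity check on the 2. above, here is a minimal pure-Python sketch (no TensorFlow) that rests on one assumption: that tf.gradients(ys, xs) returns the derivative of the sum of all elements of ys with respect to each element of xs, which is how its documentation describes the result (the sum of dy/dx over y in ys). The helpers tile_last and grad_of_sum are hypothetical names made up for this illustration, a flat list stands in for b, and the gradient is estimated with finite differences.

```python
def tile_last(b, reps):
    """Mimic tiling along a new last axis: each scalar becomes `reps` copies."""
    return [[v] * reps for v in b]


def grad_of_sum(f, x, eps=1e-6):
    """Finite-difference estimate of d(sum of all outputs of f) / d(x[i])."""
    def total(xs):
        return sum(sum(row) for row in f(xs))

    base = total(x)
    grads = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps  # perturb one input element at a time
        grads.append((total(bumped) - base) / eps)
    return grads


b = [0.1, 0.2, 0.3]  # stand-in for one slice of b
grads = [round(g, 3) for g in grad_of_sum(lambda xs: tile_last(xs, 2), b)]
print(grads)  # [2.0, 2.0, 2.0]
```

Under that assumption, each element of b feeds two elements of the tiled output, so the summed derivative is 2 per element, and it would be 256000 per element for a last dimension of 256000.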

I also tried concatenating tensor b with itself instead of using tf.tile, and it gives the same result, so I am clearly missing something.

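I tried another example to find out where this comes from. Now consider another tensor t, whose values are exactly the same as b's, and concatenate the two to obtain the tensor I need. In this case: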

# CASE 2:

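t = tf.constant([[[0.1], [0.2], [0.3]], [[0.4], [0.5], [0.6]], [[0.7], [0.8], [0.9]]])  # shape [3, 3, 1]. Same values as b, but a separate tensor
m = tf.concat([b, t], -1)  # shape [3, 3, 2]. Tensor identical to b_tile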
dm_db, dm_dt = tf.gradients(m, [b, t])
# dm_db: tensor with same shape as b and all elements equal 1.
# dm_dt: tensor with same shape as t and all elements equal 1.
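
To compare this with concatenating b with itself, here is a small pure-Python sketch (no TensorFlow). It assumes the reported gradient is the derivative of the sum of all output elements with respect to each input element, as the tf.gradients documentation describes. The names concat_last and grad_of_sum are hypothetical helpers for this illustration, and flat lists stand in for b and t.

```python
def concat_last(b, t):
    """Mimic concatenation along the last axis: pair up elements of b and t."""
    return [[vb, vt] for vb, vt in zip(b, t)]


def grad_of_sum(f, x, eps=1e-6):
    """Finite-difference estimate of d(sum of all outputs of f) / d(x[i])."""
    def total(xs):
        return sum(sum(row) for row in f(xs))

    base = total(x)
    grads = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps  # perturb one input element at a time
        grads.append(round((total(bumped) - base) / eps, 3))
    return grads


b = [0.1, 0.2, 0.3]
t = list(b)  # same values as b, but an independent input

dm_db = grad_of_sum(lambda xs: concat_last(xs, t), b)        # b varies, t fixed
dm_dt = grad_of_sum(lambda ts: concat_last(b, ts), t)        # t varies, b fixed
dbb_db = grad_of_sum(lambda xs: concat_last(xs, xs), b)      # b used in both slots
print(dm_db, dm_dt, dbb_db)  # [1.0, 1.0, 1.0] [1.0, 1.0, 1.0] [2.0, 2.0, 2.0]
```

In this sketch the only difference between the cases is how many output elements depend on each element of b: one when t is an independent input, two when b fills both slots.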

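So I have a few questions:

1) What I don't understand is why, when using tf.tile, the gradient is propagated through all the copies, when they are only a tool for being able to compute the sum. Numerically this doesn't make sense to me; I would expect the gradient to propagate as in the latter case (Case 2).

2) What is the difference between Case 1 and Case 2 (in terms of gradients)? Why should they give different results when what I am trying to do is the same?

3) Given that I want to compute the sum c = a + b, what is the correct way to do it?

The thing is that in the example my last dimension is 2, so I can do it by hand (with concatenation). In my real problem the last dimension is 256000, so I cannot do it that way. Thanks in advance.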

Why does "tf.tile" propagate gradients through all the copies?

0 answers: