Slightly different shape converges to wrong number: why?

Question

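I'm trying to figure out why TensorFlow is doing something surprising. I've boiled it down to a test case: linear regression on a trivial problem that just adds two inputs together. The weights converge to 1.0 and the bias converges to 0.0.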

With this version of the training outputs:

train_y = [2., 3., 4.]

the cost converges to 0.0, but with this version:

train_y = [[2.], [3.], [4.]]

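the cost converges to 4.0. I wouldn't be surprised if the second version raised an error; what is surprising is that it silently gives a wrong answer. Why is it doing this?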

Full code for the test case:

import tensorflow as tf
sess = tf.InteractiveSession()
tf.set_random_seed(1)

# Parameters
epochs = 10000
learning_rate = 0.01

# Data
train_x = [[1., 1.], [1., 2.], [2., 2.]]

# It works with this version
train_y = [2., 3., 4.]

# But converges on cost 4.0 with this version
#train_y = [[2.], [3.], [4.]]

# Number of samples
n_samples = len(train_x)

# Inputs and outputs
x = tf.placeholder(tf.float32, name='x')
y = tf.placeholder(tf.float32, name='y')

# Weights
w = tf.Variable(tf.random_normal([2]), name='weight')
b = tf.Variable(tf.random_normal([]), name='bias')

# Model
pred = tf.tensordot(x, w, 1) + b
cost = tf.reduce_sum((pred-y)**2 / n_samples)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Train
tf.global_variables_initializer().run()
for epoch in range(epochs):
    # Print update at successive doublings of time
    if epoch&(epoch-1)==0 or epoch==epochs-1:
        print('{:6}'.format(epoch), end=' ')
        print('{:12.6f}'.format(cost.eval({x: train_x, y: train_y})), end=' ')
        print('    ['+', '.join('{:8.6f}'.format(z) for z in w.eval())+']', end=' ')
        print('{:12.6f}'.format(b.eval()))
    for (x1, y1) in zip(train_x, train_y):
        optimizer.run({x: x1, y: y1})

Answer 1

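Why?

The problem is in the cost function calculation when you feed it tensors of different shapes. More specifically, it is the pred - y calculation.

To show what goes wrong in this specific example while avoiding clutter, I will use constants with the same shapes and values as above: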

y0 = tf.constant([2., 3., 4.])
y1 = tf.constant([[2.], [3.], [4.]])
pred = tf.constant([2., 3., 4.])

Now, let's look at the shapes of the expressions pred - y0 and pred - y1:

res0 = pred - y0
res1 = pred - y1

print(res0.shape)
print(res1.shape)

The output is:

(3,)
(3, 3)

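The (3, 3) shows that when computing pred - y1 with shapes (3,) and (3, 1), the operands were broadcast to shape (3, 3). This also means that the tf.reduce_sum() call sums 3x3 = 9 elements instead of only 3.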
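To see where the 4.0 in the question comes from, here is a minimal sketch of the same arithmetic. It uses NumPy as a stand-in for TensorFlow, since TensorFlow's elementwise ops follow NumPy-style broadcasting. The variable names are illustrative, and pred is set to the perfect predictions the model ends up producing.

```python
import numpy as np

n_samples = 3
pred = np.array([2., 3., 4.])          # shape (3,): perfect predictions
y_flat = np.array([2., 3., 4.])        # shape (3,)
y_col = np.array([[2.], [3.], [4.]])   # shape (3, 1)

diff_flat = pred - y_flat              # elementwise, shape (3,)
diff_col = pred - y_col                # broadcast, shape (3, 3)

# Row i of diff_col holds pred - y_col[i], so every prediction
# is compared against every target, not only its own.
print(diff_col)

cost_flat = np.sum(diff_flat**2 / n_samples)
cost_col = np.sum(diff_col**2 / n_samples)
print(cost_flat, cost_col)
```

The squared entries of the (3, 3) matrix are 0, 1, 4, 1, 0, 1, 4, 1, 0, which sum to 12. Dividing by n_samples = 3 gives a cost of about 4.0 even though every prediction is correct, which matches the value reported in the question.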

You can fix the shape mismatch by transposing y1 to shape (1, 3) with tf.transpose():

res1_fixed = pred - tf.transpose(y1)
print(res1_fixed.shape)

The output is now:

(1, 3)

How to fix:

Now, back to your code. Simply change the following expression:

cost = tf.reduce_sum((pred-y)**2 / n_samples)

to:

cost = tf.reduce_sum((pred-tf.transpose(y))**2 / n_samples)

With this change, both versions of train_y converge to a cost of zero as expected.
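As a quick sanity check, the transposed cost can be sketched in NumPy, again assuming it mirrors TensorFlow's broadcasting and transpose behavior here. The helper name cost_fixed is hypothetical and not part of the original code; it also assumes pred is 1-D, as it is in the question.

```python
import numpy as np

def cost_fixed(pred, y, n_samples):
    # NumPy analogue of tf.reduce_sum((pred - tf.transpose(y))**2 / n_samples).
    # Transposing a 1-D array is a no-op; a (3, 1) array becomes (1, 3),
    # which lines up elementwise with a 1-D pred of length 3.
    return np.sum((pred - np.transpose(y))**2 / n_samples)

y_flat = np.array([2., 3., 4.])        # shape (3,)
y_col = np.array([[2.], [3.], [4.]])   # shape (3, 1)

perfect = np.array([2., 3., 4.])       # correct predictions
off = np.array([1., 3., 5.])           # deliberately wrong predictions

cost_perfect_flat = cost_fixed(perfect, y_flat, 3)
cost_perfect_col = cost_fixed(perfect, y_col, 3)
cost_off_flat = cost_fixed(off, y_flat, 3)
cost_off_col = cost_fixed(off, y_col, 3)
print(cost_perfect_flat, cost_perfect_col, cost_off_flat, cost_off_col)
```

Both label layouts now give the same cost for the same predictions, and that cost is zero when the predictions are exact.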
