Question

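I am new to TensorFlow. I ran a simple multivariate regression with the plain GD (gradient descent) optimizer. However, I get completely different results from two different variable definitions, even though both start from the same initial guess.

What is the difference between these two calculations?

When I define the variable as: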

tau = tf.Variable([0.25, 0.25, 0.25, 0.25], name='parameter', dtype=tf.float64)
tau = tf.clip_by_value(tau, 0.1, 5.)

after 10000 epochs I get the following result:

tau= [0.28396885 0.24675105 0.26584612 1.37071573]

However, when I define it as a normalized value:

tau_norm = tf.Variable([0.025, 0.025, 0.025, 0.025], name='parameter', dtype=tf.float64)
tau_norm = tf.clip_by_value(tau_norm, 0.01, 0.5)
tau_max = 10
tau = tau_norm*tau_max

after the same 10000 epochs I get a completely different result:

tau= [ nan 0.22451382 2.70862284 1.46199275]

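Since the initial guess is the same, I expected the two calculations to give the same (or sufficiently similar) results. That is not what I see, and I would like to know what causes the difference.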

I am using tensorflow-gpu 1.14.0, but the GPU is not used for this calculation, because I set the following:

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="-1"

Updated

OK, let me illustrate with an example whose code is adapted from here. I think what I am seeing is essentially the same as the following.

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="-1"
import tensorflow as tf
import numpy as np

x = tf.placeholder("float")
y = tf.placeholder("float")
w = tf.Variable([1.0, 2.0], name="w")

y_model = tf.multiply(x, w[0]) + w[1]
error = tf.square(y - y_model)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(error)

model = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(model)
    print("Initial guess: ", session.run(w))
    np.random.seed(seed=100)
    for i in range(1000):
        x_value = np.random.rand()
        y_value = x_value * 2 + 6
        session.run(train_op, feed_dict={x: x_value, y: y_value})

    w_value = session.run(w)
    print("Predicted model: {a:.3f}x + {b:.3f}".format(a=w_value[0], b=w_value[1]))

From this code I get Predicted model: 2.221x + 5.882. However, when I replace w with

w_norm = tf.Variable([0.5, 1.0], name = 'w_norm')
w = w_norm*2.0

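the result is Predicted model: 2.004x + 5.998, even though it has the same initial guess ([1. 2.]) and the same number of epochs. I do not understand what causes this difference.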

Answer 1

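The reason for the difference is that GradientDescentOptimizer.minimize optimizes with respect to the tf.Variable objects, so your gradient descent is not applied to the same equation in the two cases.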

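In one case you minimize the error (y - (x*w[0] + w[1])) with respect to the variable w. In the other case the variable is w_norm, so the optimizer minimizes (y - (x*2*w_norm[0] + 2*w_norm[1])) with respect to w_norm.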
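To make the difference between the two equations concrete, here is a minimal sketch in plain Python (no TensorFlow) that evaluates the analytic gradient of the squared error for each parameterization at the same starting point. The function names grad_direct and grad_scaled and the sample point are illustrative assumptions, not part of the original code.

```python
# Minimal sketch: analytic gradients of the squared error for the two
# parameterizations. Names and the sample point are illustrative only.

def grad_direct(w, x, y):
    """Gradient of (y - (x*w[0] + w[1]))**2 with respect to w."""
    residual = y - (x * w[0] + w[1])
    return [-2.0 * residual * x, -2.0 * residual]

def grad_scaled(w_norm, x, y, scale=2.0):
    """Gradient of (y - (x*scale*w_norm[0] + scale*w_norm[1]))**2
    with respect to w_norm (chain rule adds one factor of `scale`)."""
    residual = y - (x * scale * w_norm[0] + scale * w_norm[1])
    return [-2.0 * residual * x * scale, -2.0 * residual * scale]

x, y = 0.5, 7.0                      # one sample from y = 2x + 6
g_w = grad_direct([1.0, 2.0], x, y)  # variable is w = [1, 2]
g_n = grad_scaled([0.5, 1.0], x, y)  # variable is w_norm = [0.5, 1], same w
print("gradient w.r.t. w:     ", g_w)
print("gradient w.r.t. w_norm:", g_n)
```

Both calls describe the same model and the same residual, but the gradient that the optimizer sees for w_norm is twice the gradient it sees for w.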

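If you change the learning rate in your code, both versions end up with the same result. Because w = w_norm*2.0, the gradient with respect to w_norm is twice the gradient with respect to w, and each step in w_norm moves w by twice as much again, so the rescaled model behaves like the original one with a learning rate 2*2 = 4 times larger. If you set the rate to 0.04 instead of 0.01 in train_op = tf.train.GradientDescentOptimizer(0.04).minimize(error) for the original model, the results should be the same.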
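The learning-rate equivalence can be checked without TensorFlow. The sketch below hand-codes the gradient descent updates in plain Python. It is an approximation of the setup above under stated assumptions: it uses Python's random module rather than NumPy's generator and float64 rather than float32, so the printed numbers will not match the TensorFlow output exactly. The helper names train_direct and train_scaled are hypothetical.

```python
import random

def train_direct(lr, samples, w0=(1.0, 2.0)):
    """Plain gradient descent on w for the error (y - (x*w[0] + w[1]))**2."""
    w = list(w0)
    for x, y in samples:
        residual = y - (x * w[0] + w[1])
        w[0] -= lr * (-2.0 * residual * x)
        w[1] -= lr * (-2.0 * residual)
    return w

def train_scaled(lr, samples, w_norm0=(0.5, 1.0), scale=2.0):
    """Gradient descent on w_norm, where the model uses w = scale * w_norm."""
    w_norm = list(w_norm0)
    for x, y in samples:
        residual = y - (x * scale * w_norm[0] + scale * w_norm[1])
        w_norm[0] -= lr * (-2.0 * residual * x * scale)
        w_norm[1] -= lr * (-2.0 * residual * scale)
    return [scale * v for v in w_norm]  # report in terms of w

# Same kind of data as in the question: y = 2x + 6 with x uniform in [0, 1).
rng = random.Random(100)
samples = []
for _ in range(1000):
    x = rng.random()
    samples.append((x, 2.0 * x + 6.0))

w_a = train_direct(0.01, samples)   # original model, lr = 0.01
w_b = train_scaled(0.01, samples)   # rescaled model, lr = 0.01
w_c = train_direct(0.04, samples)   # original model, lr = 0.04

for label, w in (("direct, lr=0.01", w_a),
                 ("scaled, lr=0.01", w_b),
                 ("direct, lr=0.04", w_c)):
    print("{}: {:.3f}x + {:.3f}".format(label, w[0], w[1]))
```

In this sketch the rescaled run with rate 0.01 and the direct run with rate 0.04 agree up to floating-point rounding, while the direct run with rate 0.01 is still visibly further from 2x + 6 after the same number of steps.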

So:

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="-1"
import tensorflow as tf
import numpy as np

x = tf.placeholder("float")
y = tf.placeholder("float")
w = tf.Variable([1.0, 2.0], name="w")

y_model = tf.multiply(x, w[0]) + w[1]
error = tf.square(y - y_model)
train_op = tf.train.GradientDescentOptimizer(0.04).minimize(error)

model = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(model)
    print("Initial guess: ", session.run(w))
    np.random.seed(seed=100)
    for i in range(1000):
        x_value = np.random.rand()
        y_value = x_value * 2 + 6
        session.run(train_op, feed_dict={x: x_value, y: y_value})

    w_value = session.run(w)
    print("Predicted model: {a:.3f}x + {b:.3f}".format(a=w_value[0], b=w_value[1]))

prints the same result as

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="-1"
import tensorflow as tf
import numpy as np

x = tf.placeholder("float")
y = tf.placeholder("float")
w_norm = tf.Variable([0.5, 1.0], name = 'w_norm')
w = w_norm*2.0

y_model = tf.multiply(x, w[0]) + w[1]
error = tf.square(y - y_model)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(error)

model = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(model)
    print("Initial guess: ", session.run(w))
    np.random.seed(seed=100)
    for i in range(1000):
        x_value = np.random.rand()
        y_value = x_value * 2 + 6
        session.run(train_op, feed_dict={x: x_value, y: y_value})

    w_value = session.run(w)
    print("Predicted model: {a:.3f}x + {b:.3f}".format(a=w_value[0], b=w_value[1]))

Tensorflow: completely different results from the same initial guess
