Question

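Recently I have been running some experiments in the learning field. I found that TensorFlow has two functions that can compute gradients for me. Sometimes they give me the same answer and sometimes they do not, and I do not know why.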

Here is my function, which has an explicit equation.

import tensorflow as tf

def function_1(x):
    return tf.sin(x) * x + tf.cos(x) * tf.exp(x)

Then I compute the gradient with the following two functions and get the same result.

x = tf.Variable(2.0, name='x')
gradient_1 = tf.gradients(function_1(x), [x])
gradient_2 = tf.train.AdamOptimizer().compute_gradients(function_1(x), var_list=[x])
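
Because function_1 has an explicit equation, its derivative can be written by hand and used as a reference value for both TensorFlow calls. The following is only a minimal sketch in plain Python; function_1_grad and central_difference are hypothetical helper names that do not appear in the question.

```python
import math

def function_1(x):
    # Same expression as the TensorFlow version, using the math module.
    return math.sin(x) * x + math.cos(x) * math.exp(x)

def function_1_grad(x):
    # Product rule applied to each of the two terms.
    return (math.cos(x) * x + math.sin(x)
            + (math.cos(x) - math.sin(x)) * math.exp(x))

def central_difference(f, x, h=1e-5):
    # Numerical derivative, used here as an independent check.
    return (f(x + h) - f(x - h)) / (2.0 * h)

x0 = 2.0
analytic = function_1_grad(x0)
numeric = central_difference(function_1, x0)
print('analytic derivative at x = 2:', analytic)
print('numeric derivative at x = 2: ', numeric)
```

If both gradient_1 and gradient_2 agree with this value at x = 2, the two TensorFlow functions are differentiating the same graph in the same way.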

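However, when I use both of them to compute the gradient of a function without an explicit equation, they give me different answers. For example, I sample a function from a Gaussian process, with the details below.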

import numpy as np

def expectation(x):
    # 300 evenly spaced input locations and their RBF kernel matrix (NumPy).
    point = np.reshape(np.linspace(-5.0, 5.0, 300), (300, 1))
    kernel_matrix_np = np.exp(-(point - np.transpose(point))**2 / (2 * 1.5**2))

    def reference_point():
        # Draw one function sample from the GP prior, with a fixed seed.
        covariance = kernel_matrix_np
        np.random.seed(100)
        sampled_funcs = np.random.multivariate_normal(np.ones(len(point)), covariance)
        return sampled_funcs
    ref_point = tf.transpose(tf.convert_to_tensor(reference_point(), dtype=tf.float32))
    # Rebuild the same kernel matrix in TensorFlow and invert it.
    point = tf.reshape(tf.linspace(-5.0, 5.0, 300), (300, 1))
    kernel_matrix_tf = tf.exp(-(point - tf.transpose(point))**2 / (2 * 1.5**2))
    inverse_kernel_matrix = tf.matrix_inverse(kernel_matrix_tf)
    # Kernel vector between x and the grid, then the GP posterior mean at x.
    kernel_vector = tf.exp(-(x - tf.transpose(point))**2 / (2 * 1.5**2))
    mu = tf.matmul(kernel_vector, inverse_kernel_matrix)
    ref_point = tf.expand_dims(ref_point, axis=1)
    mu = tf.matmul(mu, ref_point)
    return mu

x = tf.Variable(2.0, name='x')
gradients_1 = tf.train.AdamOptimizer().compute_gradients(expectation(x), var_list=[x])
gradients_2 = tf.gradients(expectation(x), [x])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print('The gradient one is', sess.run(gradients_1))
    print('The gradient two is', sess.run(gradients_2))

The results are as follows.

The gradient one is [(-24.727448, 2.0)]
The gradient two is [-27.727448]

I am not sure where the problem is or how the two functions work. Thank you very much!

Answer 1

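The gradient is computed in exactly the same way by both. You are running into precision issues, probably caused by the exponentiation combined with the matrix inversion.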
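
One way to see why the inversion is fragile is to look at the condition number of the kernel matrix. The sketch below rebuilds the same 300 by 300 matrix in NumPy and compares its condition number with 1/eps for float32 and float64; the usual rule of thumb (an assumption here, not something stated in the question) is that a computed inverse loses most of its accuracy once the condition number approaches 1/eps. The names cond64, eps32 and eps64 are introduced only for this illustration.

```python
import numpy as np

# Same grid and RBF kernel as in expectation(), built in float64.
point = np.linspace(-5.0, 5.0, 300).reshape(300, 1)
kernel_matrix = np.exp(-(point - point.T) ** 2 / (2 * 1.5 ** 2))

# 2-norm condition number, estimated from the singular values.
cond64 = np.linalg.cond(kernel_matrix)

eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps
print('condition number (float64 estimate):', cond64)
print('1 / eps for float32:', 1.0 / eps32)
print('1 / eps for float64:', 1.0 / eps64)
```

If the printed condition number is far above 1/eps for float32, small rounding differences between two separately built copies of the graph can plausibly produce visibly different gradients. It may also be worth comparing it with the float64 threshold before trusting the float64 numbers fully.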

Here is a version that uses float64 instead of float32, for which the gradients are indeed the same:

def expectation(x):
    point = np.reshape(np.linspace(-5.0, 5.0, 300), (300, 1))
    kernel_matrix_np = np.exp(-(point - np.transpose(point))**2 / (2 * 1.5**2))

    def reference_point():
        covariance = kernel_matrix_np
        np.random.seed(100)
        sampled_funcs = np.random.multivariate_normal(np.ones(len(point)), covariance)
        return sampled_funcs
    ref_point = tf.transpose(tf.convert_to_tensor(reference_point(), dtype=tf.float64))  # <---
    five64 = tf.convert_to_tensor(5.0, dtype=tf.float64)  # <---
    point = tf.reshape(tf.linspace(-five64, 5.0, 300), (300, 1))  # <---
    kernel_matrix_tf = tf.exp(-(point - tf.transpose(point))**2 / (2 * 1.5**2))
    inverse_kernel_matrix = tf.matrix_inverse(kernel_matrix_tf)
    kernel_vector = tf.exp(-(x - tf.transpose(point))**2 / (2 * 1.5**2))
    mu = tf.matmul(kernel_vector, inverse_kernel_matrix)
    ref_point = tf.expand_dims(ref_point, axis=1)
    mu = tf.matmul(mu, ref_point)
    return mu

x = tf.Variable(2.0, dtype=tf.float64, name='x')  # <---
gradients_1 = tf.train.AdamOptimizer().compute_gradients(expectation(x), var_list=[x])
gradients_2 = tf.gradients(expectation(x), [x])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print('The gradient one is', sess.run(gradients_1))
    print('The gradient two is', sess.run(gradients_2))

(The lines I changed are marked with arrow comments.)

My output:

>>> The gradient one is [(-21.5, 2.0)]
>>> The gradient two is [-21.5]

Note that compute_gradients from the optimizer also returns the value of x itself. That is why there is a tuple rather than only the gradient (the first value).
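
To compare the two results directly, the gradient has to be pulled out of each pair returned by compute_gradients. The following is a small plain-Python sketch that uses the values from the output quoted in this answer; the variable names are made up for the illustration and no TensorFlow call is involved.

```python
# Values copied from the float64 output quoted in this answer.
compute_gradients_result = [(-21.5, 2.0)]  # list of (gradient, value of x) pairs
tf_gradients_result = [-21.5]              # list of gradients only

# Keep the first element of each pair and drop the variable value.
gradients_only = [grad for grad, _value in compute_gradients_result]

print('gradients match:', gradients_only == tf_gradients_result)
```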
