I am trying to get the Hessian matrix using tf.hessians. After each training step the loss and the variables are updated, but the Hessian values stay the same. They also do not depend on the initial variable values, which I can set manually. My problem is actually similar to this one, which has not been answered yet. Here is the code I use for testing:
import tensorflow as tf
# Model parameters
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
# Model input and output
x = tf.placeholder(tf.float32)
linear_model = W*x + b
y = tf.placeholder(tf.float32)
# loss
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
# training data
x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]
hess = tf.hessians(loss, [W, b])
# training loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(10):
    sess.run(train, {x: x_train, y: y_train})
    cur_hess, curr_W, curr_b, curr_loss = sess.run([hess, W, b, loss], {x: x_train, y: y_train})
    print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))
    print('cur_hess', cur_hess)
Here is the printed output:
W: [-0.21999997] b: [-0.456] loss: 4.0181446
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.39679998] b: [-0.49552] loss: 1.8198745
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.459616] b: [-0.4965184] loss: 1.5448234
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.48454273] b: [-0.48487374] loss: 1.4825068
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.49684232] b: [-0.4691753] loss: 1.444397
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.5049019] b: [-0.45227283] loss: 1.409699
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.5115062] b: [-0.43511063] loss: 1.3761029
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.51758033] b: [-0.41800055] loss: 1.3433373
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.523432] b: [-0.40104443] loss: 1.3113549
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.52916396] b: [-0.38427448] loss: 1.2801344
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
So cur_hess is not updated, and by the way it contains only 2 elements instead of 4. How can I fix this? In addition, I tried applying tf.gradients twice as suggested here (a rough sketch of what I mean is below), but the values do not get updated either, just like with tf.hessians. At the same time, tf.gradients correctly computes the first derivatives, and they do change after each training iteration. Thanks.
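The two-pass tf.gradients attempt looks roughly like this (grad2_W and grad2_b are just illustrative names, built on the same graph as above):

# First derivatives of the loss w.r.t. W and b
grad_W, grad_b = tf.gradients(loss, [W, b])
# Differentiate the gradients again to get second derivatives
grad2_W = tf.gradients(grad_W, W)[0]   # second derivative of loss w.r.t. W
grad2_b = tf.gradients(grad_b, b)[0]   # second derivative of loss w.r.t. b
# These also come out constant across training steps, just like tf.hessians.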
Answer 0 (score: 2)
Having a constant Hessian is normal in this case, because
loss = Σ [(Wx + b - y)^2]
This expression is quadratic in W and b, and the second derivative of a quadratic function is constant.
∂²(loss)/∂W² = Σ 2x² = 2 * (1 + 4 + 9 + 16) = 60 ; (x = [1, 2, 3, 4])
∂²(loss)/∂b² = Σ 2 = 2 + 2 + 2 + 2 = 8 ; (4 samples with constant second derivative)
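As for the 2-vs-4 elements: tf.hessians treats each variable in the list independently, so you only get the diagonal blocks ∂²(loss)/∂W² and ∂²(loss)/∂b²; the cross term ∂²(loss)/∂W∂b = Σ 2x = 2 * (1 + 2 + 3 + 4) = 20 never appears. One common workaround is to pack W and b into a single 1-D variable, so the full 2x2 Hessian comes back in one piece. A minimal sketch, assuming TF 1.x (the variable name params is mine):

import tensorflow as tf

# Pack W and b into one 1-D variable so tf.hessians returns the full
# 2x2 Hessian (including the W-b cross term) instead of two 1x1 blocks.
params = tf.Variable([.3, -.3], dtype=tf.float32)   # params[0] = W, params[1] = b

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
linear_model = params[0] * x + params[1]
loss = tf.reduce_sum(tf.square(linear_model - y))

hess = tf.hessians(loss, params)[0]   # shape [2, 2]

x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(hess, {x: x_train, y: y_train}))
    # Expected (constant for this quadratic loss):
    # [[60. 20.]
    #  [20.  8.]]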