The following code checks the backpropagation of the kernel (filter) weights for conv2d in TensorFlow. First I run conv2d to see which input elements are used to produce the output: tf.nn.conv2d is called with a 3x3 input, a 1x1 kernel, and stride 3. Then, assuming the gradient flowing into the conv output is [[100]], I compute the kernel gradient with tf.nn.conv2d_backprop_filter.
import tensorflow as tf
import numpy as np

x = tf.placeholder("float", [1,3,3,1])   # input, NHWC
k = tf.placeholder("float", [1,1,1,1])   # 1x1 kernel
g = tf.placeholder("float", [1,1,1,1])   # upstream gradient w.r.t. the conv output
tfconv = tf.nn.conv2d(x, k, strides=[1,3,3,1], padding='SAME', data_format='NHWC')
tfgrad = tf.nn.conv2d_backprop_filter(x, [1,1,1,1], g, strides=[1,3,3,1], padding='SAME', data_format='NHWC')
input = np.array([1,2,3,4,5,6,7,8,0]).reshape(1,3,3,1)
kernel = np.array([10]).reshape(1,1,1,1)
gradient = np.array([100]).reshape(1,1,1,1)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    conv_tf = sess.run(tfconv, feed_dict={x:input, k:kernel})
    grad_tf = sess.run(tfgrad, feed_dict={x:input, g:gradient})
    print("inp = {}".format(input[0,:,:,0]))
    print("kernel = {}".format(kernel[:,:,0,0]))
    print("conv_tf = {}".format(conv_tf[0,:,:,0]))
    print("gradient = {}".format(gradient[0,:,:,0]))
    print("grad_tf = {}".format(grad_tf[0,:,:,0]))
The output is shown below.
inp = [[1 2 3]
[4 5 6]
[7 8 0]]
kernel = [[10]]
conv_tf = [[50.]]
gradient = [[100]]
grad_tf = [[100.]]
From this test we can see that grad_tf was computed using the input element 1 at position (0,0), not the element 5 at position (1,1) that the conv2d forward pass actually used. I suspect this is a TensorFlow bug, though of course the mistake could be mine. Can you tell me what causes this?
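To make the mismatch concrete, here is a minimal numpy sketch (my own check, not TensorFlow code) of the two candidate values. For a 1x1 kernel the filter gradient reduces to the upstream gradient multiplied by whichever single input element the window covered:

import numpy as np

inp = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 0]], dtype=np.float32)
kernel = 10.0
upstream = 100.0  # the gradient fed into conv2d_backprop_filter

# Forward pass: TF returned 50 = 10 * 5, i.e. the window covered inp[1, 1].
conv_if_center = kernel * inp[1, 1]    # 50.0, matches conv_tf

# Filter gradient for a 1x1 kernel: upstream * (the input element in the window).
grad_if_center = upstream * inp[1, 1]  # 500.0, expected if backprop matched the forward pass
grad_if_corner = upstream * inp[0, 0]  # 100.0, what conv2d_backprop_filter actually returned

print(conv_if_center, grad_if_center, grad_if_corner)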
I found that this issue can make the training of a CNN quite confusing, as in the following case:
import tensorflow as tf
import numpy as np

x = tf.placeholder("float", [1,3,3,1])
k_value = [10]
k_init = tf.constant_initializer(k_value)
k = tf.get_variable('k', shape=[1,1,1,1], initializer=k_init)
tfconv = tf.nn.conv2d(x, k, strides=[1,3,3,1], padding='SAME', data_format='NHWC')
cost = tf.reduce_sum(tf.square(tfconv))
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)
ocg = tf.gradients(cost, tfconv)   # d(cost)/d(conv output)
wcg = tf.gradients(cost, k)        # d(cost)/d(kernel)
wog = tf.gradients(tfconv, k)      # d(conv output)/d(kernel)
input = np.array([1,2,3,4,5,6,7,8,0], dtype="float32").reshape(1,3,3,1)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    kv = sess.run(k)
    print("kernel before training = {}".format(kv[:,:,0,0]))
    cv, xv, kv, fv, g1, g2, g3, _ = sess.run([cost, x, k, tfconv, ocg, wcg, wog, train_op], feed_dict={x:input})
    print("cost = {}".format(cv))
    print("x = {}".format(xv[0,:,:,0]))
    print("conv = {}".format(fv[0,:,:,0]))
    print("conv-cost gradient = {}".format(np.asarray(g1)))
    print("kernel-cost gradient = {}".format(np.asarray(g2)))
    print("kernel-conv gradient = {}".format(np.asarray(g3)))
    kv = sess.run(k)
    print("kernel after training = {}".format(kv[:,:,0,0]))
Here is the output. The important point is that the kernel weight is actually updated according to the wrong input value, one that was never used by the conv2d forward pass but was used in the gradient computation (see the hand computation after the output).
kernel before training = [[ 10.]]
cost = 2500.0
x = [[ 1. 2. 3.]
[ 4. 5. 6.]
[ 7. 8. 0.]]
conv = [[ 50.]]
conv-cost gradient = [[[[[ 100.]]]]]
kernel-cost gradient = [[[[[ 100.]]]]]
kernel-conv gradient = [[[[[ 1.]]]]]
kernel after training = [[ 9.]]
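For reference, here is a quick hand computation of the expected update (my own arithmetic, not TensorFlow output). With cost = conv**2 the chain rule gives d(cost)/dk = 2 * conv * x_used, so the observed gradient of 100 and updated kernel of 9.0 only make sense if x_used = 1, not the 5 the forward pass used:

# Hand check of the chain rule, assuming cost = conv**2 and a 1x1 kernel.
conv = 50.0                      # = 10 * 5, forward pass used inp[1, 1] = 5
dcost_dconv = 2.0 * conv         # 100.0, matches "conv-cost gradient"

# If backprop also used inp[1, 1] = 5:
dcost_dk_expected = dcost_dconv * 5.0                # 500.0
k_after_expected = 10.0 - 0.01 * dcost_dk_expected   # 5.0

# What TF actually reports implies inp[0, 0] = 1 was used:
dcost_dk_observed = dcost_dconv * 1.0                # 100.0, matches "kernel-cost gradient"
k_after_observed = 10.0 - 0.01 * dcost_dk_observed   # 9.0, matches "kernel after training"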