Training a simple network doesn't seem to change the values of the variables more than once

Time: 2016-06-27 20:11:34

Tags: tensorflow

I'm sure I'm missing something obvious. Here's the tail end of my code:

# simple loss function
loss = tf.reduce_sum(tf.abs(tf.sub(x4, yn)))

train_step = tf.train.GradientDescentOptimizer(0.000001).minimize(loss)

with tf.Session() as sess:
    tf.initialize_all_variables().run()
    print(sess.run([tf.reduce_sum(w1), tf.reduce_sum(b1)]))
    for i in range(5):
        # fill in x1 and yn
        sess.run(train_step, feed_dict={x1: in_images, yn: out_images})
        print(sess.run([tf.reduce_sum(w1), tf.reduce_sum(b1)]))
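
One side note about that snippet: each tf.reduce_sum(...) call inside the loop adds a fresh op to the graph on every iteration. That isn't the bug, but a sketch of the more idiomatic pattern builds the monitoring ops once, up front:

# Sketch only, not the original code: create the reduce ops a single time
# so repeated sess.run calls don't keep growing the graph.
w1_sum = tf.reduce_sum(w1)
b1_sum = tf.reduce_sum(b1)

with tf.Session() as sess:
    tf.initialize_all_variables().run()
    print(sess.run([w1_sum, b1_sum]))
    for i in range(5):
        sess.run(train_step, feed_dict={x1: in_images, yn: out_images})
        print(sess.run([w1_sum, b1_sum]))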

The network the loss function descends into is a simple CNN of conv2d's and bias_adds, plus elu activations. I want to see how the weights and biases of the first layer change. The first print returns the expected values ([+/- 100 or so, 0]), since w1 is initialized with random normals and b1 is initialized with zeros.

The second print statement gives a different pair of values, as expected.

What I did not expect is that the second print statement prints the same pair of values every time through the loop, as if each call to train_step did exactly the same thing rather than updating the values of the variables in the loss network.

What am I missing here?

Here's a cut-and-paste of the interesting part of the run:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
[-50.281082, 0.0]
W tensorflow/core/common_runtime/bfc_allocator.cc:213] Ran out of memory trying to allocate 3.98GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
[112.52832, 0.078026593]
[112.52832, 0.078026593]
[112.52832, 0.078026593]
[112.52832, 0.078026593]
[112.52832, 0.078026593]

I can post the network itself if necessary, but I suspect the problem is my mental model of how tensorflow updates its state.
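
One way to check that mental model: tf.Variable state does persist across sess.run calls within a single Session, so repeated train_step runs should keep mutating w1 and b1 in place. A toy sketch (illustrative only, not from the post):

import tensorflow as tf

v = tf.Variable(0.0)
inc = tf.assign_add(v, 1.0)   # in-place update, analogous to an optimizer step

with tf.Session() as sess:
    tf.initialize_all_variables().run()
    for _ in range(3):
        print(sess.run(inc))  # prints 1.0, 2.0, 3.0 -- state persists between calls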

Here's the entire python program, with a dummy stand-in for the image input, that shows the problem:

import tensorflow as tf
import numpy as np
from scipy import misc

H = 128
W = 128

x1 = tf.placeholder(tf.float32, [None, H, W, 1], "input_image")
yn = tf.placeholder(tf.float32, [None, H-12, W-12, 1], "test_image")

w1 = tf.Variable(tf.random_normal([7, 7, 1, 64]))   # 7x7, 1 input chan, 64 output chans
b1 = tf.Variable(tf.constant(0.1, shape=[64]))
x2 = tf.nn.conv2d(x1, w1, [1,1,1,1], "VALID")
x2 = tf.nn.bias_add(x2, b1)
x2 = tf.nn.elu(x2)

w2 = tf.Variable(tf.random_normal([5, 5, 64, 32]))  # 5x5, 64 input 32 output chans
b2 = tf.Variable(tf.constant(0.1, shape=[32]))
x3 = tf.nn.conv2d(x2, w2, [1,1,1,1], "VALID")
x3 = tf.nn.bias_add(x3, b2)
x3 = tf.nn.elu(x3)

w3 = tf.Variable(tf.random_normal([3, 3, 32, 1]))
b3 = tf.Variable(tf.constant(0.1, shape=[1]))
x4 = tf.nn.conv2d(x3, w3, [1,1,1,1], "VALID")
x4 = tf.nn.bias_add(x4, b3)
x4 = tf.nn.elu(x4)

loss = tf.reduce_sum(tf.abs(tf.sub(x4, yn)))

train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

# fake for testing
in_images = np.random.rand(20, 128, 128, 1)
out_images = np.random.rand(20, 116, 116, 1)

with tf.Session() as sess:
    tf.initialize_all_variables().run()
    print(sess.run([tf.reduce_mean(w1), tf.reduce_mean(b1)]))
    for i in range(5):
        # fill in x1 and yn
        sess.run(train_step, feed_dict={x1: in_images, yn: out_images})
        print(sess.run([tf.reduce_mean(w1), tf.reduce_mean(b1)]))

I've looked at a bunch of other training examples, but I'm not seeing what I'm doing wrong. Changing the learning rate just changes the numbers printed; the behavior stays the same, with no apparent change from running the optimizer.

1 answer:

Answer 0 (score: 1)

The error was in the way I was computing the loss function. I was simply summing all the errors over the batch rather than taking the mean error for each pair of images. The following loss function

# simple loss function
diff_image = tf.abs(tf.sub(x4, yn))
# sum over all dimensions except batch dim
err_sum = tf.reduce_sum(diff_image, [1,2,3])
#take mean over batch
loss = tf.reduce_mean(err_sum)

actually starts converging with the AdamOptimizer. The GradientDescentOptimizer still exhibits the "changes only once" behavior, which I'll treat as a bug and file on github.
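
For completeness, a minimal sketch of how the corrected loss slots into the full program above, using the AdamOptimizer the answer mentions; the 0.001 learning rate here is an assumption, not a value from the post:

# Corrected loss from the answer: per-image error summed over H, W, C,
# then averaged over the batch dimension.
diff_image = tf.abs(tf.sub(x4, yn))
err_sum = tf.reduce_sum(diff_image, [1, 2, 3])
loss = tf.reduce_mean(err_sum)

# Learning rate is assumed for illustration, not taken from the original post.
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)

with tf.Session() as sess:
    tf.initialize_all_variables().run()
    for i in range(5):
        _, l = sess.run([train_step, loss],
                        feed_dict={x1: in_images, yn: out_images})
        print("step %d, loss %g" % (i, l))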