SGD converges but batch learning does not, simple regression in TensorFlow

Time: 2016-11-03 13:59:52

Tags: tensorflow

I am having a problem where batch learning in TensorFlow fails to converge to the correct solution for a simple convex optimization problem, whereas SGD does converge. A small example follows, in both the Julia and Python programming languages; I have verified that TensorFlow exhibits exactly the same behaviour whether used from Julia or from Python.

I am trying to fit the linear model y = s*W + B with parameters W and B. The cost function is quadratic, so the problem is convex and should be easy to solve with a sufficiently small step size. If I feed all the data at once, the end result is just a prediction of the mean of y. If, however, I feed one data point at a time (the commented-out code in the Julia version), the optimization converges to the correct parameters very quickly.

I have also verified that the gradients computed by TensorFlow differ between the batch case and the sum of the gradients for each individual data point (a sketch of this check is included after the Python listing below).

Any ideas on where I have gone wrong?

using TensorFlow

s = linspace(1,10,10)
s = [s reverse(s)]
y = s*[1,4] + 2
session = Session(Graph())

s_ = placeholder(Float32, shape=[-1,2])
y_ = placeholder(Float32, shape=[-1,1])

W = Variable(0.01randn(Float32, 2,1), name="weights1")
B = Variable(Float32(1), name="bias3")

q = s_*W + B
loss = reduce_mean((y_ - q).^2)
train_step = train.minimize(train.AdamOptimizer(0.01), loss)

function train_critic(s,targets)
    for i = 1:1000
        # Feeding one data point at a time (SGD) converges to the correct parameters:
        # for i = 1:length(y)
        #     run(session, train_step, Dict(s_ => s[i,:]', y_ => targets[i]))
        # end
        # Feeding the whole batch at once only converges to the mean of y:
        ts = run(session, [loss,train_step], Dict(s_ => s, y_ => targets))[1]
        println(ts)
    end
    v = run(session, q, Dict(s_ => s, y_ => targets))
    plot(s[:,1],v, lab="v (Predicted value)")
    plot!(s[:,1],y, lab="y (Correct value)")
    gui();
end

run(session, initialize_all_variables())
train_critic(s,y)

The same code in Python (I am not a Python user, so this may be ugly):

import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets
import tensorflow as tf
from tensorflow.python.framework.ops import reset_default_graph

s = np.linspace(1,10,50).reshape((50,1))
s = np.concatenate((s,s[::-1]),axis=1).astype('float32')
y = np.add(np.matmul(s,[1,4]), 2).astype('float32')

reset_default_graph()
rng = np.random
s_ = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None])

weight_initializer = tf.truncated_normal_initializer(stddev=0.1)
with tf.variable_scope('model'): 
    W = tf.get_variable('W', [2, 1], 
                          initializer=weight_initializer)
    B = tf.get_variable('B', [1], 
                          initializer=tf.constant_initializer(0.0))

q = tf.matmul(s_, W) + B

loss = tf.reduce_mean(tf.square(tf.sub(y_ , q)))


optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss)

num_epochs = 200

train_cost= []
with tf.Session() as sess:
    init = tf.initialize_all_variables()
    sess.run(init)
    for e in range(num_epochs):
        feed_dict_train = {s_: s, y_: y}
        fetches_train = [train_op, loss]
        res = sess.run(fetches=fetches_train, feed_dict=feed_dict_train)
        train_cost = [res[1]]
        print train_cost
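
For reference, the gradient comparison mentioned above can be done roughly as follows. This is only a sketch: it reuses s_, y_, W, B, loss, s and y from the Python listing above, and the helper names (batch_gW, sum_gW, ...) are my own.

grad_W, grad_B = tf.gradients(loss, [W, B])

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    # Gradient of the loss evaluated on the whole batch at once.
    batch_gW, batch_gB = sess.run([grad_W, grad_B], feed_dict={s_: s, y_: y})

    # Gradients evaluated one data point at a time, then summed.
    sum_gW = np.zeros_like(batch_gW)
    sum_gB = np.zeros_like(batch_gB)
    for i in range(len(y)):
        gW_i, gB_i = sess.run([grad_W, grad_B],
                              feed_dict={s_: s[i:i+1, :], y_: y[i:i+1]})
        sum_gW += gW_i
        sum_gB += gB_i

    # Since loss is a reduce_mean, the batch gradient should equal the average
    # of the per-example gradients; with the listing above they do not match.
    print batch_gW, sum_gW / len(y)
    print batch_gB, sum_gB / len(y)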

1 Answer:

Answer 0 (score: 0):

The answer turned out to be that when I fed in the targets, I fed a vector and not an Nx1 matrix. The operation y_ - q then became a broadcasting operation, and instead of returning the element-wise difference it returned an NxN matrix with the desired differences along the diagonal. In Julia I solved this by modifying the line

train_critic(s,y)
to
train_critic(s,reshape(y, length(y),1))

to ensure that y is a matrix.
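
In the Python version, the corresponding fix would presumably be to make the targets explicitly Nx1 as well, by changing the placeholder and the feed (a sketch along the same lines, untested):

y_ = tf.placeholder(tf.float32, [None, 1])       # instead of tf.placeholder(tf.float32, [None])
feed_dict_train = {s_: s, y_: y.reshape(-1, 1)}  # feed the targets as an Nx1 matrix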

A subtle error that took me a very long time to find! Part of the reason is that TensorFlow seems to treat vectors as row vectors and not as column vectors like Julia does, hence the broadcasting operation in y_ - q.
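
The broadcasting behaviour itself is easy to reproduce in plain NumPy (a small illustration with made-up numbers):

import numpy as np

y_vec = np.array([1., 2., 3.])           # shape (3,): targets fed as a vector
q_col = np.array([[1.1], [2.1], [3.1]])  # shape (3, 1): model output

diff = y_vec - q_col                     # broadcasts to shape (3, 3); the intended
                                         # element-wise differences sit on the diagonal
print diff.shape                         # (3, 3)

diff_ok = y_vec.reshape(-1, 1) - q_col   # element-wise differences, as intended
print diff_ok.shape                      # (3, 1)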