Linear regression with TensorFlow

Date: 2017-04-02 15:19:48

Tags: python tensorflow linear-regression prediction

I am trying to understand linear regression... this is the script I am trying to understand:

'''
A linear regression learning algorithm example using TensorFlow library.
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''

from __future__ import print_function

import tensorflow as tf
from numpy import *
import numpy
import matplotlib.pyplot as plt
rng = numpy.random

# Parameters
learning_rate = 0.0001
training_epochs = 1000
display_step = 50

# Training Data
train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])

train_X=numpy.asarray(train_X)
train_Y=numpy.asarray(train_Y)
n_samples = train_X.shape[0]


# tf Graph Input
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)


# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, Y:train_Y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
                "W=", sess.run(W), "b=", sess.run(b))

    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')

    # Graphic display
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()

The question is what this part represents:

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

Why are there random floating-point numbers?

Could you also show me the math and the formulas behind the cost, pred, and optimizer variables?

3 answers:

Answer 0 (score: 8)

Let's try to build up some intuition and sources behind the TF approach.

General intuition:

The regression shown here is a supervised learning problem. In it, as defined in Russell & Norvig's Artificial Intelligence, the task is:

  given a training set (X, y) of m input-output pairs (x1, y1), (x2, y2), ..., (xm, ym), where each output was generated by an unknown function y = f(x), discover a function h that approximates the true function f

For that, the hypothesis function h combines each x in some way with the parameters to be learned, so that its output is as close as possible to the corresponding y, and this across the whole dataset. The hope is that the resulting function will be close to f.

But how are those parameters learned? In order to be able to learn, the model has to be able to evaluate itself. This is where the cost (also called loss, energy, merit, ...) function comes in: it is a metric function that compares the output of h with the corresponding y and penalizes large differences.

Now it should be clear what "learning" means here: changing the parameters in order to achieve a lower value of the cost function.
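
As a concrete illustration (my own sketch, not part of the original script), the mean squared error used further below can be written in a couple of lines of NumPy:

    import numpy as np

    # mean squared error: compare the model's outputs with the labels,
    # penalizing large differences quadratically
    def mean_squared_error(pred, y):
        pred, y = np.asarray(pred), np.asarray(y)
        return np.mean((pred - y) ** 2)

    print(mean_squared_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # ~0.4167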

Linear regression:

The example you posted performs parametric linear regression, optimized with gradient descent, using the mean squared error as the cost function. Which means:

  • Parametric: the set of parameters is fixed. They are held in the exact same memory placeholders throughout the learning process.

  • Linear: the output of h is simply a linear (actually, affine) combination of the input x and your parameters. So if x and w are real-valued vectors of the same dimensionality and b is a real number, it holds that h(x, w, b) = w.transposed()*x + b. Page 107 of the Deep Learning Book brings more quality insights and intuitions on this.

  • Cost function: now this is the interesting part. The mean squared error is a convex function. This means it has a single global optimum and, furthermore, that optimum can be found directly with the set of normal equations (also explained in the DLB). In your example, though, the stochastic (and/or minibatch) gradient descent method is used: this is the preferred method when optimizing non-convex cost functions (which is the case in more advanced models like neural networks) or when the dataset has a huge dimensionality (also explained in the DLB).

  • Gradient descent: tf handles this for you, so it is enough to say that GD minimizes the cost function by following its derivative "downhill", in small steps, until it reaches a saddle point. If you absolutely need to know, the exact technique applied by TF is called automatic differentiation, a kind of compromise between the numeric and the symbolic approaches. For convex functions like yours this point will be the global optimum, and (as long as your learning rate is not too big) gradient descent will always converge to it, so it does not matter which values you initialize your Variables with. Random initialization only becomes necessary in more complex architectures such as neural networks. There is some extra code regarding the management of minibatches, but I won't go into it since it is not the main focus of your question. (The formulas for the hypothesis, the cost, and one update step are sketched right after this list.)
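
To connect this with the variables in the script, here is a sketch of the formulas in standard notation (my own summary, assuming m training samples, scalar inputs as in the example, and α standing for learning_rate):

    \begin{aligned}
    \text{pred (hypothesis):}\quad & h(x; W, b) = W\,x + b \\
    \text{cost (MSE):}\quad & J(W, b) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h(x_i; W, b) - y_i\bigr)^2 \\
    \text{optimizer (one GD step):}\quad & W \leftarrow W - \frac{\alpha}{m}\sum_{i=1}^{m}\bigl(h(x_i; W, b) - y_i\bigr)\,x_i \\
    & b \leftarrow b - \frac{\alpha}{m}\sum_{i=1}^{m}\bigl(h(x_i; W, b) - y_i\bigr)
    \end{aligned}

Each sess.run(optimizer, ...) call in the script performs one such update (on a single sample in the inner loop, so strictly speaking it is stochastic gradient descent).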

The TensorFlow approach:

Deep learning frameworks nowadays work by nesting lots of functions to build computational graphs (you may want to take a look at the presentation on DL frameworks I gave a few weeks ago). In order to construct and run the graph, TensorFlow follows a declarative style, which means that the graph has to be completely defined and compiled before it is deployed and executed. If you haven't already, reading this short wiki article is very much recommended. In this context, the setup is split into two parts:

  1. First you define the computational Graph, where you place your dataset and parameters in memory placeholders, define the hypothesis and cost functions based on them, and tell tf which optimization technique to apply.

  2. Then you run the computation in a Session, and the library will be able to (re)load the data placeholders and perform the optimization (a bare-bones sketch of this split follows below).
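
A bare-bones sketch of this two-part split (illustrative only, using the same TF 1.x API as the question):

    import tensorflow as tf

    # Part 1: define the graph; nothing is computed yet
    x = tf.placeholder(tf.float32)
    w = tf.Variable(2.0)
    y = w * x

    # Part 2: run the computation in a Session, feeding the placeholder
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(y, feed_dict={x: 3.0}))  # 6.0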

The code:

The code of the example follows this approach closely:

  1. Define the training data X and the labels Y, and prepare placeholders for them in the Graph (they are fed in through the feed_dict).

  2. Define the 'W' and 'b' parameters. They have to be Variables because they will be updated during the Session.

  3. Define pred (our hypothesis) and cost, as explained before.

  4. From this, the rest of the code should be clearer. Regarding the optimizer, as I said, tf already knows how to take care of it, but you may want to look into gradient descent for more details (again, the DLB is a very good reference).

      Cheers, Andres

      CODE EXAMPLE: GRADIENT DESCENT VS. NORMAL EQUATIONS

      This little snippet generates a simple multi-dimensional dataset and tests both approaches. Notice that the normal equations approach doesn't require looping, and brings better results. For small dimensionality (DIMENSIONS < 30k or so) it is probably the preferred approach:

      from __future__ import absolute_import, division, print_function
      import numpy as np
      import tensorflow as tf
      
      ####################################################################################################
      ### GLOBALS
      ####################################################################################################
      DIMENSIONS = 5
      f = lambda x: sum(x) # the "true" function: f = 0 + 1*x1 + 1*x2 + 1*x3 ...
      noise = lambda: np.random.normal(0,10) # some noise
      
      ####################################################################################################
      ### GRADIENT DESCENT APPROACH
      ####################################################################################################
      # dataset globals
      DS_SIZE = 5000
      TRAIN_RATIO = 0.6 # 60% of the dataset is used for training
      _train_size = int(DS_SIZE*TRAIN_RATIO)
      _test_size = DS_SIZE - _train_size
      ALPHA = 1e-8 # learning rate
      LAMBDA = 0.5 # L2 regularization factor
      TRAINING_STEPS = 1000
      
      # generate the dataset, the labels and split into train/test
      ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)] # synthesize data
      # ds = normalize_data(ds)
      ds = [(x, [f(x)+noise()]) for x in ds] # add labels
      np.random.shuffle(ds)
      train_data, train_labels = zip(*ds[0:_train_size])
      test_data, test_labels = zip(*ds[_train_size:])
      
      # define the computational graph
      graph = tf.Graph()
      with graph.as_default():
        # declare graph inputs
        x_train = tf.placeholder(tf.float32, shape=(_train_size, DIMENSIONS))
        y_train = tf.placeholder(tf.float32, shape=(_train_size, 1))
        x_test = tf.placeholder(tf.float32, shape=(_test_size, DIMENSIONS))
        y_test = tf.placeholder(tf.float32, shape=(_test_size, 1))
        theta = tf.Variable([[0.0] for _ in range(DIMENSIONS)])
        theta_0 = tf.Variable([[0.0]]) # don't forget the bias term!
        # forward propagation
        train_prediction = tf.matmul(x_train, theta)+theta_0
        test_prediction  = tf.matmul(x_test, theta) +theta_0
        # cost function and optimizer
        train_cost = (tf.nn.l2_loss(train_prediction - y_train)+LAMBDA*tf.nn.l2_loss(theta))/float(_train_size)
        optimizer = tf.train.GradientDescentOptimizer(ALPHA).minimize(train_cost)
        # test results
        test_cost = (tf.nn.l2_loss(test_prediction - y_test)+LAMBDA*tf.nn.l2_loss(theta))/float(_test_size)
      
      # run the computation
      with tf.Session(graph=graph) as s:
        tf.global_variables_initializer().run()
        print("initialized"); print(theta.eval())
        for step in range(TRAINING_STEPS):
          _, train_c, test_c = s.run([optimizer, train_cost, test_cost],
                                     feed_dict={x_train: train_data, y_train: train_labels,
                                                x_test: test_data, y_test: test_labels })
          if (step%100==0):
            # it should return bias close to zero and parameters all close to 1 (see definition of f)
            print("\nAfter", step, "iterations:")
            #print("   Bias =", theta_0.eval(), ", Weights = ", theta.eval())
            print("   train cost =", train_c); print("   test cost =", test_c)
        PARAMETERS_GRADDESC = tf.concat([theta_0, theta], 0).eval()
        print("Solution for parameters:\n", PARAMETERS_GRADDESC)
      
      ####################################################################################################
      ### NORMAL EQUATIONS APPROACH
      ####################################################################################################
      # dataset globals
      DIMENSIONS = 5
      DS_SIZE = 5000
      TRAIN_RATIO = 0.6 # 60% of the dataset is used for training
      _train_size = int(DS_SIZE*TRAIN_RATIO)
      _test_size = DS_SIZE - _train_size
      f = lambda x: sum(x) # the "true" function: f = 0 + 1*x1 + 1*x2 + 1*x3 ...
      noise = lambda: np.random.normal(0,10) # some noise
      # training globals
      LAMBDA = 1e6 # L2 regularization factor
      
      # generate the dataset, the labels and split into train/test
      ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)]
      ds = [([1]+x, [f(x)+noise()]) for x in ds] # add x[0]=1 dimension and labels
      np.random.shuffle(ds)
      train_data, train_labels = zip(*ds[0:_train_size])
      test_data, test_labels = zip(*ds[_train_size:])
      
      # define the computational graph
      graph = tf.Graph()
      with graph.as_default():
        # declare graph inputs
        x_train = tf.placeholder(tf.float32, shape=(_train_size, DIMENSIONS+1))
        y_train = tf.placeholder(tf.float32, shape=(_train_size, 1))
        theta = tf.Variable([[0.0] for _ in range(DIMENSIONS+1)]) # implicit bias!
        # optimum
        optimum = tf.matrix_solve_ls(x_train, y_train, LAMBDA, fast=True)
      
      # run the computation: no loop needed!
      with tf.Session(graph=graph) as s:
        tf.global_variables_initializer().run()
        print("initialized")
        opt = s.run(optimum, feed_dict={x_train:train_data, y_train:train_labels})
        PARAMETERS_NORMEQ = opt
        print("Solution for parameters:\n",PARAMETERS_NORMEQ)
      
      ####################################################################################################
      ### PREDICTION AND ERROR RATE
      ####################################################################################################
      
      # generate test dataset
      ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)]
      ds = [([1]+x, [f(x)+noise()]) for x in ds] # add x[0]=1 dimension and labels
      test_data, test_labels = zip(*ds)
      # define hypothesis
      h_gd = lambda x: PARAMETERS_GRADDESC.T.dot(x)
      h_ne = lambda x: PARAMETERS_NORMEQ.T.dot(x)
      # define cost
      mse = lambda pred, lab: ((pred-np.array(lab))**2).sum()/DS_SIZE
      # make predictions!
      predictions_gd = np.array([h_gd(x) for x in test_data])
      predictions_ne = np.array([h_ne(x) for x in test_data])
      # calculate and print total error
      cost_gd = mse(predictions_gd, test_labels)
      cost_ne = mse(predictions_ne, test_labels)
      print("total cost with gradient descent:", cost_gd)
      print("total cost with normal equations:", cost_ne)
      

Answer 1 (score: 0)

  Variables allow us to add trainable parameters to a graph. They are constructed with a type and initial value:

W = tf.Variable([.3], tf.float32)
b = tf.Variable([-.3], tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W * x + b

Variables of type tf.Variable are the parameters that we will learn with TensorFlow. Assume you use gradient descent to minimize the loss function. You first need to initialize these parameters; rng.randn() is simply used to generate random values for that purpose, i.e. it picks a starting point (see the small sketch below).
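
A minimal sketch of that point (my own illustration, same TF 1.x API as the question): the initial value is only the starting point of the optimization, and for a convex problem like this one you could just as well start from zeros instead of rng.randn() and converge to the same fit:

import tensorflow as tf

# deterministic starting values instead of rng.randn(); for a convex cost,
# gradient descent reaches the same solution from either starting point
W = tf.Variable(0.0, name="weight")
b = tf.Variable(0.0, name="bias")

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)            # this is when the initial values are actually assigned
    print(sess.run([W, b]))   # [0.0, 0.0] -- the values the optimizer starts from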

I think Getting Started With TensorFlow is a good starting point for you.

Answer 2 (score: 0)

I would start by defining the variables:

W is a weight vector in R^d (same dimensionality as X)
b is a scalar value (bias) 
Y is also a scalar value i.e. the value at X

pred = W (dot) X + b   # dot here refers to dot product

# cost equals the average squared error
cost = ((pred - Y)^2) / (2*num_samples)

#finally optimizer
# optimizer computes the gradient with respect to each variable and the update

W -= learning_rate * (pred - Y)/num_samples * X
b -= learning_rate * (pred - Y)/num_samples

As for why W and b are set to random values: the update is a gradient step based on the error computed from the cost, so W and b can be initialized to anything. The script is not doing linear regression via least squares, although both approaches converge to the same solution (a small comparison is sketched below).
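
To illustrate that, here is a small NumPy-only sketch (my own, reusing the train_X/train_Y arrays from the question, with an illustrative learning rate and iteration count): gradient descent from a random start and the closed-form least-squares fit end up on essentially the same line:

import numpy as np

train_X = np.asarray([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                      7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_Y = np.asarray([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                      2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])

# gradient descent from an arbitrary (random) starting point
W, b = np.random.randn(), np.random.randn()
learning_rate = 0.01
for _ in range(10000):
    pred = W * train_X + b
    W -= learning_rate * np.mean((pred - train_Y) * train_X)
    b -= learning_rate * np.mean(pred - train_Y)

# closed-form least-squares fit of the same line
W_ls, b_ls = np.polyfit(train_X, train_Y, 1)

print("gradient descent :", W, b)
print("least squares    :", W_ls, b_ls)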

See here for more information: Getting Started