TensorFlow's premade linear regression estimator gives the wrong answer

Time: 2018-07-12 09:41:39

Tags: python tensorflow machine-learning linear-regression

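I'm new to Stack Overflow and to TensorFlow. I'm trying to redo the simple linear regression from "Introduction to Machine Learning" (Andrew Ng's Coursera course) using the premade linear regression estimator.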

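I have already coded the linear regression model in Python with numpy and scikit-learn and successfully found the model parameters [theta0, theta1] = [-3.6303, 1.1664]. I got this result both from the normal equation and from ordinary gradient descent.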
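As a point of reference, here is a minimal sketch of the closed-form least-squares fit that the normal equation reduces to when there is a single feature. It is not the original numpy/scikit-learn code: the function name `fit_closed_form` and the toy data are made up for illustration, and the toy data is not the course data set.

```python
def fit_closed_form(xs, ys):
    """Least-squares fit of y = theta0 + theta1 * x for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    # The fitted line passes through the point (mean_x, mean_y).
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


# Toy data (hypothetical): points lying exactly on y = -2.5 + 1.2 * x.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [-2.5 + 1.2 * x for x in toy_x]

theta0, theta1 = fit_closed_form(toy_x, toy_y)
print("theta0=%.4f theta1=%.4f" % (theta0, theta1))
```

Any other fitting method, including the estimator below, should land close to these closed-form values once it has converged.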

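I cannot reproduce the same result with TensorFlow's premade linear regression estimator. I'm following the basic approach laid out in Google's Machine Learning Crash Course, "First Steps with TensorFlow" (and also here: https://medium.com/datadriveninvestor/machine-learning-part-iv-efecd2f61f35).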

I put the data here: https://github.com/ChristianHaeuber/TensorFlowData

Can anyone tell me what I'm doing wrong?

from __future__ import print_function

import math

from IPython import display
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import metrics
import tensorflow as tf
from tensorflow.python.data import Dataset

tf.logging.set_verbosity(tf.logging.ERROR)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

data = pd.read_csv('ex1data1.txt')

batch = data.shape[0]

feature_columns = [tf.feature_column.numeric_column('population')]

targets = data['profit']

my_optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.01)

linear_regressor = tf.estimator.LinearRegressor(
        feature_columns=feature_columns,
        optimizer=my_optimizer
        )

def input_fn(ft, t, batch=1, shuffle=True, epochs=None):
    ft = {k:np.array(v) for k,v in dict(ft).items()}
    ds = Dataset.from_tensor_slices((ft, t))
    ds = ds.batch(batch).repeat(epochs)

    if shuffle:
        ds=ds.shuffle(buffer_size=10000)

    ft, lb = ds.make_one_shot_iterator().get_next()

    return ft, lb

ft = data[['population']]
input_fn_1 = lambda: input_fn(ft, targets)

linear_regressor.train(
        input_fn = input_fn_1,
        steps=1
        )

input_fn_2 = lambda: input_fn(ft, targets, shuffle=False, epochs=1)

p = linear_regressor.predict(input_fn = input_fn_2)

p = np.array([item['predictions'][0] for item in p])

mse = metrics.mean_squared_error(p, targets)

print("MSE: %0.3f" % mse)

print("Bias Weight: %0.3f" %
      linear_regressor.get_variable_value('linear/linear_model/bias_weights').flatten()[0])
print("Weight %0.3f" %
      linear_regressor.get_variable_value('linear/linear_model/population/weights').flatten()[0])

1 Answer:

Answer 0 (score: 0)

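The "Introduction to Machine Learning" course uses batch gradient descent: every iteration uses all of the training examples, and many iterations are run until the parameters converge. The code above instead feeds a single training example per step (batch=1) and calls train with steps=1, so only one gradient update is applied. (Per the tf.estimator.LinearRegressor.train documentation, leaving steps unset would train forever here, because the input function repeats indefinitely when epochs=None.)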
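The difference between one update and many full-batch updates can be seen in a small plain-Python sketch of batch gradient descent. This is an illustration under stated assumptions, not the estimator's internals: the function name `batch_gradient_descent`, the toy data, the learning rate, and the step counts are all hypothetical, and the gradient is averaged over the batch, as in the course.

```python
def batch_gradient_descent(xs, ys, learning_rate, steps):
    """Fit y = theta0 + theta1 * x, using every example in each step."""
    theta0, theta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error on every training example (the full batch).
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error (with the usual 1/2 factor).
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Toy data (hypothetical): points lying exactly on y = -2.5 + 1.2 * x.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [-2.5 + 1.2 * x for x in toy_x]

# A single update barely moves the parameters away from zero ...
one_step = batch_gradient_descent(toy_x, toy_y, learning_rate=0.05, steps=1)
# ... while many full-batch updates approach the true line.
many_steps = batch_gradient_descent(toy_x, toy_y, learning_rate=0.05, steps=5000)

print("after 1 step:     theta0=%.4f theta1=%.4f" % one_step)
print("after 5000 steps: theta0=%.4f theta1=%.4f" % many_steps)
```

The learning rate that works depends on the scale of the data and on how the loss is reduced over the batch, so the value used in this sketch is not a recommendation for the estimator.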

With a few changes I was able to reproduce the course's results:

from __future__ import print_function

import math
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import metrics
import tensorflow as tf
from tensorflow.python.data import Dataset

tf.logging.set_verbosity(tf.logging.ERROR)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

def my_input_fn(features, labels, batch_size=1, num_epochs=None):

    features = {key: np.array(value) for key, value in
                dict(features).items()}

    ds = Dataset.from_tensor_slices((features,labels))
    ds = ds.batch(batch_size).repeat(num_epochs)

    features, labels = ds.make_one_shot_iterator().get_next()

    return features, labels

ex1_data_df = pd.read_csv('ex1data1.txt')

features = ex1_data_df['population']
my_features = ex1_data_df[['population']]
feature_columns = [tf.feature_column.numeric_column('population')]
labels = ex1_data_df['profit']

my_optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.0001)

linear_regressor = tf.estimator.LinearRegressor(
        feature_columns = feature_columns,
        optimizer=my_optimizer)

_ = linear_regressor.train(
        input_fn = lambda:my_input_fn(my_features, labels, 
                                      batch_size=ex1_data_df.shape[0]), 
        steps=2000
        )

predictions = linear_regressor.predict(
        input_fn=lambda:my_input_fn(my_features,labels,
                                    batch_size=1,num_epochs=1)
        )

predictions = np.array([item['predictions'][0] for item in predictions])

mean_squared_error = metrics.mean_squared_error(predictions, labels)
print("Mean Squared Error (on training data): {}".format(mean_squared_error))

weight = linear_regressor.get_variable_value('linear/linear_model/population/weights')
bias = linear_regressor.get_variable_value('linear/linear_model/bias_weights')
print("Feature weight: {0}\t Bias weight: {1}".format(weight, bias))