Linear regression with TensorFlow

Time: 2021-05-28 06:57:52

Tags: python pandas numpy tensorflow

I have been following this GitHub link.

In the tutorial:

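In the case of linear regression, the hypothesis is a straight line, i.e. h(x) = wx + b, where w is a vector called the weight and b is a scalar called the bias. The weight and bias are called the parameters of the model. All we need to do is estimate the values of w and b from a given set of data such that the resulting hypothesis yields the minimum cost J, defined by the following cost function: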

$$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2$$

where m is the number of data points in the given dataset. This cost function is also known as the mean squared error.
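To make the formula concrete, here is a minimal NumPy sketch of J(w, b) evaluated on the data from the CSV below. The function name `cost` is hypothetical. Note that m should equal the number of rows (14 for this CSV), whereas the code further down hard-codes `n_sample = 30` as the divisor.

```python
import numpy as np

def cost(w: float, b: float, x: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical helper: J(w, b) = 1/(2m) * sum((w*x + b - y)**2), m = len(x)."""
    m = len(x)
    residuals = w * x + b - y          # h(x_i) - y_i for every sample
    return float(np.sum(residuals ** 2) / (2 * m))

# DateNumeric (x) and Prices (y) copied from the CSV in the question
x = np.array([1000, 1120, 1180, 1210, 1240, 1270, 1276, 1279,
              1280, 1281, 1283, 1285, 1287, 1290], dtype=float)
y = np.array([83.75, 86.47, 89.21, 94.22, 93.59, 93.43, 94.3, 94.57,
              94.85, 95.11, 95.41, 95.66, 95.94, 96.14])

print(cost(0.0, 0.0, x, y))
```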

My CSV file looks like this:

Date,Prices,DateNumeric
30/09/20,83.75,1000
30/12/20,86.47,1120
01/02/21,89.21,1180
01/03/21,94.22,1210
01/04/21,93.59,1240
01/05/21,93.43,1270
07/05/21,94.3,1276
10/05/21,94.57,1279
11/05/21,94.85,1280
12/05/21,95.11,1281
14/05/21,95.41,1283
16/05/21,95.66,1285
18/05/21,95.94,1287
21/05/21,96.14,1290

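I want to predict the price of the commodity, i.e. the Prices column. The problem is that the dates are irregularly spaced: they are neither consecutive nor periodic. So I have converted them to integers in the DateNumeric column.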

Here, 30/09/20 is assigned the value 1000 (the initial value), and 30/12/20 is assigned 1120 because it comes 120 days (3 months) after the previous date.
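If the intent of DateNumeric is to measure elapsed days, the offsets can also be computed directly from the Date column with pandas. This is a minimal sketch, assuming the dates are day/month/two-digit-year as in the CSV above; the column name `Days` is hypothetical. Note that the calendar gap from 30/09/20 to 30/12/20 is 91 days, so the hand-made encoding does not track calendar time exactly.

```python
import io
import pandas as pd

# First rows of the CSV from the question (Date is day/month/two-digit year)
csv_text = """Date,Prices,DateNumeric
30/09/20,83.75,1000
30/12/20,86.47,1120
01/02/21,89.21,1180
01/03/21,94.22,1210
"""

h = pd.read_csv(io.StringIO(csv_text))
# Explicit format so that 01/02/21 is read as 1 February 2021, not 2 January
dates = pd.to_datetime(h["Date"], format="%d/%m/%y")
# Hypothetical column: days elapsed since the first observation
h["Days"] = (dates - dates.iloc[0]).dt.days
print(h[["Date", "DateNumeric", "Days"]])
```

For these four rows the calendar offsets are 0, 91, 124 and 152 days, compared with 0, 120, 180 and 210 in DateNumeric.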

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

learning_rate = 0.01 
epochs = 200
n_sample = 30  # used as m in the cost below; the CSV above has 14 rows

h = pd.read_csv('untitled.csv')
#h.shape
h.head(10)

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# features (DateNumeric) first, target (Prices) second, matching the training loop below
x_train, x_test, y_train, y_test = train_test_split(h.DateNumeric, h.Prices, test_size=0.2)
print(x_train)

#plt.plot(x_train, y_train)
#plt.plot(h.Prices, h.DateNumeric, 'o')
plt.plot(h.DateNumeric, h.Prices)
plt.show()

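# TensorFlow 1.x graph API (under TensorFlow 2 these live in tf.compat.v1 and need eager execution disabled)
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)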

w = tf.Variable(np.random.randn(), name = 'weight')
b = tf.Variable(np.random.randn(), name ='bias')
print(b.value())

prediction = tf.add(tf.multiply(X, w), b)
cost = tf.reduce_sum((prediction - Y)**2 / (2* n_sample))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    # the number of passes over the data is set by epochs
    for epoch in range(epochs):
        for x, y in zip(h.DateNumeric, h.Prices):
        #for x, y in zip(h.Prices, h.DateNumeric):
            sess.run(optimizer, feed_dict={X:x, Y:y})
            
            if(epoch%20) == 0:
                c = sess.run(cost, feed_dict={X: h.DateNumeric, Y:h.Prices})
                W = sess.run(w)
                B = sess.run(b)
                #print("cost, w, b",cost," " ,w," ",b)
                print("cost:{} w:{} b:{}".format(c, W, B))
    weight = sess.run(w)
    bias = sess.run(b)
    #plt.plot(x_train, y_train, 'o')
    plt.plot(h.DateNumeric, h.Prices, 'o')
    plt.plot(h.DateNumeric, weight * h.DateNumeric + bias)
    #plt.plot(x_train, weight * x_train + bias)
    plt.show()
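A rough, single-sample analysis of the inner loop shows why the parameters blow up. Write η for `learning_rate`, n for `n_sample`, and e = wx + b - y for the error on one sample. One gradient step on that sample's cost (wx + b - y)²/(2n) updates the parameters as

$$w' = w - \frac{\eta\, e\, x}{n}, \qquad b' = b - \frac{\eta\, e}{n}$$

so the error on the same sample after the step becomes

$$e' = w'x + b' - y = e\left(1 - \frac{\eta\,(x^2 + 1)}{n}\right)$$

With η = 0.01, n = 30 and x = 1000, the factor in parentheses is about 1 - 333.3 = -332.3: every step overshoots, flips the sign of the error and multiplies its size by roughly 332, which is consistent with the alternating signs of w in the output. The error on a sample only shrinks when 0 < η(x² + 1)/n < 2, which for x = 1000 and n = 30 means η below roughly 6 × 10⁻⁵; rescaling x is usually the more practical fix.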

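I cannot get correct values for the cost, weight and bias: instead of converging, they grow without bound, overflow to inf and end up as nan. The printed values are: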

cost:40475074560.0 w:336.76763916015625 b:0.42306655645370483
cost:7042602293526528.0 w:-140444.890625 b:-125.27484130859375
cost:1.510587957600892e+21 w:65044804.0 b:55116.4609375
cost:3.583142054672409e+26 w:-31679014912.0 b:-26179644.0
cost:9.375889959585844e+31 w:16204882247680.0 b:13067821056.0
cost:2.7000283394741965e+37 w:-8696085782462464.0 b:-6847003623424.0
cost:inf w:4.710892753578361e+18 b:3691890536873984.0
cost:inf w:-2.5640481681247033e+21 b:-2.0047201720315412e+18
cost:inf w:1.3977489977961294e+24 b:1.0919898423421053e+21
cost:inf w:-7.631533130976661e+26 b:-5.95747219549254e+23
cost:inf w:4.1797650628306485e+29 b:3.257796829444394e+26
cost:inf w:-2.296399276256281e+32 b:-1.7870760811687127e+29
cost:inf w:1.2655991850303288e+35 b:9.833687923238823e+31
cost:inf w:-inf b:-5.432245785997481e+34
cost:nan w:nan b:nan
cost:nan w:nan b:nan
cost:nan w:nan b:nan
cost:nan w:nan b:nan
cost:nan w:nan b:nan
cost:nan w:nan b:nan
cost:nan w:nan b:nan
cost:nan w:nan b:nan
cost:nan w:nan b:nan
cost:nan w:nan b:nan
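A likely cause is that DateNumeric (values from 1000 to 1290) is fed to gradient descent unscaled, so with learning_rate = 0.01 each step overshoots. A common remedy is to standardize the feature before training and convert the learned parameters back afterwards. The sketch below illustrates this under stated assumptions: it swaps the per-sample TensorFlow loop for full-batch gradient descent in NumPy, uses m = len(x) (14 rows) instead of the hard-coded n_sample = 30, and checks the result against the closed-form least-squares line from `np.polyfit`. `fit_gd`, `x_scaled`, `mu` and `sigma` are hypothetical names introduced here.

```python
import numpy as np

# DateNumeric (x) and Prices (y) from the CSV in the question
x = np.array([1000, 1120, 1180, 1210, 1240, 1270, 1276, 1279,
              1280, 1281, 1283, 1285, 1287, 1290], dtype=float)
y = np.array([83.75, 86.47, 89.21, 94.22, 93.59, 93.43, 94.3, 94.57,
              94.85, 95.11, 95.41, 95.66, 95.94, 96.14])

def fit_gd(x, y, learning_rate=0.01, epochs=2000):
    """Hypothetical helper: full-batch gradient descent on
    J = 1/(2m) * sum((w*x + b - y)**2)."""
    m = len(x)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        err = w * x + b - y                        # residuals for all samples
        w -= learning_rate * np.dot(err, x) / m    # dJ/dw
        b -= learning_rate * np.sum(err) / m       # dJ/db
    return w, b

# Unscaled input: a few epochs are enough to see the weight explode
w_raw, _ = fit_gd(x, y, epochs=5)
print("unscaled w after 5 epochs:", w_raw)

# Standardize the feature: zero mean, unit standard deviation
mu, sigma = x.mean(), x.std()
x_scaled = (x - mu) / sigma
w_s, b_s = fit_gd(x_scaled, y)

# Map back to the original scale: y = w_s*(x - mu)/sigma + b_s = w*x + b
w = w_s / sigma
b = b_s - w_s * mu / sigma

w_ref, b_ref = np.polyfit(x, y, 1)   # closed-form least-squares line
print("gradient descent:", w, b)
print("np.polyfit:      ", w_ref, b_ref)
```

The same idea should carry over to the TensorFlow graph: feed (h.DateNumeric - mu) / sigma into X instead of the raw values, then convert w and b back as above before plotting against h.DateNumeric.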
