Using date types in linear regression with Python

Time: 2018-01-30 09:53:00

Tags: python-3.x linear-regression

Dataset:

I have collected tablespace growth data for a database and am trying to use it to predict future growth.

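The dataset covers 2009 through 2017. I have tried many approaches, but I cannot get the date column to work: every error I get is related to the datetime type. Could you suggest how I can use this dataset to predict growth?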

One of the errors:

  

TypeError: Cannot cast array data from dtype('M8[ns]') to dtype('float64') according to the rule 'safe'

TS_SIZE     FETCH_DATE
34911.99    01-05-2009
34672.5     02-05-2009
34683.39    03-05-2009
34904.7     04-05-2009
35063.87    05-05-2009
35298.46    06-05-2009
35161.88    07-05-2009
34872.53    08-05-2009
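The error above says that NumPy will not safely cast a `datetime64[ns]` (`M8[ns]`) column to `float64`, which is what the regression needs as input. One possible way around it, sketched here as a minimal example, is to turn each date into a plain number such as "days since the first sample" before fitting. The sketch assumes the dates are day-month-year (the sample rows are ambiguous on this point), and it rebuilds the eight rows inline instead of reading the Excel file.

```python
import numpy as np
import pandas as pd

# The eight sample rows from the question, typed in by hand.
data = pd.DataFrame({
    "TS_SIZE": [34911.99, 34672.5, 34683.39, 34904.7,
                35063.87, 35298.46, 35161.88, 34872.53],
    "FETCH_DATE": ["01-05-2009", "02-05-2009", "03-05-2009", "04-05-2009",
                   "05-05-2009", "06-05-2009", "07-05-2009", "08-05-2009"],
})

# Assumption: the strings are day-month-year. Use "%m-%d-%Y" if they are not.
data["FETCH_DATE"] = pd.to_datetime(data["FETCH_DATE"], format="%d-%m-%Y")

# Replace each date with the number of days elapsed since the earliest date.
# The result is an ordinary float column that scikit-learn can consume.
elapsed = data["FETCH_DATE"] - data["FETCH_DATE"].min()
days = elapsed.dt.days.to_numpy(dtype="float64")

X = days.reshape(-1, 1)          # shape (n_samples, 1)
y = data["TS_SIZE"].to_numpy()   # shape (n_samples,)

print(X.ravel())
print(X.dtype)
```

Counting days from the first sample, rather than casting the dates to `int64` nanoseconds since 1970, also keeps the values small, which matters once they are squared for a quadratic fit.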

Code

%matplotlib notebook
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
import datetime

data = pd.read_excel('D:/database/2.xlsx')
X_R1 = data['FETCH_DATE'].to_frame() #DataFrame
X_R1 = np.array(X_R1).reshape((-1,1))
y_R1 = data['TS_SIZE']

X_train, X_test, y_train, y_test = train_test_split(X_R1, y_R1, test_size=0.3,random_state = 0)
linreg = LinearRegression().fit(X_train, y_train)
ytest_predict_linear = linreg.predict(X_test)

###########POLY TEST PREDICTION#################
#lr = LinearRegression()
pr = LinearRegression()
poly = PolynomialFeatures(degree = 2)
X_R1_Poly = poly.fit_transform(X_R1)

pr.fit(X_R1_Poly,y_R1)
#X_train, X_test, y_train, y_test =    train_test_split(X_R1_Poly,y_R1,random_state=0)
ytest_predict_quadratic = pr.predict(poly.transform(X_test))
#linreg = Ridge().fit(X_train,y_train)
#print("Predicted Quadratic: {}" .format(ytest_predict_quadratic))
#plt.figure(figsize=(5,4))
#plt.scatter(X_R1,y_R1,marker= 'o', s=50, alpha=0.8,label='training points')
plt.scatter(X_R1,y_R1,marker= 'o',label='training points')
#plt.plot(X_R1, linreg.coef_ * X_R1_Poly + linreg.intercept_, 'r-')
plt.plot(X_test,ytest_predict_linear,label='linear fit',linestyle='--',color='r')
plt.plot(X_test,ytest_predict_quadratic,label='quadratic fit',color='g')
#plt.xlabel('Feature value x')
#plt.ylabel('Feature value y')
plt.legend(loc='upper left')
#plt.show()

#print('Training MSE linear: %.3f, quadratic: %.3f' % (mean_squared_error(y_R1, ytest_predict_linear),mean_squared_error(y_R1,    ytest_predict_quadratic)))
#print('Training R^2 linear: %.3f, quadratic: %.3f' % (r2_score(y_R1, ytest_predict_linear),r2_score(y_R1, ytest_predict_quadratic)))
###########POLY NEW PREDICTIONS#################

data1 = pd.read_excel('D:/database/2.xlsx')
print('Printing new dates')
print(data1['FETCH_DATE'])
X_R2_quad = pd.DataFrame(data1['FETCH_DATE'])
X_R2_quad = np.array(X_R2_quad,dtype='int64')
print(X_R2_quad)
#print("New values shape: %s" %(X_R2_quad.shape))
#print("New values: %s" %(X_R2_quad))
X_R2_quad_poly = poly.transform(X_R2_quad)
#X_R2_quad_poly = linreg.fit(X_R2_quad)

ynew_predict_quadratic = pr.predict(X_R2_quad_poly)
#ynew_predict_quadratic = linreg.predict(X_R2_quad_poly)

print("Predicted Values beyond test: {}" .format(ynew_predict_quadratic))

plt.scatter(X_R2_quad, ynew_predict_quadratic,marker= '*',label='predicted values')
plt.plot(X_R2_quad,ynew_predict_quadratic,label='predicted quadratic fit')
plt.legend(loc='upper left')
plt.xlabel("Year")
plt.ylabel("Predicted Growth")
#x=plt.gca().xaxis
#for item in x.get_ticklabels():
#   item.set_rotation(45)
plt.show()
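As a rough sketch of how the script above might look once the dates are numeric, the example below fits a linear and a quadratic model on "days since the first sample" and then predicts for dates beyond the data. Several things here are assumptions rather than part of the question: the day-month-year date format, the helper name `to_days`, the five future dates, and the use of `make_pipeline` to bundle `PolynomialFeatures` with `LinearRegression`. The sample rows are typed in by hand instead of being read from the Excel file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Sample rows from the question; the real data would come from the Excel file.
data = pd.DataFrame({
    "TS_SIZE": [34911.99, 34672.5, 34683.39, 34904.7,
                35063.87, 35298.46, 35161.88, 34872.53],
    "FETCH_DATE": ["01-05-2009", "02-05-2009", "03-05-2009", "04-05-2009",
                   "05-05-2009", "06-05-2009", "07-05-2009", "08-05-2009"],
})
# Assumption: day-month-year strings.
data["FETCH_DATE"] = pd.to_datetime(data["FETCH_DATE"], format="%d-%m-%Y")

origin = data["FETCH_DATE"].min()


def to_days(dates):
    """Hypothetical helper: map dates to a float column of days since `origin`."""
    delta = pd.to_datetime(dates) - origin
    return np.asarray(delta / pd.Timedelta(days=1), dtype="float64").reshape(-1, 1)


X = to_days(data["FETCH_DATE"])
y = data["TS_SIZE"].to_numpy()

# Linear fit, plus a quadratic fit that expands the single feature to [1, x, x^2].
linear = LinearRegression().fit(X, y)
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

# Illustrative future dates; they go through the same conversion as the training data.
future_dates = pd.date_range("2009-05-09", periods=5, freq="D")
X_future = to_days(future_dates)

pred_linear = linear.predict(X_future)
pred_quadratic = quadratic.predict(X_future)

for d, pl, pq in zip(future_dates, pred_linear, pred_quadratic):
    print(f"{d.date()}  linear={pl:.2f}  quadratic={pq:.2f}")
```

The key design point is that new dates must be converted with the same origin as the training dates, otherwise the model sees a shifted time axis. For plotting, the original `FETCH_DATE` values can still go on the x-axis while the numeric column is used only for fitting and prediction.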

0 answers:

No answers