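I'm trying to fit a polynomial curve to my data with sklearn and then use that curve to predict future data. However, all I get is a horizontal line, which clearly doesn't fit at all. With scipy's optimization I got a curve that fits very well, but I would like to make the predictions with sklearn.
Here is my code: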
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
from sklearn import preprocessing
from sklearn.svm import SVR

def bytespdate2num(fmt, encoding='utf-8'):  # converts the date column to Matplotlib date numbers
    # mdates.strpdate2num was deprecated and later removed from Matplotlib, and depending on
    # the NumPy version loadtxt passes either bytes or str to converters, so handle both.
    def bytesconverter(b):
        s = b.decode(encoding) if isinstance(b, bytes) else b
        return mdates.date2num(datetime.strptime(s, fmt))
    return bytesconverter

# Keep every line except the header row(s) containing 'Date'.
with open('combined_data_sr2.csv') as dataCSV:
    comb_data = [line for line in dataCSV if 'Date' not in line]

date, data1, data2 = np.loadtxt(comb_data, delimiter=',', unpack=True,
                                converters={0: bytespdate2num('%d/%m/%Y')})

Xpre = data1.reshape(-1, 1)  # sklearn expects shape (n_samples, n_features)
x = preprocessing.scale(Xpre)  # standardize the feature (zero mean, unit variance)
y = data2

b = 3
x_train = x[b:]
x_test = x[:b]
y_train = y[b:]
y_test = y[:b]

regr = SVR(kernel='poly', degree=3)
# Just for fitting the data and checking whether the estimator is right.
# Note: this fits on all rows of x, including the three test rows.
g = regr.fit(x, y).predict(x)

print("Residual sum of squares: %.2f"  # computed value is actually the mean squared error
      % np.mean((regr.predict(x_test) - y_test) ** 2))
print('Variance score: %.2f' % regr.score(x_test, y_test))  # R^2 on the test rows
print('Prediction input:', x_test)
print('Prediction:', regr.predict(x_test))
print('Actual:', y_test)

plt.scatter(x, y)
# plt.hold('on') was removed in Matplotlib 3.0; plots overlay on the same axes by default.
plt.plot(x, g, color='green', label='LR')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
Here is the output of that code:
Residual sum of squares: 98171635548288.23
Variance score: -900.75
Prediction input: [[ 1.85313098]
[ 1.98736261]
[ 2.11307418]]
Prediction: [ 15188181.21713208 15188243.6121402 15188310.2184889 ]
Actual: [ 25417186. 25216661. 24638877.]
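The three predictions above are almost identical (they differ by about 130) while the actual values are around 25 million. That pattern is consistent with how SVR bounds its model: the prediction is the intercept plus a kernel sum whose dual coefficients each lie in [-C, C], and the default is C=1.0. With inputs scaled to unit variance the kernel values stay small, so the kernel sum can move predictions only slightly away from the intercept, while the targets are on the order of 1e7. The sketch reproduces this on hypothetical synthetic data (a cubic target of similar magnitude), not on the question's CSV.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical stand-in data: a standardized-looking feature and a cubic target
# whose magnitude (~1e7) resembles the question's y values.
x = np.linspace(-2, 2, 50).reshape(-1, 1)
y = 1e7 + 5e6 * x.ravel() ** 3

svr = SVR(kernel="poly", degree=3)  # defaults: C=1.0, epsilon=0.1, coef0=0.0
pred = svr.fit(x, y).predict(x)

# Each dual coefficient is bounded by C, so the kernel sum can barely move the
# prediction away from the intercept when y is this large.
print("max |dual coef|:     ", np.abs(svr.dual_coef_).max())
print("spread of y:         ", y.max() - y.min())
print("spread of prediction:", pred.max() - pred.min())
```

Raising C or rescaling the target changes this behaviour; the spread of the prediction here is a tiny fraction of the spread of y, i.e. a nearly horizontal line.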
Here is the resulting fit.
Here is what I want to see, fitted with scipy.optimize's curve_fit.
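The question does not show which model function was passed to curve_fit, so this sketch assumes a cubic polynomial. For any model that is linear in its parameters, such as a polynomial, curve_fit's least-squares fit and sklearn's LinearRegression on PolynomialFeatures minimize the same sum of squared residuals, so they should return the same coefficients. The data are synthetic, and `cubic` and `x_new` are hypothetical names used only for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical cubic model (the question's actual curve_fit function is not shown).
def cubic(x, a, b, c, d):
    return a * x**3 + b * x**2 + c * x + d

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 2 * x**3 - x + 3 + rng.normal(scale=0.1, size=x.size)  # synthetic data

popt, _ = curve_fit(cubic, x, y)

# Same least-squares problem in sklearn: expand x into [x, x^2, x^3], then fit OLS.
pipe = make_pipeline(PolynomialFeatures(degree=3, include_bias=False), LinearRegression())
pipe.fit(x.reshape(-1, 1), y)

lin = pipe.named_steps["linearregression"]
sk_coefs = np.array([lin.coef_[2], lin.coef_[1], lin.coef_[0], lin.intercept_])
print("curve_fit (a, b, c, d):", popt)
print("sklearn   (a, b, c, d):", sk_coefs)

# Predicting a point outside the fitted range with both models.
x_new = np.array([2.5])
print(pipe.predict(x_new.reshape(-1, 1)), cubic(x_new, *popt))
```

Unlike SVR, LinearRegression has no regularization or epsilon-insensitive loss, which is why it reproduces the curve_fit result rather than an approximation of it.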
How can I get sklearn to fit my data the same way scipy.optimize's curve_fit does? I want to be able to predict nonlinear data.
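If the flat line comes from the unscaled target, as assumed here, one option for staying with SVR is to standardize y as well as x. sklearn's TransformedTargetRegressor fits the wrapped SVR on the scaled target and maps predictions back to the original units. Setting coef0=1.0 is an assumption added for this sketch, not part of the question: with the default coef0=0 the "poly" kernel is (gamma * <x, x'>)^3, which contains only the cubic term, so the lower-order terms of a general cubic could not be represented. The data are the same kind of hypothetical synthetic cubic as before, and C and epsilon would likely still need tuning on real data.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical synthetic data with a target of magnitude ~1e7.
x = np.linspace(-2, 2, 50).reshape(-1, 1)
y = 1e7 + 5e6 * x.ravel() ** 3

# The SVR sees a standardized y; predict() inverse-transforms back to original units.
model = TransformedTargetRegressor(
    regressor=SVR(kernel="poly", degree=3, coef0=1.0),  # coef0=1.0: assumption, see above
    transformer=StandardScaler(),
)
model.fit(x, y)
pred = model.predict(x)
print("R^2:", r2_score(y, pred))
```

To predict genuinely new inputs, those inputs must be scaled with the same mean and standard deviation as the training data. preprocessing.scale does not keep these statistics, whereas a StandardScaler fitted on the training x does.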