Question

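I am trying to fit a polynomial curve to my data with sklearn and then use that curve to predict future data. However, I end up with a horizontal line that clearly does not fit at all. I tried scipy's optimize module and got a curve that fits very well, but I would like to make the predictions with sklearn.

Here is my code: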

from datetime import datetime

import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from sklearn import preprocessing
from sklearn.svm import SVR


def bytespdate2num(fmt, encoding='utf-8'):
    # Build a converter that turns a date field (bytes or str) into a Matplotlib date number.
    # mdates.strpdate2num is gone from recent Matplotlib releases, so strptime + date2num is used instead.
    def bytesconverter(b):
        s = b.decode(encoding) if isinstance(b, bytes) else b
        return mdates.date2num(datetime.strptime(s, fmt))
    return bytesconverter

dataCSV = open('combined_data_sr2.csv')

comb_data = []

for line in dataCSV:
    if 'Date' not in line:
        comb_data.append(line)

date, data1, data2 = np.loadtxt(comb_data, delimiter=',', unpack=True, converters={0: bytespdate2num('%d/%m/%Y')})

pred_x = list(zip(data1))



Xpre = np.reshape(pred_x, (len(pred_x), 1))#to ensure that the data has the correct dimensionality.
x = preprocessing.scale(Xpre)#scaling the data
y = data2

b = 3
x_train = x[b:]
x_test = x[:b]

y_train = y[b:]
y_test = y[:b]

regr = SVR(kernel='poly', degree=3)
g = regr.fit(x, y).predict(x)#just for fitting the data and seeing if the estimator is right


print("Residual sum of squares: %.2f"
      % np.mean((regr.predict(x_test) - y_test) ** 2))

print('Variance score: %.2f' % regr.score(x_test, y_test))


print ('Prediction input:',(x_test))
print ('Prediction:',(regr.predict(x_test)))
print ('Actual:',(y_test))


plt.scatter(x, y)
# plt.hold was removed in Matplotlib 3.0; successive plotting calls already draw on the same axes
plt.plot(x, g, color='green', label='LR')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

Here is the output of that code:

Residual sum of squares: 98171635548288.23
Variance score: -900.75
Prediction input: [[ 1.85313098]
 [ 1.98736261]
 [ 2.11307418]]
Prediction: [ 15188181.21713208  15188243.6121402   15188310.2184889 ]
Actual: [ 25417186.  25216661.  24638877.]

Here is the plot of the resulting fit.

Here is what I want to see, obtained with scipy's optimize curve fit.

How can I get sklearn to fit my data the same way that curve_fit from scipy.optimize does? I want to be able to predict non-linear data.
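One possible explanation and workaround, offered as a sketch rather than a confirmed answer: with its defaults (`C=1.0`, `epsilon=0.1`, `coef0=0.0`), `SVR` bounds every dual coefficient by `C`, so with targets around 2.5e7 the fitted function can barely move away from a constant. Scaling the target as well as the input, or using an ordinary least-squares polynomial, may behave more like `curve_fit`. The example below runs on synthetic data (an assumption, since `combined_data_sr2.csv` is not available here), and the values `C=100` and `coef0=1` are illustrative choices, not tuned settings.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the CSV columns (assumption): a noisy curve with
# target values on the order of 1e7, similar in scale to the question's output.
rng = np.random.default_rng(0)
x_raw = np.linspace(0.0, 10.0, 60).reshape(-1, 1)
y = 2.5e7 - 3.0e5 * (x_raw.ravel() - 4.0) ** 2 + rng.normal(0.0, 1.0e5, 60)

# Hold out the last few points to play the role of "future" data.
n_test = 3
x_train, x_test = x_raw[:-n_test], x_raw[-n_test:]
y_train, y_test = y[:-n_test], y[-n_test:]

# Option 1: least-squares polynomial, the closest analogue of curve_fit
# with a cubic model function.
poly_model = make_pipeline(
    StandardScaler(), PolynomialFeatures(degree=3), LinearRegression()
)
poly_model.fit(x_train, y_train)

# Option 2: SVR with a polynomial kernel, but with the target standardized too.
# coef0=1 lets the kernel include lower-order terms; with coef0=0 the
# degree-3 kernel only contains the pure cubic term.
svr_scaled = TransformedTargetRegressor(
    regressor=make_pipeline(
        StandardScaler(), SVR(kernel="poly", degree=3, C=100, coef0=1)
    ),
    transformer=StandardScaler(),
)
svr_scaled.fit(x_train, y_train)

# Baseline resembling the question: scaled x, raw y, default C and coef0.
svr_raw = make_pipeline(StandardScaler(), SVR(kernel="poly", degree=3))
svr_raw.fit(x_train, y_train)

for name, model in [
    ("polynomial least squares", poly_model),
    ("SVR, scaled target", svr_scaled),
    ("SVR, raw target", svr_raw),
]:
    print(name)
    print("  R^2 on training data:", model.score(x_train, y_train))
    print("  predictions for held-out x:", model.predict(x_test))
print("actual held-out y:", y_test)
```

`TransformedTargetRegressor` converts the predictions back to the original units, so the printed values are directly comparable with `y_test`. Extrapolating a cubic beyond the training range can diverge quickly, so predictions for future data deserve a check against held-out points.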

sklearn SVR poly will not fit the data correctly

0 answers: