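Typically, when performing a fit (e.g. a polynomial fit) on some time series data, the function returns an error associated with each of the fitted points. I'm now trying to use scikit-learn's Support Vector Regression (SVR) fit, which has no such return value. scikit-learn has a handy function called validation_curve, which can give me an accuracy score for various fits, from which I pick the best one. This isn't ideal, because it doesn't let me keep propagating the error through any subsequent data operations.
How do I propagate the error of my time series data through a python / scikit-learn / SVR fit?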
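For comparison, here is a minimal sketch of the kind of error return meant above, using `np.polyfit` with `cov=True`. The data, the degree of 3, and the variable names are made up for illustration. The weighting `w = 1 / sigma` assumes Gaussian uncertainties.

```python
import numpy as np

# Made-up data with per-point 1-sigma uncertainties (illustrative only)
x = np.linspace(0, 10, 10)
sigma = np.linspace(0.05, 0.1, num=10)
rng = np.random.default_rng(0)
y = np.sin(x) + rng.normal(0.0, sigma)

deg = 3  # assumed polynomial degree
# cov=True also returns the covariance matrix of the fitted coefficients
coeffs, cov = np.polyfit(x, y, deg, w=1 / sigma, cov=True)

# Propagate the coefficient covariance to an error on each fitted point:
# var_i = sum_jk V[i, j] * cov[j, k] * V[i, k]
V = np.vander(x, deg + 1)  # columns in decreasing powers, matching coeffs
fit = V @ coeffs
fit_err = np.sqrt(np.einsum('ij,jk,ik->i', V, cov, V))
```

`fit_err` is the per-point error that can be carried into later calculations. This is what I can't find an equivalent of for SVR.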
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import validation_curve, ShuffleSplit
from sklearn.svm import SVR
# Create some fake data
times = np.linspace(0, 10, 10)[:, None]
data = np.sin(times).ravel()
uncertainty = np.linspace(0.05, 0.1, num=10)
np.random.shuffle(uncertainty)
sample_weight = 1 / uncertainty
# Quick helper function
def jpm_svr(gamma=1e-6, **kwargs):
    return make_pipeline(SVR(kernel='rbf', C=1e3, gamma=gamma, **kwargs))
# Find the best value of gamma
gamma = np.logspace(-2, 5, num=11, base=10)
shuffle_split = ShuffleSplit(n_splits=20, train_size=0.5, test_size=0.5, random_state=None)
# In the next line I also want to input the sample_weight but can't
train_score, val_score = validation_curve(jpm_svr(), times, data,  # sample_weight :(
                                          param_name='svr__gamma',
                                          param_range=gamma, cv=shuffle_split,
                                          scoring='explained_variance')
score = np.median(val_score, axis=1)
best_score_index = np.argmax(score)
# Generate model with best value of gamma
# Also note that I can now pass in sample_weight
# But there aren't any direct returns from SVR for uncertainty
model = SVR(kernel='rbf', C=1e3, gamma=gamma[best_score_index]).fit(times, data, sample_weight)
X_test = np.linspace(0, 10, 50)[:, None]
y_test = model.predict(X_test)
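One possible way to get an uncertainty on the SVR prediction is Monte Carlo resampling. Perturb the data within its quoted uncertainty, refit, and take the spread of the predictions. The sketch below is only an illustration of that idea, not something SVR provides itself. It assumes the uncertainties are independent Gaussian 1-sigma values. `n_draws` and the fixed `gamma = 1.0` are placeholders (in practice gamma would be the value selected above).

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Same kind of fake data as above
times = np.linspace(0, 10, 10)[:, None]
data = np.sin(times).ravel()
uncertainty = np.linspace(0.05, 0.1, num=10)
X_test = np.linspace(0, 10, 50)[:, None]

n_draws = 200  # assumed number of Monte Carlo realisations
gamma = 1.0    # placeholder; substitute gamma[best_score_index]

predictions = np.empty((n_draws, X_test.shape[0]))
for i in range(n_draws):
    # Draw one realisation of the data consistent with its uncertainties
    perturbed = data + rng.normal(0.0, uncertainty)
    model = SVR(kernel='rbf', C=1e3, gamma=gamma)
    model.fit(times, perturbed, sample_weight=1 / uncertainty)
    predictions[i] = model.predict(X_test)

# Mean prediction and its spread across the realisations
y_mean = predictions.mean(axis=0)
y_err = predictions.std(axis=0, ddof=1)
```

`y_err` then gives a per-point error estimate that can be propagated through later operations. It only reflects the input measurement noise, not any bias from the choice of kernel or hyperparameters.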