Question

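Typically, when fitting time-series data (for example, with a polynomial fit), the fitting function returns an error associated with each fitted point. I am now trying a support vector regression (SVR) fit with scikit-learn, which returns nothing of the kind. scikit-learn has a handy function called validation_curve that gives me an accuracy score for a range of fits, from which I pick the best one. This is not ideal, because it does not let me carry the errors forward through any subsequent data manipulation.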
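For comparison, here is a minimal sketch of the polynomial case, assuming NumPy's `np.polyfit` as the fitting function (the post does not name one). The degree, the variable names, and the final propagation step are illustrative choices only.

```python
import numpy as np

# Fake data of the same kind as in the SVR example
x = np.linspace(0, 10, 10)
y = np.sin(x)
sigma = np.linspace(0.05, 0.1, num=10)  # per-point uncertainty

# cov=True makes polyfit return the covariance matrix of the coefficients.
# NumPy documents w = 1/sigma as the weighting for Gaussian uncertainties.
coeffs, cov = np.polyfit(x, y, deg=3, w=1 / sigma, cov=True)
coeff_err = np.sqrt(np.diag(cov))  # 1-sigma error on each coefficient

# Propagate to the fitted curve: var(f(x)) = v^T cov v, v = [x^3, x^2, x, 1]
x_new = np.linspace(0, 10, 50)
vander = np.vander(x_new, 4)  # columns in decreasing powers, like coeffs
y_fit = vander @ coeffs
y_fit_err = np.sqrt(np.einsum('ij,jk,ik->i', vander, cov, vander))
```

This is the kind of return value (`cov`, and from it `y_fit_err`) that SVR does not provide directly.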

How can I propagate the errors on my time-series data through a python / scikit-learn / SVR fit?

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import validation_curve, ShuffleSplit
from sklearn.svm import SVR

# Create some fake data
times = np.linspace(0, 10, 10)[:, None]
data = np.sin(times).ravel()
uncertainty = np.linspace(0.05, 0.1, num=10)
np.random.shuffle(uncertainty)
sample_weight = 1 / uncertainty


# Quick helper function
def jpm_svr(gamma=1e-6, **kwargs):
    return make_pipeline(SVR(kernel='rbf', C=1e3, gamma=gamma, **kwargs))

# Find the best value of gamma
gamma = np.logspace(-2, 5, num=11, base=10)
shuffle_split = ShuffleSplit(n_splits=20, train_size=0.5, test_size=0.5, random_state=None)
# In the next line I also want to input the sample_weight but can't
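# scoring: explained variance (assumed meaning of the undefined `evs`)
train_score, val_score = validation_curve(jpm_svr(), times, data,  # sample_weight :(
                                          param_name='svr__gamma',
                                          param_range=gamma, cv=shuffle_split,
                                          scoring='explained_variance')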
score = np.median(val_score, axis=1)
best_score_index = np.argmax(score)

# Generate model with best value of gamma
# Also note that I can now pass in sample_weight
# But there aren't any direct returns from SVR for uncertainty
model = SVR(kernel='rbf', C=1e3, gamma=gamma[best_score_index]).fit(times, data, sample_weight)
X_test = np.linspace(0, 10, 50)[:, None]
y_test = model.predict(X_test)
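
One possible workaround, sketched here as an assumption and not as anything built into scikit-learn, is Monte Carlo resampling: perturb the data within its stated uncertainties, refit the SVR each time, and take the spread of the predictions as the propagated error. The number of draws (`n_draws`), the fixed `gamma`, and the random seed below are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility

# Same kind of fake data as above
times = np.linspace(0, 10, 10)[:, None]
data = np.sin(times).ravel()
uncertainty = np.linspace(0.05, 0.1, num=10)
X_test = np.linspace(0, 10, 50)[:, None]

n_draws = 200  # hypothetical number of Monte Carlo realisations
predictions = np.empty((n_draws, X_test.shape[0]))
for i in range(n_draws):
    # Perturb each point by Gaussian noise scaled to its own uncertainty
    noisy = data + rng.normal(0.0, uncertainty)
    # gamma is fixed here for brevity; the tuned value could be used instead
    model = SVR(kernel='rbf', C=1e3, gamma=0.1)
    model.fit(times, noisy, sample_weight=1 / uncertainty)
    predictions[i] = model.predict(X_test)

y_test = predictions.mean(axis=0)             # central prediction
y_test_err = predictions.std(axis=0, ddof=1)  # spread across refits
```

The resulting `y_test_err` reflects only the stated measurement uncertainties under a Gaussian assumption; it says nothing about model misspecification.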


0 answers: