Question

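I have a time-series dataset that includes an uncertainty for each point. I want to use sklearn.model_selection.validation_curve() to find the best value of the hyperparameter gamma for sklearn.svm.SVR() (support vector regression). It is easy to pass the uncertainties (actually their inverse, sample_weight) to the SVR().fit() function, which is great. However, none of the inputs to validation_curve() seem to allow this. Is there a workaround?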

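import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import explained_variance_score, make_scorer
from sklearn.model_selection import ShuffleSplit, validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

# Scorer used below; assumed here to be explained variance
evs = make_scorer(explained_variance_score)

# Create some fake data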
times = np.linspace(0, 10, 10)[:, None]
data = np.sin(times).ravel()
uncertainty = np.linspace(0.05, 0.1, num=10)
np.random.shuffle(uncertainty)
sample_weight = 1 / uncertainty


# Quick helper function
def jpm_svr(gamma=1e-6, **kwargs):
    return make_pipeline(SVR(kernel='rbf', C=1e3, gamma=gamma, **kwargs))

# Find the best value of gamma
gamma = np.logspace(-2, 5, num=11, base=10)
shuffle_split = ShuffleSplit(n_splits=20, train_size=0.5, test_size=0.5, random_state=None)
######## The next line is where I want to input the sample_weight but can't
train_score, val_score = validation_curve(jpm_svr(), times, data, #sample_weight :(
                                          param_name='svr__gamma',
                                          param_range=gamma, cv=shuffle_split, scoring=evs)
score = np.median(val_score, axis=1)
best_score_index = np.argmax(score)

# Generate model with best value of gamma and plot it over the data
# Also note that I can now pass in sample_weight
model = SVR(kernel='rbf', C=1e3, gamma=gamma[best_score_index]).fit(times, data, sample_weight)
X_test = np.linspace(0, 10, 50)[:, None]
y_test = model.predict(X_test)
plt.errorbar(times, data, yerr=uncertainty, fmt='o');
plt.plot(X_test.ravel(), y_test, linewidth=6);

Is there any way to fold the uncertainties/weights into scikit-learn's validation_curve?
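One possible workaround, offered as a minimal sketch rather than a definitive answer, is to skip validation_curve() and run the same loop by hand, so that sample_weight can be sliced per split and passed to SVR().fit(). The helper name weighted_validation_curve is hypothetical, and the choice to also weight the explained-variance score is an assumption; drop the sample_weight argument from the scoring calls if unweighted scores are preferred.

```python
import numpy as np
from sklearn.metrics import explained_variance_score
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import SVR


def weighted_validation_curve(X, y, sample_weight, gammas, cv):
    """Hypothetical helper: train/validation scores of shape (n_gammas, n_splits)."""
    splits = list(cv.split(X))  # reuse the same splits for every gamma
    train_scores = np.empty((len(gammas), len(splits)))
    val_scores = np.empty_like(train_scores)
    for i, g in enumerate(gammas):
        for j, (train, val) in enumerate(splits):
            model = SVR(kernel='rbf', C=1e3, gamma=g)
            # Weights are sliced with the same indices as the data
            model.fit(X[train], y[train], sample_weight=sample_weight[train])
            # Assumption: the score is weighted as well
            train_scores[i, j] = explained_variance_score(
                y[train], model.predict(X[train]),
                sample_weight=sample_weight[train])
            val_scores[i, j] = explained_variance_score(
                y[val], model.predict(X[val]),
                sample_weight=sample_weight[val])
    return train_scores, val_scores


# Same fake data as in the question, with a fixed seed for repeatability
rng = np.random.default_rng(0)
times = np.linspace(0, 10, 10)[:, None]
data = np.sin(times).ravel()
uncertainty = rng.permutation(np.linspace(0.05, 0.1, num=10))
sample_weight = 1 / uncertainty

gamma = np.logspace(-2, 5, num=11, base=10)
shuffle_split = ShuffleSplit(n_splits=20, train_size=0.5, test_size=0.5,
                             random_state=0)
train_score, val_score = weighted_validation_curve(
    times, data, sample_weight, gamma, shuffle_split)
best_gamma = gamma[np.argmax(np.median(val_score, axis=1))]
```

The returned arrays have the same layout as the output of validation_curve(), so the median/argmax selection and the plotting code from the question can be reused unchanged.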

0 answers: