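I have some code that uses TimeSeriesSplit to split my data. For each split, I use ParameterGrid to loop through every parameter combination, record the best parameter set, and use it to predict my X_test. You can see that code at the bottom of this post.

I know that GridSearchCV would do a lot of this work for me. If I use the code below, where does my data get split into X_train, X_test, y_train, and y_test? Does using GridSearchCV with TimeSeriesSplit handle this behind the scenes? If so, will this code accomplish the same thing as the original code at the bottom of this post? Also, I have now tried the GridSearchCV approach, and after almost 30 minutes it still hasn't finished. Is my syntax correct?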
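from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

# data is a pandas DataFrame: columns 0-7 are features, column 8 is the target
X = data.iloc[:, 0:8]
y = data.iloc[:, 8:9]

parameters = [
    {'kernel': ['rbf'],
     'gamma': [.01],
     'C': [1, 10, 100]}]

gsc = GridSearchCV(SVR(), param_grid=parameters, scoring='neg_mean_absolute_error',
                   cv=TimeSeriesSplit(n_splits=2))
# SVR expects a 1-D target, so flatten the single-column frame
gsc.fit(X, y.values.ravel())

means = gsc.cv_results_['mean_test_score']
for mean in means:
    print(mean)
print('end')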
Original code below:
from sklearn import metrics
from sklearn.model_selection import ParameterGrid, TimeSeriesSplit
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from tqdm import tqdm

# Create the time series split generator
tscv = TimeSeriesSplit(n_splits=3)

for train_index, test_index in tqdm(tscv.split(X)):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    # scale the data set
    scaler_X = StandardScaler()
    scaler_y = StandardScaler()
    scaler_X.fit(X_train)
    scaler_y.fit(y_train)
    X_train, X_test = scaler_X.transform(X_train), scaler_X.transform(X_test)
    y_train, y_test = scaler_y.transform(y_train), scaler_y.transform(y_test)

    # optimization area - set params
    parameters = [
        {'kernel': ['rbf'],
         'gamma': [.01],
         'C': [1, 10, 100, 500, 1000]}]
    regressor = SVR()

    # loop through each of the parameters and find the best set
    # (note: the score here is computed on the training data)
    for e, g in enumerate(ParameterGrid(parameters)):
        regressor.set_params(**g)
        regressor.fit(X_train, y_train.ravel())
        score = metrics.mean_absolute_error(y_train.ravel(), regressor.predict(X_train))
        if e == 0:
            best_score = score
            best_params = g
        elif score < best_score:
            best_score = score
            best_params = g

    # refit the model with the best set of params
    regressor.set_params(**best_params)
    regressor.fit(X_train, y_train.ravel())
Answer 0 (score: 1)
You need to modify your code slightly.
gsc = GridSearchCV(SVR(), param_grid=parameters, scoring='neg_mean_absolute_error',
                   cv=TimeSeriesSplit(n_splits=2).split(X))
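To make the "where does the data get split" part concrete, here is a minimal sketch on made-up data (the 9-row random arrays are an assumption for illustration, not the asker's data). It prints the train/test index pairs that TimeSeriesSplit produces, which are what GridSearchCV uses internally for each fold. As far as I can tell, passing the splitter object itself or the iterable from .split(X) should give the same scores, so the sketch checks that too.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

# Toy stand-in for the question's data: 9 time-ordered rows, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(9, 8))
y = rng.normal(size=9)

tscv = TimeSeriesSplit(n_splits=2)

# Each fold trains on an initial stretch of rows and tests on the rows right after it.
splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]
for train_index, test_index in splits:
    print("train:", train_index, "test:", test_index)

params = {"kernel": ["rbf"], "gamma": [0.01], "C": [1, 10, 100]}
scoring = "neg_mean_absolute_error"

# cv given as a splitter object (the question's form)
by_object = GridSearchCV(SVR(), params, scoring=scoring, cv=tscv).fit(X, y)
# cv given as an iterable of (train, test) index pairs (this answer's form)
by_iterable = GridSearchCV(SVR(), params, scoring=scoring, cv=tscv.split(X)).fit(X, y)

same_scores = np.allclose(
    by_object.cv_results_["mean_test_score"],
    by_iterable.cv_results_["mean_test_score"],
)
print("same scores:", same_scores)
```

So you never build X_train and X_test yourself: for every parameter combination, GridSearchCV fits on each fold's training rows and scores on that fold's test rows.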
Also, consider adding the verbose parameter so you can see progress output while it runs.
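Putting it together, here is a rough sketch of the GridSearchCV approach with verbose set, on synthetic data (the data and the use of make_pipeline are my assumptions, not part of the question). The scaler is placed inside a pipeline so that it is refit on each fold's training rows, similar to what the original loop does for X. Unlike the original loop, this sketch does not scale y.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the question's data: 60 time-ordered rows, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=60)

# make_pipeline names each step after its lowercased class name, hence the "svr__" prefix.
pipe = make_pipeline(StandardScaler(), SVR())
param_grid = {
    "svr__kernel": ["rbf"],
    "svr__gamma": [0.01],
    "svr__C": [1, 10, 100],
}

gsc = GridSearchCV(
    pipe,
    param_grid=param_grid,
    scoring="neg_mean_absolute_error",
    cv=TimeSeriesSplit(n_splits=2),
    verbose=1,  # print progress while fitting
)
gsc.fit(X, y)

print(gsc.best_params_)
print(gsc.cv_results_["mean_test_score"])
```

Two differences from the original loop are worth knowing. GridSearchCV scores each candidate on the held-out test rows and averages across folds, while the loop scores on the training rows and picks a best set per split. And with the default refit=True, the best candidate is refit on all of X and y at the end. As for the 30 minutes, SVR can be slow on unscaled features with a large C, so scaling may help, though that is only a guess without seeing the data.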