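Let's consider a multivariate regression problem (2 response variables: latitude and longitude). Currently, some machine learning model implementations, such as Support Vector Regression (sklearn.svm.SVR), do not provide native support for multivariate regression. For that reason, sklearn.multioutput.MultiOutputRegressor can be used.
Example: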
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Wrap SVR so that one regressor is fitted per target column
svr_multi = MultiOutputRegressor(SVR(), n_jobs=-1)
# Fit the algorithm on the data
svr_multi.fit(X_train, y_train)
y_pred = svr_multi.predict(X_test)
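To make the wrapping concrete, here is a minimal self-contained sketch. The data is synthetic and purely illustrative (X_demo and y_demo are hypothetical stand-ins for X_train and y_train, not the data from the question). It shows that the wrapper fits one SVR clone per target column and returns a 2-column prediction.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Synthetic stand-in data (hypothetical): 100 samples, 3 features, 2 targets.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.column_stack([
    X_demo[:, 0] + 0.1 * rng.normal(size=100),  # plays the role of latitude
    X_demo[:, 1] - X_demo[:, 2],                # plays the role of longitude
])

multi = MultiOutputRegressor(SVR()).fit(X_demo, y_demo)

# One independent SVR clone is fitted per target column.
print(len(multi.estimators_))           # number of fitted per-target regressors
print(multi.predict(X_demo[:5]).shape)  # (n_samples, n_targets)
```

Because each target gets its own fitted copy, the targets are modelled independently of one another.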
My goal is to tune the parameters of SVR with sklearn.model_selection.GridSearchCV. Ideally, if the response were a single variable rather than several, I would proceed as follows:
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe_svr = Pipeline([('scl', StandardScaler()),
                     ('reg', SVR())])
# Every key needs the step prefix 'reg__'; degree only affects kernel='poly'
grid_param_svr = {
    'reg__C': [0.01, 0.1, 1, 10],
    'reg__epsilon': [0.1, 0.2, 0.3],
    'reg__degree': [2, 3, 4]
}
gs_svr = GridSearchCV(estimator=pipe_svr,
                      param_grid=grid_param_svr,
                      cv=10,
                      scoring='neg_mean_squared_error',
                      n_jobs=-1)
gs_svr = gs_svr.fit(X_train, y_train)
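However, since my response y_train is 2-dimensional, I need to use MultiOutputRegressor on top of SVR. How can I modify the code above to make this GridSearchCV operation work? If that is not possible, is there a better alternative?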
Answer 0 (score: 8)
I just found a working solution. With nested estimators, the parameters of the inner estimator can be accessed through the estimator__ prefix.
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe_svr = Pipeline([('scl', StandardScaler()),
                     ('reg', MultiOutputRegressor(SVR()))])
# 'reg' is the pipeline step, 'estimator' is the SVR wrapped by MultiOutputRegressor
grid_param_svr = {
    'reg__estimator__C': [0.1, 1, 10]
}
gs_svr = GridSearchCV(estimator=pipe_svr,
                      param_grid=grid_param_svr,
                      cv=2,
                      scoring='neg_mean_squared_error',
                      n_jobs=-1)
gs_svr = gs_svr.fit(X_train, y_train)
gs_svr.best_estimator_
Pipeline(steps=[('scl', StandardScaler(copy=True, with_mean=True, with_std=True)),
('reg', MultiOutputRegressor(estimator=SVR(C=10, cache_size=200,
coef0=0.0, degree=3, epsilon=0.1, gamma='auto', kernel='rbf', max_iter=-1,
shrinking=True, tol=0.001, verbose=False), n_jobs=1))])
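If you are unsure which names a nested estimator accepts, one way to check is to list them with get_params(). The sketch below rebuilds the same kind of pipeline (the variable names pipe and svr_param_names are just illustrative) and prints every key that reaches the inner SVR. No data is needed for this.

```python
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

pipe = Pipeline([('scl', StandardScaler()),
                 ('reg', MultiOutputRegressor(SVR()))])

# get_params() returns every settable name, including nested ones.
# Keep only those that are routed to the SVR inside the wrapper.
svr_param_names = sorted(name for name in pipe.get_params()
                         if name.startswith('reg__estimator__'))
print(svr_param_names)
```

Any key printed here can be used directly in param_grid. A key such as 'reg__C' does not appear, because C belongs to the wrapped SVR rather than to MultiOutputRegressor itself.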
Answer 1 (score: 5)
To do this without a pipeline, put estimator__ in front of each parameter name:
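from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

param_grid = {'estimator__min_samples_split': [10, 50],
              'estimator__min_samples_leaf': [50, 150]}
gb = GradientBoostingRegressor()
gs = GridSearchCV(MultiOutputRegressor(gb), param_grid=param_grid)
gs.fit(X, y)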
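For anyone who wants to run this end to end, here is a small sketch on synthetic data (make_regression with two targets; the grid values and n_estimators=20 are arbitrary choices for speed, not recommendations). It also illustrates one consequence of this approach: the search picks a single setting, and that same setting is applied to every per-target regressor.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

# Synthetic two-target problem (hypothetical stand-in for real data).
X_demo, y_demo = make_regression(n_samples=120, n_features=5, n_targets=2,
                                 noise=0.1, random_state=0)

param_grid = {'estimator__min_samples_leaf': [1, 10]}
base = GradientBoostingRegressor(n_estimators=20, random_state=0)
gs = GridSearchCV(MultiOutputRegressor(base), param_grid=param_grid, cv=3)
gs.fit(X_demo, y_demo)

# The winning value is shared by all per-target regressors after refit.
chosen = gs.best_params_['estimator__min_samples_leaf']
per_target = [est.min_samples_leaf for est in gs.best_estimator_.estimators_]
print(chosen, per_target)
```

If the targets need different hyperparameters, this shared setting may be a limitation worth keeping in mind.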
Answer 2 (score: 3)
Thank you, Marco.
Adding to your answer, here is a short illustrative example of a randomized search applied to a multi-output GradientBoostingRegressor.
from sklearn.datasets import load_linnerud
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.model_selection import RandomizedSearchCV
x, y = load_linnerud(return_X_y=True)
# Option names follow scikit-learn >= 1.0 ('squared_error', 'absolute_error');
# older releases spelled them 'ls', 'lad' and 'mse' and also accepted min_impurity_split.
model = MultiOutputRegressor(GradientBoostingRegressor(
    loss='squared_error', learning_rate=0.1, n_estimators=100, subsample=1.0,
    criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1,
    min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0,
    init=None, random_state=None, max_features=None,
    alpha=0.9, verbose=0, max_leaf_nodes=None, warm_start=False,
    validation_fraction=0.1, n_iter_no_change=None, tol=0.0001,
    ccp_alpha=0.0))
# Search space; the estimator__ prefix routes each name to the wrapped regressor
hyperparameters = dict(
    estimator__learning_rate=[0.05, 0.1, 0.2, 0.5, 0.9],
    estimator__loss=['squared_error', 'absolute_error', 'huber'],
    estimator__n_estimators=[20, 50, 100, 200, 300, 500, 700, 1000],
    estimator__criterion=['friedman_mse', 'squared_error'],
    estimator__min_samples_split=[2, 4, 7, 10],
    estimator__max_depth=[3, 5, 10, 15, 20, 30],
    estimator__min_samples_leaf=[1, 2, 3, 5, 8, 10],
    estimator__min_impurity_decrease=[0, 0.2, 0.4, 0.6, 0.8],
    estimator__max_leaf_nodes=[5, 10, 20, 30, 50, 100, 300])
randomized_search = RandomizedSearchCV(model, hyperparameters, random_state=0, n_iter=5, scoring=None,
n_jobs=2, refit=True, cv=5, verbose=True,
pre_dispatch='2*n_jobs', error_score='raise', return_train_score=True)
hyperparameters_tuning = randomized_search.fit(x, y)
print('Best Parameters = {}'.format(hyperparameters_tuning.best_params_))
tuned_model = hyperparameters_tuning.best_estimator_
print(tuned_model.predict(x))