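I am trying to use scikit-learn's GridSearchCV function to find the best parameters for some base models, which are then fed into a stacked estimator.
My code is based on this post (which I use for illustration): https://stats.stackexchange.com/questions/139042/ensemble-of-different-kinds-of-regressors-using-scikit-learn-or-any-other-pytho/274147
I want to run a grid search over the parameters of my estimators (mainly the ridge parameters, the number of neighbors in KNN, and the RF depth and split settings), but I cannot get it to work. I define the model below: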
from sklearn.base import TransformerMixin
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge


# Each wrapper exposes a regressor's predictions through transform(),
# so it can be used as a step inside a FeatureUnion.
class RidgeTransformer(Ridge, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X)


class RandomForestTransformer(RandomForestRegressor, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X)


class KNeighborsTransformer(KNeighborsRegressor, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X)


def build_model():
    # Ridge branch: scale, expand to polynomial features, then predict.
    ridge_transformer = Pipeline(steps=[
        ('scaler', StandardScaler()),
        ('poly_feats', PolynomialFeatures()),
        ('ridge', RidgeTransformer())
    ])

    # Stack the predictions of the three base models side by side.
    pred_union = FeatureUnion(
        transformer_list=[
            ('ridge', ridge_transformer),
            ('rand_forest', RandomForestTransformer()),
            ('knn', KNeighborsTransformer())
        ],
        n_jobs=2
    )

    # The final linear regression combines the base-model predictions.
    model = Pipeline(steps=[
        ('pred_union', pred_union),
        ('lin_regr', LinearRegression())
    ])

    return model
Now I want to run cross-validation over the parameters of the random forest. I can list the available parameters with:
model = build_model()
print(model.get_params().keys())
But when I run the code below, I still get an error:
pipe = Pipeline(steps=[('reg', model)])
parameters = {'pred_union__rand_forest__n_estimators':[20, 50, 100, 200]}
g_search = GridSearchCV(pipe, parameters)
X, y = make_regression(n_features=10, n_targets=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
g_search.fit(X_train, y_train)
Invalid parameter pred_union for estimator Pipeline(memory=None,
steps=[('reg', Pipeline(memory=None,
steps=[('pred_union', FeatureUnion(n_jobs=2,
transformer_list=[('ridge', Pipeline(memory=None,
steps=[('scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('poly_feats', PolynomialFeatures(degree=2, include_bias=True, interaction_only=False)), ('ridge', RidgeTransformer(...=None)), ('lin_regr', LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False))]))]). Check the list of available parameters with `estimator.get_params().keys()`.
What am I doing wrong?
Answer 0 (score: 1)
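Your model is already a pipeline, so why wrap it in another pipeline? There is no need for pipe = Pipeline(steps=[('reg', model)]). Just pass model directly to the grid search; the parameter name 'pred_union__rand_forest__n_estimators' is then valid as written.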
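As a rough illustration of searching a nested pipeline directly, here is a minimal sketch. It does not reuse the question's custom transformer classes; the PCA/SelectKBest union, the step names ('features', 'pca', 'kbest', 'ridge') and the grid values are assumptions chosen only to keep the example short and self-contained. The point is the naming scheme: each level of nesting contributes one `<step>__` segment to the parameter name.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline

# Stand-in for the question's model: a FeatureUnion nested inside a Pipeline.
# (Hypothetical step names, not the ones from the question.)
features = FeatureUnion(transformer_list=[
    ('pca', PCA()),
    ('kbest', SelectKBest(score_func=f_regression, k=3)),
])
model = Pipeline(steps=[('features', features), ('ridge', Ridge())])

# Parameter names follow <step>__<sub-step>__<parameter>.
param_grid = {
    'features__pca__n_components': [2, 4],
    'ridge__alpha': [0.1, 1.0],
}

# Every grid key must appear in get_params(); otherwise GridSearchCV
# raises the "Invalid parameter" error shown in the question.
assert set(param_grid) <= set(model.get_params().keys())

X, y = make_regression(n_samples=120, n_features=10, random_state=0)

# The pipeline is passed to GridSearchCV as-is, with no extra wrapper.
g_search = GridSearchCV(model, param_grid, cv=3)
g_search.fit(X, y)
print(g_search.best_params_)
```

Checking the grid keys against `model.get_params().keys()` before fitting is a cheap way to catch a missing or misspelled prefix.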
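But if you do want to wrap it in a pipeline and still have the search work, you need to update the parameters by prepending the outer step name and a double underscore ('reg__') to each parameter name: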
parameters = {'reg__pred_union__rand_forest__n_estimators':[20, 50, 100, 200]}
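If there are many parameters, the prefix can be added programmatically instead of by hand. The helper below is a hypothetical convenience function (`prefix_param_grid` is not part of scikit-learn); it is only a sketch of the renaming rule described above.

```python
def prefix_param_grid(param_grid, step_name):
    """Return a copy of param_grid with every key prefixed by '<step_name>__'."""
    return {f"{step_name}__{name}": values for name, values in param_grid.items()}


# Grid written against the inner model, as in the question.
inner_grid = {'pred_union__rand_forest__n_estimators': [20, 50, 100, 200]}

# Grid rewritten for the outer pipeline whose only step is named 'reg'.
outer_grid = prefix_param_grid(inner_grid, 'reg')
print(outer_grid)
```

The resulting dictionary can be passed to GridSearchCV together with the wrapping pipeline, under the assumption that the wrapped step is indeed named 'reg'.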