Adding a transformer to an sklearn pipeline for cross-validation

Time: 2020-09-08 06:47:54

Tags: python machine-learning scikit-learn

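I want to add a transformer for the target variable to my sklearn pipeline. For steps such as PCA, and for the hyperparameters of any regressor or classifier, sklearn lets you define a parameter grid for cross-validation, for example: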

        param_grid = [{
            "pca__n_components": [5, 10, 25, 50, 125, 250, 625, 1500, 3000],
            "rdf__n_estimators": n_estimators,
            "rdf__bootstrap": bootstrap,
            "rdf__max_depth": max_depth,
            "rdf__class_weight": class_weight}]
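Each key in the grid above is a pipeline step name and a parameter name joined by a double underscore. One way to check which keys are valid is `Pipeline.get_params()`. The minimal sketch below assumes the two steps are named "pca" and "rdf" and are a `PCA` followed by a `RandomForestClassifier` (inferred from the prefixes, not stated in the question). It also shows that the step name on its own is a parameter, which is what allows a whole step to be swapped or skipped in a grid.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Assumed step names: "pca" and "rdf" must match the prefixes used in param_grid.
pipe = Pipeline([("pca", PCA()), ("rdf", RandomForestClassifier())])

params = pipe.get_params()

# Hyperparameters appear as "<step name>__<parameter name>".
print("pca__n_components" in params, "rdf__n_estimators" in params)

# The step name alone is also a parameter whose value is the estimator object.
print(isinstance(params["pca"], PCA))

# Since a step is a parameter, it can be replaced, or skipped with "passthrough".
pipe.set_params(pca="passthrough")
print(pipe.get_params()["pca"])
```

The same mechanism lets a grid entry such as `"pca": ["passthrough", PCA(10)]` compare runs with and without that step.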

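Is it also possible to add a target-variable transformer to this grid? For example, I would like to first train my regressor without transforming the target variable, then scale the target variable with PowerTransformer() and see whether that improves my results. Can both options be included in the parameter grid?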

1 answer:

Answer 0 (score: 2)

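Yes. A pipeline step is itself a parameter, so you can list different transformer objects under the step's name in your param_grid dictionary: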

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The "transformer" step holds a placeholder; GridSearchCV replaces it with
# each object listed under the "transformer" key of param_grid.
pipe = Pipeline([('transformer', PowerTransformer()), ('svc', SVC())])

param_grid = {"svc__C": [1, 10],
              "transformer": [PowerTransformer(), StandardScaler()]}

clf = GridSearchCV(pipe, param_grid)
clf.fit(X_train, y_train)

# Best combination of C and transformer found by cross-validation
print(clf.best_params_)
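Note that the pipeline above applies PowerTransformer or StandardScaler to the features X, not to the target y that the question asks about. For the target, scikit-learn provides `sklearn.compose.TransformedTargetRegressor`, which transforms y before fitting the wrapped regressor and inverse-transforms its predictions. Its `transformer` parameter can be a grid entry, and `None` means an identity transform (the untransformed target). A minimal sketch, assuming a regression problem with synthetic data from `make_regression`; the step names "pca" and "ttr" and the grid values are illustrative only.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer

# Synthetic regression data, for illustration only.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# TransformedTargetRegressor transforms y before fitting the wrapped regressor
# and inverse-transforms its predictions back to the original scale of y.
pipe = Pipeline([
    ("pca", PCA()),
    ("ttr", TransformedTargetRegressor(
        regressor=RandomForestRegressor(n_estimators=50, random_state=0))),
])

param_grid = {
    "pca__n_components": [5, 10],
    # None means an identity transform, i.e. the untransformed target.
    "ttr__transformer": [None, PowerTransformer()],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_train, y_train)
print(search.best_params_)
print(search.score(X_test, y_test))
```

Hyperparameters of the wrapped regressor are reached through one more prefix, for example `"ttr__regressor__n_estimators"`. Because predictions are inverse-transformed, the default score (R² for regressors) is computed on the original scale of y, so the runs with and without PowerTransformer are compared on the same footing.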