Parameter tuning for an ML model with a column transformer and a pipeline

Time: 2020-11-08 02:54:43

Tags: python scikit-learn pipeline one-hot-encoding gridsearchcv

My code works fine up to fitting the final model. However, I don't know how to run GridSearchCV or RandomizedSearchCV on the pipeline. Please help.

import pandas as pd
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline


df = pd.read_csv('data/vehicle_dataset_v4A.csv')

X = df.drop('price', axis=1)
y = df['price']

numerical_ix = X.select_dtypes(include=['int64', 'float64']).columns
categorical_ix = X.select_dtypes(include=['object', 'bool']).columns

col_transform = make_column_transformer(
    (OneHotEncoder(), categorical_ix), 
    (StandardScaler(), numerical_ix),
    remainder='passthrough'
)

model = RandomForestRegressor()

pipe = make_pipeline(col_transform,model)

pipe.fit(X, y)

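I tried the code below. It runs without errors, but when I try to predict with the GridSearchCV, it throws different errors at different times. I hope there is a solution for this. Otherwise, if I could find out the best parameters after the grid search, I could apply them to the model directly.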

lr = {
    'base_score':[0.4,0.45,0.5,0.55,0.6],
    'max_depth':[1,2,3,4,6,8,10],
    'subsample':[0.5,0.6,0.7,0.8,0.9,1],
    'n_estimators': [50,100,200,250,300],
    'learning_rate':  [0.05,0.1,0.4,0.5,0.8,0.9,1],
    'min_child_weight': [0.1,0.5,1,1.5,2,3],
    'gamma': [0,0.1,0.5,1,1.5,2,2.5,3]
    }

clf = make_pipeline(OneHotEncoder(),
                    StandardScaler(with_mean=False),
                    GridSearchCV(RandomForestRegressor(),
                                 param_grid=lr,
                                 scoring='r2',cv=3,verbose=2))

1 Answer:

Answer 0: (score: 0)

Three thoughts on your application:

  1. Do not apply StandardScaler to the OneHotEncoder output; you don't need it.
  2. Do not use RandomForestRegressor; it is overkill for your problem.
  3. First apply make_pipeline to the data, then run StandardScaler.
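
Since the question is about GridSearchCV, one way to read these points is: build the whole pipeline first, then hand the pipeline to the search, rather than putting the search inside the pipeline. Below is a minimal sketch of that pattern, not a definitive fix. The DataFrame is a synthetic stand-in for the question's CSV (the column names `mileage`, `year` and `fuel` are made up), the tiny grid is only illustrative, and `handle_unknown="ignore"` is my own addition so that a category missing from a training fold does not break prediction. The regressor is kept from the question only for continuity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the question's CSV (assumption: made-up columns).
rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame({
    "mileage": rng.uniform(1_000, 200_000, n),
    "year": rng.integers(2000, 2021, n),
    "fuel": rng.choice(["petrol", "diesel", "hybrid"], n),
})
y = 30_000 - 0.1 * X["mileage"] + 500 * (X["year"] - 2000) + rng.normal(0, 500, n)

# Split columns by type; each transformer only sees its own columns.
numerical_ix = X.select_dtypes(include="number").columns
categorical_ix = X.select_dtypes(exclude="number").columns

col_transform = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical_ix),
    (StandardScaler(), numerical_ix),
    remainder="passthrough",
)

pipe = make_pipeline(col_transform, RandomForestRegressor(random_state=0))

# make_pipeline names each step after its lowercased class name, so the
# regressor's parameters are addressed as "randomforestregressor__<param>".
param_grid = {
    "randomforestregressor__n_estimators": [20, 50],
    "randomforestregressor__max_depth": [3, None],
}

# The search wraps the pipeline, so preprocessing is refit inside every fold.
search = GridSearchCV(pipe, param_grid=param_grid, scoring="r2", cv=3)
search.fit(X, y)

print(search.best_params_)
pred = search.predict(X.head())  # uses the refit best pipeline
```

Because the search object holds the refit best pipeline, `search.predict` can be called directly on raw rows with the same columns as the training data.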

Please test this and give us feedback.
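
On the last part of the question (finding the best parameters and applying them to the model directly), here is a hedged sketch using RandomizedSearchCV. As far as I know, keys such as `base_score`, `gamma` and `min_child_weight` from the question's grid are not RandomForestRegressor parameters (they look like XGBoost's), which may explain some of the errors; `get_params()` lists the names the pipeline actually accepts. The data is again a synthetic stand-in with made-up columns, and the candidate values are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the question's CSV (assumption: made-up columns).
rng = np.random.default_rng(1)
n = 60
X = pd.DataFrame({
    "mileage": rng.uniform(1_000, 200_000, n),
    "fuel": rng.choice(["petrol", "diesel"], n),
})
y = 30_000 - 0.1 * X["mileage"] + rng.normal(0, 500, n)

col_transform = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["fuel"]),
    (StandardScaler(), ["mileage"]),
)
pipe = make_pipeline(col_transform, RandomForestRegressor(random_state=0))

# Every tunable name the pipeline accepts; search keys must come from here.
valid_names = set(pipe.get_params().keys())

param_distributions = {
    "randomforestregressor__n_estimators": [20, 40],
    "randomforestregressor__max_depth": [3, 6, None],
    "randomforestregressor__min_samples_leaf": [1, 2, 5],
}

# Sample 4 of the 18 possible combinations instead of trying all of them.
search = RandomizedSearchCV(
    pipe,
    param_distributions=param_distributions,
    n_iter=4,
    scoring="r2",
    cv=3,
    random_state=0,
)
search.fit(X, y)

# Apply the winning settings to a fresh copy of the pipeline and refit it.
best = search.best_params_
final_pipe = clone(pipe).set_params(**best)
final_pipe.fit(X, y)
final_pred = final_pipe.predict(X.head())
```

With the default `refit=True`, `search.best_estimator_` is already a pipeline fitted with these parameters, so the manual `set_params` step is only needed when you want to rebuild the model separately.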