Pipeline + StandardScaler + OHE + CLF + GridSearchCV + ColumnTransformer

Time: 2020-08-17 15:27:37

Tags: scikit-learn pipeline pca one-hot-encoding gridsearchcv

I am trying to do some data modeling for a little project of my own using Pipeline + StandardScaler + OHE + CLF + GridSearchCV + ColumnTransformer.

I expected my code to run without problems, but it does not. Instead it fails with the following error:

    Fitting 10 folds for each of 36 candidates, totalling 360 fits
    [CV] clf__C=0.0001, clf__kernel=rbf, reduce_dims__n_components=4 .....
    [Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-50-c52ff4770002> in <module>()
          1 grid = GridSearchCV(clf, param_grid=param_grid, cv=10, n_jobs=1, verbose=2, scoring= 'accuracy')
    ----> 2 grid.fit(X, y)
          3 print(grid.best_score_)
          4 print(grid.cv_results_)
    
    13 frames
    /usr/local/lib/python3.6/dist-packages/sklearn/base.py in set_params(self, **params)
        234                                  'Check the list of available parameters '
        235                                  'with `estimator.get_params().keys()`.' %
    --> 236                                  (key, self))
        237 
        238             if delim:
    
    ValueError: Invalid parameter clf for estimator Pipeline(memory=None,
             steps=[('preprocessor',
                     ColumnTransformer(n_jobs=None, remainder='drop',
                                       sparse_threshold=0.3,
                                       transformer_weights=None,
                                       transformers=[('num',
                                                      Pipeline(memory=None,
                                                               steps=[('scale',
                                                                       StandardScaler(copy=True,
                                                                                      with_mean=True,
                                                                                      with_std=True)),
                                                                      ('reduce_dims',
                                                                       PCA(copy=True,
                                                                           iterated_power='auto',
                                                                           n_components=4,
                                                                           random_state=None,
                                                                           svd_solver='aut...
                                                      Pipeline(memory=None,
                                                               steps=[('onehot',
                                                                       OneHotEncoder(categories='auto',
                                                                                     drop=None,
                                                                                     dtype=<class 'numpy.float64'>,
                                                                                     handle_unknown='ignore',
                                                                                     sparse=True))],
                                                               verbose=False),
                                                      ['Method', 'Regionname',
                                                       'Type'])],
                                       verbose=False)),
                    ('SVR',
                     SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
                         gamma='scale', kernel='rbf', max_iter=-1, shrinking=True,
                         tol=0.001, verbose=False))],
             verbose=False). Check the list of available parameters with `estimator.get_params().keys()`.
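The message says that `clf` is not a parameter of the outer pipeline: grid keys are resolved against step names, and the final step in the printed pipeline is named `'SVR'`, not `'clf'`. A minimal sketch of how to list the valid keys, as the error text suggests, using a stripped-down pipeline with the same step names (this small pipeline is an illustration only, with `n_components=2` chosen arbitrarily):

```python
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR

# Same step names as in the traceback; no data is needed to inspect parameters.
numeric_transformer = Pipeline(steps=[
    ('scale', StandardScaler()),
    ('reduce_dims', PCA(n_components=2)),
])
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, ['Bathroom', 'Rooms']),
    ('cat', OneHotEncoder(handle_unknown='ignore'),
     ['Method', 'Regionname', 'Type']),
])
pipe = Pipeline(steps=[('preprocessor', preprocessor), ('SVR', SVR())])

# Every key is the chain of step names joined by double underscores.
valid_keys = sorted(pipe.get_params().keys())
for key in valid_keys:
    if key.endswith(('__C', '__kernel', '__n_components')):
        print(key)
```

Keys for nested steps carry the full path from the outermost estimator, so a short name such as `reduce_dims__n_components` is not enough when the PCA step sits inside a `ColumnTransformer`.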

I have tried following the user guide on the sklearn website, but no matter how hard I try, the same error shown above keeps popping up.

    import numpy as np
    from sklearn import svm, datasets
    from sklearn.compose import ColumnTransformer
    from sklearn.decomposition import PCA
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    from sklearn.svm import SVR

    X = Melbourne_housing[['Bathroom', 'Method', 'Regionname', 'Rooms', 'Type']]
    y = Melbourne_housing[['Price']]

    numeric_features = ['Bathroom', 'Rooms', 'Price']
    numeric_transformer = Pipeline(steps=[
        ('scale', StandardScaler()),
        ('reduce_dims', PCA(n_components=4))
    ])
    categorical_features = ['Method', 'Regionname', 'Type']
    categorical_transformer = Pipeline(steps=[
        ('onehot', OneHotEncoder(handle_unknown='ignore'))])
    param_grid = dict(reduce_dims__n_components=[4, 6, 8],
                      clf__C=np.logspace(-4, 1, 6),
                      clf__kernel=['rbf', 'linear'])
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features)])
    # Append classifier to preprocessing pipeline.
    # Now we have a full prediction pipeline.
    clf = Pipeline(steps=[('preprocessor', preprocessor),
                          ('SVR', SVR())])

    grid = GridSearchCV(clf, param_grid=param_grid, cv=10, n_jobs=1, verbose=2, scoring='accuracy')
    grid.fit(X, y)
    print(grid.best_score_)
    print(grid.cv_results_)
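One possible way to get the search running, offered as a sketch rather than a definitive fix. It assumes four changes to the code above: the final step is renamed to `'clf'` so the `clf__*` keys resolve; the PCA key carries its full nested path; `'Price'` is dropped from `numeric_features` because it is the target and is not in `X`; and `n_components` is limited to the two remaining numeric columns, since PCA cannot keep more components than it has input features. `scoring='accuracy'` is a classification metric, so `'r2'` is swapped in for the `SVR` regressor. The DataFrame below is synthetic stand-in data, because the real `Melbourne_housing` is not shown, and the fold count is reduced to 3 only to suit that small sample.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for Melbourne_housing (assumption: real data not shown).
rng = np.random.default_rng(0)
n = 60
Melbourne_housing = pd.DataFrame({
    'Bathroom': rng.integers(1, 4, n),
    'Rooms': rng.integers(1, 6, n),
    'Method': rng.choice(['S', 'SP', 'PI'], n),
    'Regionname': rng.choice(['Northern', 'Southern'], n),
    'Type': rng.choice(['h', 'u', 't'], n),
})
Melbourne_housing['Price'] = (
    100_000 * Melbourne_housing['Rooms'] + rng.normal(0, 10_000, n)
)

X = Melbourne_housing[['Bathroom', 'Method', 'Regionname', 'Rooms', 'Type']]
y = Melbourne_housing['Price']  # 1-D target instead of a one-column frame

numeric_features = ['Bathroom', 'Rooms']  # 'Price' is the target, not a feature
numeric_transformer = Pipeline(steps=[
    ('scale', StandardScaler()),
    ('reduce_dims', PCA(n_components=2)),
])
categorical_features = ['Method', 'Regionname', 'Type']
categorical_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(handle_unknown='ignore')),
])
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    ('cat', categorical_transformer, categorical_features),
])

# The final step is named 'clf' so that the 'clf__*' grid keys match it.
model = Pipeline(steps=[('preprocessor', preprocessor), ('clf', SVR())])

param_grid = {
    # Full path: outer step -> ColumnTransformer entry -> inner step -> parameter.
    'preprocessor__num__reduce_dims__n_components': [1, 2],
    'clf__C': np.logspace(-4, 1, 6),
    'clf__kernel': ['rbf', 'linear'],
}

grid = GridSearchCV(model, param_grid=param_grid, cv=3, scoring='r2')
grid.fit(X, y)
print(grid.best_params_)
print(grid.best_score_)
```

Keeping the step named `'SVR'` and writing `SVR__C` and `SVR__kernel` in the grid would work equally well; what matters is that each key prefix matches a step name exactly.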

I am new to Python and machine learning (only a few months of experience), so I would really appreciate your help.

0 answers:

No answers