使用管道和GridSearchCV的多维降维技术

时间:2020-06-05 12:48:03

标签: machine-learning scikit-learn grid-search hyperparameters dimensionality-reduction

我们都知道用降维技术定义管道的通用方法,然后是用于训练和测试的模型。然后,我们可以将GridSearchCv应用于超参数调整。

grid = GridSearchCV(
Pipeline([
    ('reduce_dim', PCA()),
    ('classify', RandomForestClassifier(n_jobs = -1))
    ]),
param_grid=[
    {
        'reduce_dim__n_components': range(0.7,0.9,0.1),
        'classify__n_estimators': range(10,50,5),
        'classify__max_features': ['auto', 0.2],
        'classify__min_samples_leaf': [40,50,60],
        'classify__criterion': ['gini', 'entropy']
    }
],
cv=5, scoring='f1')
grid.fit(X,y)

我能理解上面的代码。

现在我今天正在经历documentation,在那里我发现了一个有点奇怪的零件代码。

pipe = Pipeline([
    # the reduce_dim stage is populated by the param_grid
    ('reduce_dim', 'passthrough'),                        # How does this work??
    ('classify', LinearSVC(dual=False, max_iter=10000))
])

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {
        'reduce_dim': [PCA(iterated_power=7), NMF()],
        'reduce_dim__n_components': N_FEATURES_OPTIONS,   ### No PCA is used..??
        'classify__C': C_OPTIONS
    },
    {
        'reduce_dim': [SelectKBest(chi2)],
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
]
reducer_labels = ['PCA', 'NMF', 'KBest(chi2)']

grid = GridSearchCV(pipe, n_jobs=1, param_grid=param_grid)
X, y = load_digits(return_X_y=True)
grid.fit(X, y)
  1. 首先,在定义管道时,它使用字符串“ passthrough”而不是对象。

        ('reduce_dim', 'passthrough'),  ```
    
  2. 然后,在为网格搜索定义不同的降维技术时,它使用了不同的策略。 [PCA(iterated_power=7), NMF()]如何工作?
            'reduce_dim': [PCA(iterated_power=7), NMF()],
            'reduce_dim__n_components': N_FEATURES_OPTIONS,  # here 
    

请有人向我解释代码。

已解决-在一行中,顺序为['PCA', 'NMF', 'KBest(chi2)']

提供- seralouk (请参见下面的答案)

参考 如果有人需要更多详细信息 1 2 3

1 个答案:

答案 0 :(得分:1)

据我所知等效。


在文档中您具有以下内容:

pipe = Pipeline([
    # the reduce_dim stage is populated by the param_grid
    ('reduce_dim', 'passthrough'),
    ('classify', LinearSVC(dual=False, max_iter=10000))
])

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {
        'reduce_dim': [PCA(iterated_power=7), NMF()],
        'reduce_dim__n_components': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
    {
        'reduce_dim': [SelectKBest(chi2)],
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
]

最初我们有('reduce_dim', 'passthrough'),,然后是'reduce_dim': [PCA(iterated_power=7), NMF()]

PCA的定义在第二行完成。


您可以选择以下定义:

pipe = Pipeline([
    # the reduce_dim stage is populated by the param_grid
    ('reduce_dim', PCA(iterated_power=7)),
    ('classify', LinearSVC(dual=False, max_iter=10000))
])

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {
        'reduce_dim__n_components': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
    {
        'reduce_dim': [SelectKBest(chi2)],
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
]