Using sklearn pipelines correctly

Time: 2019-06-11 13:17:53

Tags: python scikit-learn

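I have a neural-network homework project and want to try several architectures using GridSearchCV and a Pipeline. However, it seems I don't really understand how the pipeline and the grid search work internally.

I added the check below because it looks like the grid search instantiates MyModel without arguments, the same way it is written in the pipeline definition (without the check, it failed inside make_NN):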

if dt_rf != None and layers != None:

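But I actually expected the entries in params to be passed as arguments to MyModel's __init__. Is something wrong with this approach, and how can I fix it?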

class MyModel(BaseEstimator):
    def __init__(self, dt_rf=None, loss=None, optimizer=None, layers=None):
        if dt_rf != None and layers != None:
            self.dt_rf = dt_rf
            self.nn = make_NN(loss, optimizer, layers)
    def fit(self, X, y):
        self.dt_rf.fit(X[CATS], y)
        self.nn.fit(X[NUMS].join(self.dt_rf.predict(X[CATS]), y))
    def predict(self, X):
        return self.nn.predict(X[NUMS].join(self.dt_rf.predict(X[CATS])))

pipe = Pipeline([
    ('scale', MyScaler()),
    ('reg', MyModel())
])

params = [{
    'scale': [MyScaler(), MyScaler(StandardScaler())],
    'reg__dt_rf': [DecisionTreeRegressor(random_state=42), 
                   RandomForestRegressor(n_estimators=10, random_state=42)],
    'reg__layers': ['DR_10 DR_20 DR_15 DR_10', 
                    'DR_10 DL_20 DL_15 DL_10',
                    'DL110 DL120 DL215 DL210',
                    'DL110 DL120 DL215 DL210',
                    'DL110 BN DL120 DL215 BN DO DL210'
                   ],
    'reg__optimizer': ['RMSprop', 'Adam', 'Adadelta', 'Nadam'],
    'reg__loss': ['mean_absolute_percentage_error', 'hinge']
}]

cv = GridSearchCV(pipe, params, cv=[[train_idxs, val_idxs]], n_jobs=-1,
                  scoring={'r2': r2_scorer, 'mape': mape_scorer}, refit=False)
cv.fit(X_TRAIN.drop(columns='usd_pledged_real'), X_TRAIN.usd_pledged_real)

And the useful part of the traceback:

  File "<ipython-input-46-dea3dce0adf9>", line 8, in fit
AttributeError: 'MyModel' object has no attribute 'nn'

AttributeError                            Traceback (most recent call last)
<ipython-input-53-a290c1b3848b> in <module>()
      1 cv = GridSearchCV(pipe, params, cv=[[train_idxs, val_idxs]], n_jobs=-1,
      2                   scoring={'r2': r2_scorer, 'mape': mape_scorer}, refit=False)
----> 3 cv.fit(X_TRAIN.drop(columns='usd_pledged_real'), X_TRAIN.usd_pledged_real)
...
AttributeError: 'MyModel' object has no attribute 'nn'
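
One reading of this traceback, offered as an assumption rather than a confirmed diagnosis: MyModel() in the pipeline runs __init__ with every argument left as None, so the if branch is skipped and neither self.dt_rf nor self.nn is assigned. The grid search then applies reg__dt_rf, reg__layers and the other values through set_params, which sets attributes without calling __init__ again, so nn is still missing when fit runs. A minimal sketch of the usual scikit-learn convention (store arguments in __init__, build the model in fit) could look like this. DeferredModel and make_nn are hypothetical names, and LinearRegression stands in for the network because the body of make_NN is not shown in the question.

```python
from sklearn.base import BaseEstimator
from sklearn.linear_model import LinearRegression


def make_nn(loss, optimizer, layers):
    # Hypothetical stand-in for the question's make_NN, whose body is not shown.
    # A linear model is substituted so the sketch runs without a NN library.
    return LinearRegression()


class DeferredModel(BaseEstimator):
    def __init__(self, loss=None, optimizer=None, layers=None):
        # Only store the arguments; set_params() may overwrite them later.
        self.loss = loss
        self.optimizer = optimizer
        self.layers = layers

    def fit(self, X, y):
        # Build the model here, once the final parameter values are known.
        self.nn_ = make_nn(self.loss, self.optimizer, self.layers)
        self.nn_.fit(X, y)
        return self

    def predict(self, X):
        return self.nn_.predict(X)


model = DeferredModel()  # default construction, as in the pipeline definition
model.set_params(optimizer="Adam", layers="DR_10 DR_20 DR_15 DR_10")
model.fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])
prediction = model.predict([[3.0]])
```

With this layout the instance created with defaults is still usable, because nothing in fit depends on work that __init__ skipped.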

Edit

After adding these prints:

    def __init__(self, dt_rf=None, loss=None, optimizer=None, layers=None):
        print('creation of MyModel')
        if dt_rf != None and layers != None:
            print('not empty creation')

even just defining the pipeline and params prints creation of MyModel

and then cv.fit prints:

creation of MyModel
creation of MyModel
creation of MyModel
creation of MyModel
creation of MyModel
creation of MyModel
creation of MyModel
creation of MyModel
creation of MyModel
creation of MyModel
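
Output like this is what one would expect if the search copies the estimator for each candidate with sklearn.base.clone and only afterwards applies the grid values with set_params. The sketch below counts constructor calls to illustrate that behaviour in isolation; CountingModel is a hypothetical class, not part of the question's code.

```python
from sklearn.base import BaseEstimator, clone


class CountingModel(BaseEstimator):
    """Hypothetical estimator that counts how often __init__ runs."""

    created = 0

    def __init__(self, layers=None):
        CountingModel.created += 1
        self.layers = layers


template = CountingModel()           # 1st call: like MyModel() in the pipeline
candidate = clone(template)          # 2nd call: clone() constructs a new instance
candidate.set_params(layers="DL_10") # no constructor call; the attribute is set directly
```

The constructor runs for the template and for each clone, always with the template's (default) parameters, while the value from the grid arrives later through set_params.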
