How to dynamically specify input_shape (input_dim) in a Keras Sequential model when using a pipeline?

Asked: 2019-06-16 12:42:32

Tags: machine-learning keras scikit-learn neural-network deep-learning

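I am creating a Keras Sequential model by passing a list of layer instances to the constructor. To do that, I need to pass the input_shape argument to the first layer inside the create_model() function. Normally, I can get the shape tuple like this: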

input_shape=(len(X_train.keys()),)

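At the same time, I am using a pipeline to handle my preprocessing steps, such as imputation, scaling, encoding, and feature selection. As a result, the number of variables/features after preprocessing is not the same as before, and I cannot know in advance the input width to declare on this first hidden layer. I then get an error about dense_1_input, and only after that can I update the shape accordingly.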

Now I am wondering whether there is a way to specify input_shape dynamically when using a pipeline.

Cleaner modeling code using a pipeline

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.feature_selection import SelectFromModel, RFE
from sklearn.linear_model import LassoCV

numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('feature_selection', SelectFromModel(LassoCV(cv=5))),
    ('scaler', StandardScaler()),
])

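categorical_transformer = Pipeline(steps=[
    # Impute before encoding: after one-hot encoding there are no missing
    # category labels left for a 'most_frequent' imputer to fill.
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore')),
    ('feature_selection', SelectFromModel(LassoCV(cv=5))),
])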

# Bundle preprocessing for numerical and categorical data
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_cols),
        ('cat', categorical_transformer, categorical_cols)
    ])
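
To illustrate why the width is not known before fitting, here is a minimal sketch of the feature-selection step on its own. The toy data (X_demo, y_demo) is hypothetical and only stands in for the real training set; the number of columns that SelectFromModel keeps depends on what LassoCV learns from the data.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

# Hypothetical toy data: 10 columns, only 3 of which drive the target.
X_demo, y_demo = make_regression(n_samples=200, n_features=10,
                                 n_informative=3, noise=1.0, random_state=0)

# Same selector as in the pipeline above; it is supervised, so it needs y.
selector = SelectFromModel(LassoCV(cv=5))
X_selected = selector.fit_transform(X_demo, y_demo)

# get_support() returns a boolean mask over the input columns.
n_kept = int(selector.get_support().sum())
print(f"{X_demo.shape[1]} columns in, {n_kept} columns out")
```

Because the selector is supervised, the whole preprocessor also has to be fitted with the target (fit or fit_transform called with both X and y) before its output width can be read.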

Initialize the ANN model

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
from keras.callbacks import Callback, EarlyStopping


def create_model(optimizer='adagrad',
                 kernel_initializer='glorot_uniform',
                 dropout=0.2):

    model = Sequential()
    model.add(Dense(64, activation='relu', kernel_initializer=kernel_initializer,
                    input_shape=(len(X_train.keys()),)))  # len(X_train.keys()) is not correct here
    model.add(Dropout(dropout))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(1))

    model.compile(loss='mean_absolute_error', optimizer=optimizer,
                  metrics=['mean_absolute_error'])

    return model
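
One possible direction, sketched under assumptions rather than as a confirmed solution: defer building the model until fit() is called, when the pipeline has already transformed the data and the real width is visible. DeferredBuildRegressor and build_stand_in are hypothetical names introduced for this illustration. A Ridge model stands in for the Keras network so the sketch runs without Keras; a real builder would use its argument as input_shape=(n_features,) on the first Dense layer. PolynomialFeatures is used only as a step that changes the number of columns.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures


class DeferredBuildRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: builds the model inside fit(), once X's width is known."""

    def __init__(self, build_fn=None):
        self.build_fn = build_fn

    def fit(self, X, y):
        # X has already passed through every earlier pipeline step.
        n_features = X.shape[1]
        self.model_ = self.build_fn(n_features)
        self.model_.fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)


seen_widths = []


def build_stand_in(n_features):
    # Stand-in for create_model(); records the width it was given.
    seen_widths.append(n_features)
    return Ridge()


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5])

pipe = Pipeline(steps=[
    ("expand", PolynomialFeatures(degree=2, include_bias=False)),
    ("model", DeferredBuildRegressor(build_fn=build_stand_in)),
])
pipe.fit(X_demo, y_demo)
print(seen_widths)
```

The builder receives the post-transform width (3 raw columns expanded by the polynomial step), not the raw column count, so nothing has to be hard-coded in the model function.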

My desired output is to access the shape of the data after it has been preprocessed by the pipeline.
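
A minimal sketch of reading that shape directly, assuming it is acceptable to fit the preprocessor once before building the model. X_toy and toy_preprocessor are hypothetical stand-ins for X_train and the preprocessor defined above, with feature selection left out to keep the example small.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame standing in for X_train.
X_toy = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0],
    "income": [40.0, 55.0, 61.0, np.nan],
    "city": ["A", "B", "A", "C"],
})

toy_preprocessor = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
    ]), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Fit the preprocessing once and read the width of its output.
X_toy_transformed = toy_preprocessor.fit_transform(X_toy)
n_features = X_toy_transformed.shape[1]
input_shape = (n_features,)  # what the first Dense layer would need
print(f"{X_toy.shape[1]} columns before preprocessing, {n_features} after")
```

With the preprocessor from the question, the equivalent call would be preprocessor.fit_transform(X_train, y_train).shape[1]; the target is needed there because SelectFromModel(LassoCV(...)) is supervised.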

This may be a similar unanswered question: Keras + DataFrameMapper + make_pipeline, input_dim dilemma

0 answers:

No answers