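I built a pipeline that uses OneHotEncoder. After fitting it and transforming the training/test sets, converting the result to a DataFrame leaves the features without names. Is there a way to get the name of each encoded feature?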
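from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numerical column transformer: fill missing values with the mean, then standardize
num_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])
# Categorical column transformer: fill missing values with the mode, then one-hot encode
cat_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])
# Preprocessing pipeline: route each group of columns to its transformer
preprocessor = ColumnTransformer(
    transformers=[
        ('num', num_transformer, numerical_cols),
        ('cat', cat_transformer, categorical_cols)
    ])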
# Fit on the training set only, then apply the already-fitted transformers to the test set
X_train_preprocessed = preprocessor.fit_transform(X_train)
test_preprocessed = preprocessor.transform(test)
Answer 0 (score: 0)
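You can access the transformers through the named_transformers_ attribute of ColumnTransformer. You have two transformers, named 'num' and 'cat', so preprocessor.named_transformers_['cat'] gives you access to cat_transformer. Then, using the named_steps attribute of Pipeline, you can access the OneHotEncoder step named 'onehot' and its categories_ attribute:
preprocessor.named_transformers_['cat'].named_steps['onehot'].categories_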
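
As a rough illustration of that access path, here is a minimal sketch that turns categories_ into column names for a DataFrame. The toy data, the column names (age, income, city) and the "column_category" naming scheme are assumptions made up for this example and are not part of the question. It also assumes the ColumnTransformer drops all other columns (the default) and emits the 'num' columns before the 'cat' columns, matching the order in which the transformers were listed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are illustrative only.
X_train = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0],
    "income": [40.0, 55.0, 61.0, np.nan],
    "city": ["Paris", "Rome", "Paris", "Oslo"],
})
numerical_cols = ["age", "income"]
categorical_cols = ["city"]

# Same structure as the preprocessor in the question.
preprocessor = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
    ]), numerical_cols),
    ("cat", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

X_train_preprocessed = preprocessor.fit_transform(X_train)
# The result may be a sparse matrix, depending on how sparse the output is.
if hasattr(X_train_preprocessed, "toarray"):
    X_train_preprocessed = X_train_preprocessed.toarray()

# Reach the fitted encoder: ColumnTransformer -> 'cat' pipeline -> 'onehot' step.
onehot = preprocessor.named_transformers_["cat"].named_steps["onehot"]

# categories_ holds one array of category values per categorical column,
# in the same order as categorical_cols.
encoded_names = [
    f"{col}_{category}"
    for col, categories in zip(categorical_cols, onehot.categories_)
    for category in categories
]

# Numerical columns keep their names; encoded columns follow them.
feature_names = numerical_cols + encoded_names
X_train_df = pd.DataFrame(
    X_train_preprocessed, columns=feature_names, index=X_train.index
)
```

The same feature_names list can be reused for the transformed test set, since the encoder's categories are fixed once it has been fitted on the training data. Recent scikit-learn releases also provide a get_feature_names_out method on fitted transformers; whether it covers every step of a pipeline like this one depends on the installed version.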