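I'm trying to use sklearn's Pipeline to encode and scale a DataFrame, but it just returns an array instead of a DataFrame. I'm hoping there is a simpler, standard way to get the encoded/scaled DataFrame back, rather than coming up with a hacky workaround (which is my specialty!).
Here is an example of the code I'm using to encode/scale: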
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# train_set is the input pandas DataFrame
# Select all numeric columns by excluding any column with object dtype
num_attributes = list(train_set.select_dtypes(exclude=['object']))
# Select all columns with object dtype (the categorical ones)
cat_attributes = list(train_set.select_dtypes(include=['object']))

cat_pipeline = Pipeline([
    ('imputer', SimpleImputer(fill_value='none', strategy='constant')),
    ('one_hot', OneHotEncoder())
])

full_pipeline = ColumnTransformer([
    ('num', StandardScaler(), num_attributes),
    ('cat', cat_pipeline, cat_attributes)
])

train_set_prepared = full_pipeline.fit_transform(train_set)
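The result is an array, not a DataFrame. Printed, it looks like this (the "(row, column) value" layout is how a SciPy sparse matrix prints):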
(0, 0) nan
(0, 1) -0.002676506826924531
(0, 2) nan
(0, 3) -0.03350622836892517
(0, 4) nan
(0, 5) -0.03294496247236749
(0, 6) 0.002534826949104915
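The printout above suggests the ColumnTransformer returned a SciPy sparse matrix rather than a NumPy array. A minimal sketch, assuming hypothetical toy data (the df, x and label names and the build helper are illustrative, not from the question), of how the ColumnTransformer parameter sparse_threshold (default 0.3) decides between sparse and dense output when OneHotEncoder produces many mostly-zero columns:

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one numeric column and one categorical column with
# 10 distinct values, so the one-hot block is mostly zeros.
df = pd.DataFrame({
    "x": np.arange(10, dtype=float),
    "label": pd.Series([f"c{i}" for i in range(10)], dtype=object),
})

def build(threshold):
    # Hypothetical helper: same column layout, different sparse_threshold.
    return ColumnTransformer(
        [("num", StandardScaler(), ["x"]),
         ("cat", OneHotEncoder(), ["label"])],
        sparse_threshold=threshold,
    )

# Default threshold: overall density (20 non-zeros / 110 cells) is below
# 0.3, so the stacked output stays a SciPy sparse matrix.
default_out = build(0.3).fit_transform(df)

# sparse_threshold=0 always densifies the output into a NumPy array.
dense_out = build(0).fit_transform(df)

print(type(default_out), type(dense_out))
```

Neither variant returns a DataFrame; forcing dense output only makes the result easier to wrap in one afterwards.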
Is there an easy way to turn this back into a scaled/encoded DataFrame?