Creating a pipeline for multiple categorical variables in sklearn

Time: 2020-04-03 02:42:59

Tags: python scikit-learn

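I am new to sklearn in Python.

I want to create a pipeline that uses FeatureUnion() to combine my numerical and categorical transformations.

My diamonds dataset looks like this: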

   carat      cut color clarity  depth  table  price     x     y     z
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
2   0.23     Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31
3   0.29  Premium     I     VS2   62.4   58.0    334  4.20  4.23  2.63
4   0.31     Good     J     SI2   63.3   58.0    335  4.34  4.35  2.75
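
For anyone who wants to reproduce the problem, the five rows shown above can be rebuilt by hand as a DataFrame. This is only a sketch of the sample, not the full diamonds dataset; the dtype-based column split at the end is an assumption about how the numeric and categorical columns are meant to be separated.

```python
import pandas as pd

# The five sample rows from the question, typed in by hand.
diamonds = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23, 0.29, 0.31],
    "cut": ["Ideal", "Premium", "Good", "Premium", "Good"],
    "color": ["E", "E", "E", "I", "J"],
    "clarity": ["SI2", "SI1", "VS1", "VS2", "SI2"],
    "depth": [61.5, 59.8, 56.9, 62.4, 63.3],
    "table": [55.0, 61.0, 65.0, 58.0, 58.0],
    "price": [326, 326, 327, 334, 335],
    "x": [3.95, 3.89, 4.05, 4.20, 4.34],
    "y": [3.98, 3.84, 4.07, 4.23, 4.35],
    "z": [2.43, 2.31, 2.31, 2.63, 2.75],
})

# Split column names by dtype: numeric columns vs. everything else.
num_attribs = diamonds.select_dtypes(include="number").columns.tolist()
cat_attribs = diamonds.select_dtypes(exclude="number").columns.tolist()
```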

My pipeline code is as follows:

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import FeatureUnion
import numpy as np

from sklearn.preprocessing import RobustScaler

num_attribs = diamonds.select_dtypes(include= np.number).columns.tolist()
cat_attribs = diamonds.select_dtypes(include=np.object).columns.tolist()

num_pipeline = Pipeline([
    ('selector', ColumnTransformer(num_attribs)),
    ('imputer', SimpleImputer(strategy="median")),
    ('std_scaler', StandardScaler()),

])

cat_pipeline = Pipeline([
    ('selector', ColumnTransformer(cat_attribs)),
    ('label_encoder', LabelEncoder())
])

full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline)
])

diamonds_prepared = full_pipeline.fit_transform(diamonds)

However, I get this error:

ValueError: not enough values to unpack (expected 3, got 1)

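My guess is that I am trying to perform several transformations in a single step of cat_pipeline. I have already managed to encode the categorical columns, but not inside a function: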

diamonds_cat = diamonds.select_dtypes(include=np.object)
diamonds_cat.apply(LabelEncoder().fit_transform)
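
One possible way to express the same encoding as a single transformer is OrdinalEncoder, which encodes every column of a 2-D input in one fit. The sketch below assumes the three categorical columns from the sample rows and checks that, for this data, it yields the same integer codes as applying LabelEncoder column by column. LabelEncoder itself is documented as a tool for 1-D target labels, which is why it does not drop into a feature pipeline directly.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Categorical columns from the five sample rows (typed in by hand).
diamonds_cat = pd.DataFrame({
    "cut": ["Ideal", "Premium", "Good", "Premium", "Good"],
    "color": ["E", "E", "E", "I", "J"],
    "clarity": ["SI2", "SI1", "VS1", "VS2", "SI2"],
})

# Column-by-column LabelEncoder, as in the snippet above.
per_column = diamonds_cat.apply(LabelEncoder().fit_transform)

# OrdinalEncoder fits all columns at once and returns a float array.
ordinal_encoder = OrdinalEncoder()
encoded = ordinal_encoder.fit_transform(diamonds_cat)

# Both encoders sort the categories, so the codes should match here.
same_codes = bool((encoded == per_column.to_numpy()).all())
```

Because OrdinalEncoder implements fit and transform on a feature matrix, it can be used as a step in a Pipeline, whereas LabelEncoder.fit_transform accepts only a single array of labels.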

I would like to put this code into cat_pipeline and find a way to replace the 'label_encoder' line:

cat_pipeline = Pipeline([
    ('selector', ColumnTransformer(cat_attribs)),
    ('label_encoder', LabelEncoder())
])
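
A minimal sketch of one way this could be restructured, under a few assumptions. The ValueError appears to come from ColumnTransformer(cat_attribs) and ColumnTransformer(num_attribs): the constructor expects a list of (name, transformer, columns) tuples, not a bare list of column names. The sketch therefore drops the 'selector' steps, lets a single ColumnTransformer route columns to each sub-pipeline in place of FeatureUnion, and swaps LabelEncoder for OrdinalEncoder. The step name 'ordinal_encoder' and the reduced sample DataFrame are illustrative choices, not part of the original code.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# A reduced copy of the sample rows (three numeric, three categorical columns).
diamonds = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23, 0.29, 0.31],
    "cut": ["Ideal", "Premium", "Good", "Premium", "Good"],
    "color": ["E", "E", "E", "I", "J"],
    "clarity": ["SI2", "SI1", "VS1", "VS2", "SI2"],
    "depth": [61.5, 59.8, 56.9, 62.4, 63.3],
    "price": [326, 326, 327, 334, 335],
})

num_attribs = diamonds.select_dtypes(include="number").columns.tolist()
cat_attribs = diamonds.select_dtypes(exclude="number").columns.tolist()

# Numeric columns: fill missing values with the median, then standardize.
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("std_scaler", StandardScaler()),
])

# Categorical columns: integer-encode every column in one step.
cat_pipeline = Pipeline([
    ("ordinal_encoder", OrdinalEncoder()),
])

# Each entry is (name, transformer, columns); ColumnTransformer does the
# column selection itself, so no separate selector step is needed.
full_pipeline = ColumnTransformer(transformers=[
    ("num_pipeline", num_pipeline, num_attribs),
    ("cat_pipeline", cat_pipeline, cat_attribs),
])

diamonds_prepared = full_pipeline.fit_transform(diamonds)
```

The result is a NumPy array with the scaled numeric columns first and the encoded categorical columns after them, in the order the transformers are listed. If the categories have no natural order and the downstream model is sensitive to that, OneHotEncoder could be substituted for OrdinalEncoder in the same position.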

Thanks for your help!

0 answers:

No answers yet