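I have a pandas DataFrame df. I want to encode its continuous and categorical features with different encoders. I find make_column_transformer very convenient, but the code shown below fails with LabelEncoder(), whereas it works fine with OneHotEncoder(handle_unknown='ignore'). The error message is:
TypeError: fit_transform() takes 2 positional arguments but 3 were given
It is not clear to me how to fix this.
Code: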
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import RobustScaler, OneHotEncoder, LabelEncoder
continuous_features = ['COL1','COL2']
categorical_features = ['COL3','COL4']
column_trans = make_column_transformer(
(categorical_features,LabelEncoder()),
(continuous_features, RobustScaler()))
X_enc = column_trans.fit_transform(df)
Answer 0 (score: 0)
According to https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_transformer.html, each tuple is given as (transformer, columns), with the transformer first:
make_column_transformer(
... (StandardScaler(), ['numerical_column']),
... (OneHotEncoder(), ['categorical_column']))
So for your case:
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import RobustScaler, OneHotEncoder, LabelEncoder
continuous_features = ['COL1','COL2']
categorical_features = ['COL3','COL4']
column_trans = make_column_transformer(
(OneHotEncoder(), categorical_features),
(RobustScaler(), continuous_features))
X_enc = column_trans.fit_transform(df)
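To check the corrected call end to end, here is a minimal sketch on a small made-up DataFrame. The column names follow the question, but the values are hypothetical and only serve to make the snippet runnable.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Hypothetical toy data: column names come from the question, values are invented.
df = pd.DataFrame({
    "COL1": [1.0, 2.0, 3.0, 4.0],
    "COL2": [10.0, 20.0, 30.0, 40.0],
    "COL3": ["a", "b", "a", "c"],
    "COL4": ["x", "y", "x", "y"],
})

continuous_features = ["COL1", "COL2"]
categorical_features = ["COL3", "COL4"]

# Each tuple is (transformer, columns).
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical_features),
    (RobustScaler(), continuous_features),
)

X_enc = column_trans.fit_transform(df)

# 3 categories in COL3 + 2 categories in COL4 + 2 scaled columns = 7 output columns.
print(X_enc.shape)
```

The output columns follow the order of the tuples, so the one-hot columns come first and the scaled continuous columns last.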
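If you want to use LabelEncoder(), keep in mind that it is meant for target labels and accepts only a single 1-D column, so you cannot pass it two columns! Its fit_transform() takes only y, which is why the call fails with the "3 were given" error when the column transformer passes both X and y to it. If you need integer codes for several feature columns, OrdinalEncoder() is the encoder designed for that.
Hope this helps.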
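As a rough illustration of the LabelEncoder point, the sketch below encodes a single column with LabelEncoder and then uses OrdinalEncoder inside make_column_transformer to get integer codes for both categorical columns. The DataFrame values are hypothetical, and OrdinalEncoder is a suggested substitute here, not something the original code used.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder, RobustScaler

# Hypothetical toy data: column names come from the question, values are invented.
df = pd.DataFrame({
    "COL1": [1.0, 2.0, 3.0, 4.0],
    "COL2": [10.0, 20.0, 30.0, 40.0],
    "COL3": ["a", "b", "a", "c"],
    "COL4": ["x", "y", "x", "y"],
})

# LabelEncoder works on exactly one 1-D array (a single column).
col3_codes = LabelEncoder().fit_transform(df["COL3"])

# OrdinalEncoder accepts a 2-D block of feature columns,
# so it fits the (transformer, columns) tuples of make_column_transformer.
column_trans = make_column_transformer(
    (OrdinalEncoder(), ["COL3", "COL4"]),
    (RobustScaler(), ["COL1", "COL2"]),
)
X_enc = column_trans.fit_transform(df)

print(col3_codes)    # integer codes for COL3 only
print(X_enc[:, :2])  # integer codes (as floats) for COL3 and COL4
```

Note that integer codes imply an ordering between categories, so for models that are sensitive to that, OneHotEncoder is usually the safer choice.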