Problem changing a categorical variable with OneHotEncoder

Time: 2019-07-01 23:51:21

Tags: python python-3.x one-hot-encoding

I have a gender column whose values are male (1) and female (0). When I one-hot encode the gender column with the following code:

onehotencoder = OneHotEncoder(categorical_features=['gender'])
data = onehotencoder.fit_transform(data).toarray()

I get the following error:

IndexError: arrays used as indices must be of integer (or boolean) type

1 answer:

Answer 0: (score: 0)

The following description can be found in the OneHotEncoder documentation:

categorical_features : 'all' or array of indices or mask, default='all'
    Specify what features are treated as categorical.
    'all': All features are treated as categorical.
    array of indices: Array of categorical feature indices.
    mask: Array of length n_features and with dtype=bool.

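So categorical_features accepts integer column indices or a boolean mask, not column names. Passing the string 'gender' is what triggers the IndexError about indices needing to be of integer or boolean type. Pass the column's index instead of its name and the error should go away.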
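If the data lives in a pandas DataFrame, the index or mask can be derived from the column name. The snippet below is only a sketch: the DataFrame contents and the `age` column are made up for illustration, and the question does not say how `data` was built.

```python
import pandas as pd

# Hypothetical data; only the 'gender' column comes from the question.
data = pd.DataFrame({"age": [23, 35], "gender": [1, 0]})

# Integer position of the column, suitable as an "array of indices" entry.
gender_index = data.columns.get_loc("gender")

# Boolean mask of length n_features, the other accepted form.
gender_mask = (data.columns == "gender")

# On scikit-learn versions before 0.22 (where the keyword still exists),
# either could then be passed, e.g. categorical_features=[gender_index].
print(gender_index, gender_mask)
```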


By the way, note the deprecation stated in the documentation:
Deprecated since version 0.20: The categorical_features keyword was deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
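
A minimal sketch of the ColumnTransformer route the deprecation note points to. The sample DataFrame, the `age` column and the step name `gender_onehot` are assumptions made for illustration; with ColumnTransformer, columns of a DataFrame can be selected by name.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: gender is 1 (male) or 0 (female); 'age' is illustrative.
data = pd.DataFrame({"gender": [1, 0, 1, 0], "age": [23, 35, 41, 29]})

transformer = ColumnTransformer(
    # One-hot encode only the 'gender' column, selected by name.
    transformers=[("gender_onehot", OneHotEncoder(), ["gender"])],
    # Keep the remaining columns unchanged, appended after the encoded ones.
    remainder="passthrough",
    # Always return a dense NumPy array instead of a sparse matrix.
    sparse_threshold=0,
)

encoded = transformer.fit_transform(data)
print(encoded)
```

The encoded gender columns come first (one per category, in sorted order), followed by the passed-through columns.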