I have a gender column whose values are male (1) and female (0). When I one-hot encode the gender column with the following code:
onehotencoder = OneHotEncoder(categorical_features=['gender'])
data = onehotencoder.fit_transform(data).toarray()
I get the following error:
IndexError: arrays used as indices must be of integer (or boolean) type
Answer 0 (score: 0)
The following description appears in the OneHotEncoder documentation:
categorical_features : ‘all’ or array of indices or mask, default=’all’
Specify what features are treated as categorical.
‘all’: All features are treated as categorical.
array of indices: Array of categorical feature indices.
mask: Array of length n_features and with dtype=bool.
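So categorical_features expects integer column indices (or a boolean mask), not column names. Passing the string 'gender' is what triggers the IndexError; passing the index of the gender column instead should solve your problem.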
By the way, note the deprecation stated in the documentation:
Deprecated since version 0.20: The categorical_features keyword was deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
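As a rough sketch of the ColumnTransformer route that the deprecation note points to, something like the following should work on scikit-learn 0.20 or later. The sample DataFrame, the 'age' column, and the transformer name "gender_onehot" are made up for illustration; only the 'gender' column comes from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample data: only 'gender' is from the question,
# 'age' is an assumed extra column to show the passthrough behaviour.
data = pd.DataFrame({"gender": [1, 0, 0, 1], "age": [23, 35, 41, 29]})

# Apply OneHotEncoder to the 'gender' column only. With a DataFrame,
# ColumnTransformer accepts column names, so no integer index is needed.
transformer = ColumnTransformer(
    transformers=[("gender_onehot", OneHotEncoder(), ["gender"])],
    remainder="passthrough",  # keep the other columns unchanged
    sparse_threshold=0,       # always return a dense NumPy array
)

encoded = transformer.fit_transform(data)
# Column order: one-hot columns for gender (0, then 1), followed by 'age'.
print(encoded)
```

With remainder="passthrough" the untouched columns are appended after the encoded ones, so the column order of the result differs from the input. If the input is a plain NumPy array rather than a DataFrame, the column would be selected by integer index (for example [0]) instead of by name.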