OneHotEncoder stops returning the transformed array after adding a certain number of features (categorical columns)

Time: 2019-07-04 10:54:40

Tags: python machine-learning scikit-learn

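The following code one-hot encodes the specified columns (features). I have 54 features and want to encode all of them, but for some reason the most I can encode is 25. If I increase the number of features to encode beyond that, .fit_transform() appears to return nothing.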

import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer


# ======================== 1 - Importing the data ========================
# - Dataset has 54 features and 1 label (55 columns)
# - 10k examples

datasetPath = "10k-States(0).csv"

dataset = pd.read_csv(datasetPath)

x_train = dataset.iloc[:, 0:54]
y_train = dataset.iloc[:, 54]

# ===================== 2 - Encode x (input) values ======================

# Columns to be encoded (should be 54, but 25 is max that works...)
cols_to_encode = list(range(25))

# One category list per encoded column; every feature has the same
# classes (labels), so the list is repeated len(cols_to_encode) times
transformer = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories=[[0,1,2,3,4,5]]*len(cols_to_encode)), cols_to_encode)],
    remainder='passthrough'
)

x = transformer.fit_transform(x_train)

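This is the output when I encode 25 columns or fewer: (screenshot: Dataset variables)

That all looks fine, but as soon as I go up to 26 columns or more, the value of x is shown as (), with nothing in it. I have no idea what is going on...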
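One possible explanation, which the thread itself does not confirm, is that `ColumnTransformer` switches to a sparse result at exactly this point. Its `sparse_threshold` parameter defaults to 0.3, and the stacked output becomes a SciPy sparse matrix when the overall density falls below that value. The sketch below only does the arithmetic for this dataset's shape. It assumes each one-hot block contributes one non-zero entry per encoded column and that passthrough columns are counted as fully dense; `estimated_density` is a hypothetical helper, not a scikit-learn function.

```python
# Rough density estimate for the stacked ColumnTransformer output.
# Assumptions (not from the original post): one non-zero per encoded
# column per row, and passthrough columns counted as fully dense.
N_FEATURES = 54         # columns in x_train
N_CATEGORIES = 6        # categories 0..5 for every feature
SPARSE_THRESHOLD = 0.3  # ColumnTransformer default


def estimated_density(n_encoded):
    """Estimated fraction of non-zero entries per output row."""
    n_passthrough = N_FEATURES - n_encoded
    nonzero_per_row = n_encoded + n_passthrough
    output_width = n_encoded * N_CATEGORIES + n_passthrough
    return nonzero_per_row / output_width


for k in (25, 26, 54):
    density = estimated_density(k)
    kind = "sparse" if density < SPARSE_THRESHOLD else "dense"
    print(f"{k} encoded columns: density ~ {density:.3f} -> {kind} output")
```

Under these assumptions the estimate sits just above 0.3 for 25 encoded columns and just below it for 26, which matches the point where the result stops showing up as an ordinary array.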

1 answer:

Answer 0: (score: 0)

Try this:

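from sklearn.preprocessing import OneHotEncoder

columnnumberlist = []  # insert all the column numbers to encode here
# Note: categorical_features was deprecated in scikit-learn 0.20 and removed
# in 0.22, so this call only works on older versions.
one = OneHotEncoder(categorical_features=columnnumberlist)
X = one.fit_transform(X)
X = X.toarray()  # convert the sparse result to a dense NumPy array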
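For newer scikit-learn versions, here is a minimal sketch of an alternative that keeps the question's `ColumnTransformer`. It assumes the "empty" result is really a SciPy sparse matrix that the variable viewer cannot display, which the thread does not confirm. Passing `sparse_threshold=0` asks `ColumnTransformer` to always return a dense array. Synthetic random data stands in for the CSV, and `encode` is a hypothetical helper written only for this illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the CSV: 100 rows, 54 features with values 0..5.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 6, size=(100, 54))


def encode(n_cols, sparse_threshold=0.3):
    """One-hot encode the first n_cols columns, pass the rest through."""
    cols_to_encode = list(range(n_cols))
    transformer = ColumnTransformer(
        [("one_hot_encoder",
          OneHotEncoder(categories=[[0, 1, 2, 3, 4, 5]] * n_cols),
          cols_to_encode)],
        remainder="passthrough",
        sparse_threshold=sparse_threshold,  # 0.3 is the library default
    )
    return transformer.fit_transform(x_train)


x_default = encode(54)                     # default threshold
x_dense = encode(54, sparse_threshold=0)   # force a dense ndarray

print(type(x_default), sparse.issparse(x_default))
print(type(x_dense), x_dense.shape)
```

If the result is already sparse, calling `x.toarray()` on it gives the same dense array; setting the threshold just avoids the extra step.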