Trying to create dummy variables with OneHotEncoder

Time: 2020-03-26 14:20:27

Tags: python dataframe machine-learning scikit-learn artificial-intelligence

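I'm learning machine learning and am trying to preprocess my data, but I ran into an error. The line X[:, 1] = X_label_encoder_1.fit_transform(X[:, 1]) raises IndexError: index 1 is out of bounds for axis 1 with size 1. I've tried everything I can think of, but I can't get it to work.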

# get the dependent and independent variables
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

X = X.reshape(-1, 1)
y = y.reshape(-1, 1)

# change the categorical values into numbers
X_label_encoder_1 = LabelEncoder()
X[:, 1] = X_label_encoder_1.fit_transform(X[:,1])
X_label_encoder_2 = LabelEncoder()
X[:, 2] = X_label_encoder_2.fit_transform(X[:,2])

onehotencoder = OneHotEncoder(categories=X[1])
X = onehotencoder.fit_transform(X).toarray()

1 Answer:

Answer 0 (score: 0)

Here's how I would handle it:

# Load the pandas library
import pandas as pd

# One-hot encode the categorical variable
one_hot_column_name = pd.get_dummies(dataset_name['column_to_encode'])

# Drop the original categorical variable after it has been encoded
dataset_name = dataset_name.drop('column_to_encode', axis=1)

# Join the encoded columns back onto the dataset
dataset_name = dataset_name.join(one_hot_column_name)
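
To make the steps above concrete, here is a minimal sketch on a toy DataFrame. The data and the column names (score, country) are made up for illustration and are not taken from the question's dataset; the prefix argument is only there to keep the new column names readable.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only
dataset_name = pd.DataFrame({
    "score": [600, 720, 650],
    "country": ["France", "Spain", "France"],
})

# One-hot encode the categorical column
one_hot = pd.get_dummies(dataset_name["country"], prefix="country")

# Drop the original column, then attach the indicator columns
dataset_name = dataset_name.drop("country", axis=1).join(one_hot)

print(dataset_name.columns.tolist())
# ['score', 'country_France', 'country_Spain']
```

Each distinct category becomes its own indicator column, and the join lines the rows up by index, so the result has the same number of rows as the original.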

Hope this works, and welcome to machine learning!
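
If you would rather stay with scikit-learn as in the question, the sketch below shows one likely cause of the IndexError and a possible way around it. It assumes the error comes from X.reshape(-1, 1), which collapses all the selected columns into a single column so that X[:, 1] no longer exists. The toy array is a hypothetical stand-in for dataset.iloc[:, 3:13].values, and using ColumnTransformer here is a suggestion rather than the approach from the original code.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for dataset.iloc[:, 3:13].values (3 columns instead of 10)
X = np.array([
    [600, "France", "Female"],
    [720, "Spain", "Male"],
    [650, "France", "Male"],
], dtype=object)

# reshape(-1, 1) flattens everything into one column, so column index 1 is gone
flattened = X.reshape(-1, 1)
print(flattened.shape)  # (9, 1)

# Without the reshape, columns 1 and 2 can be one-hot encoded directly
encoder = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(), [1, 2])],
    remainder="passthrough",   # keep the remaining columns unchanged
    sparse_threshold=0,        # always return a dense array
)
X_encoded = encoder.fit_transform(X)
print(X_encoded.shape)  # (3, 5)
```

The encoded columns come first in the output, followed by the passthrough columns. Since OneHotEncoder accepts string categories directly, the separate LabelEncoder step should not be needed in this sketch.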