How do I change my OneHotEncoder code to prepare for the upcoming change?

Time: 2019-05-10 08:24:36

Tags: python pandas numpy scikit-learn one-hot-encoding

Currently I encode my categorical features like this:

# Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('weatherHistory_edited.csv')
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, 6].values

# Encode categorical features
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

labelencoder_X = LabelEncoder()
X[:, 5] = labelencoder_X.fit_transform(X[:, 5])
onehotencoder = OneHotEncoder(categorical_features=[5])
X = onehotencoder.fit_transform(X).toarray()

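This works fine. The only problem is that I get a warning saying that the categorical_features keyword is deprecated in version 0.20 and will be removed in 0.22, and that I can use ColumnTransformer instead.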

So I replaced the last code block with:

# Encode categorical features
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

columntransformer = ColumnTransformer([("one_hot_encoder", OneHotEncoder(), [5])], remainder="passthrough")
X = np.array(columntransformer.fit_transform(X))

Now, when I use this code I don't get an error, but my X array is completely scrambled; it has even turned into some strange tuple.
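One plausible explanation, which is an assumption and not something stated in the question: OneHotEncoder produces sparse output by default, and ColumnTransformer returns a SciPy sparse matrix whenever the density of the combined output falls below sparse_threshold (0.3 by default). Wrapping that sparse matrix in np.array does not turn it into a dense 2-D array, and printing a sparse matrix shows coordinate/value pairs such as (0, 3) 1.0, which may be the "strange tuple". The sketch below uses synthetic numeric data as a hypothetical stand-in for weatherHistory_edited.csv (the 200 rows, 4 numeric columns and 40-category column are made up) and shows two ways to get a dense array: calling .toarray() on the result, or passing sparse_threshold=0.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in data: 4 numeric columns plus a categorical column
# (index 4) holding 40 distinct codes.
rng = np.random.default_rng(0)
n_rows = 200
numeric = rng.normal(size=(n_rows, 4))
codes = (np.arange(n_rows) % 40).astype(float)
X = np.column_stack([numeric, codes])

# Same setup as in the question: one-hot encode one column, pass the rest through.
ct = ColumnTransformer([("one_hot_encoder", OneHotEncoder(), [4])],
                       remainder="passthrough")
X_out = ct.fit_transform(X)
# Density is (1 + 4) / (40 + 4), about 0.11, which is below 0.3, so the result is sparse.
print(sp.issparse(X_out))  # True

# Fix 1: densify the sparse result explicitly.
X_dense = X_out.toarray() if sp.issparse(X_out) else X_out

# Fix 2: ask ColumnTransformer for dense output regardless of density.
ct_dense = ColumnTransformer([("one_hot_encoder", OneHotEncoder(), [4])],
                             remainder="passthrough", sparse_threshold=0)
X_dense2 = ct_dense.fit_transform(X)

print(X_dense.shape, X_dense2.shape)  # (200, 44) (200, 44)
```

The 40 one-hot columns come first and the 4 passthrough columns follow, which is why both results have 44 columns.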

Another strange thing is that the same code does seem to work with a different dataset. Example:

# Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, 4].values

# Encode categorical features
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

columntransformer = ColumnTransformer([("one_hot_encoder", OneHotEncoder(), [3])], remainder="passthrough")
X = np.array(columntransformer.fit_transform(X))

In this example, X ends up with the expected values.
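The difference between the two datasets would be consistent with ColumnTransformer's density rule (sparse output only when density is below sparse_threshold, 0.3 by default). This is a sketch under assumptions not checked against the uploaded files: the State column of 50_Startups.csv is taken to have 3 categories, and column 5 of the weather data to have many more. With 3 numeric passthrough columns, a 3-category column gives a density of 4/6 (a dense array), while a 40-category column gives 4/43 (a sparse matrix). The helper names encode and expected_density are hypothetical and the data is synthetic.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder


def encode(n_categories, n_numeric=3, n_rows=120, seed=0):
    """Hypothetical helper: one-hot encode the last column of synthetic
    data and pass the numeric columns through, as in the question."""
    rng = np.random.default_rng(seed)
    numeric = rng.normal(size=(n_rows, n_numeric))
    codes = (np.arange(n_rows) % n_categories).astype(float)
    X = np.column_stack([numeric, codes])
    ct = ColumnTransformer(
        [("one_hot_encoder", OneHotEncoder(), [n_numeric])],
        remainder="passthrough",
    )
    return ct.fit_transform(X)


def expected_density(n_categories, n_numeric=3):
    # One nonzero per row among the one-hot columns; every passthrough
    # value counts as stored, so the numeric columns are fully dense.
    return (1 + n_numeric) / (n_categories + n_numeric)


for k in (3, 40):
    out = encode(k)
    print(k, round(expected_density(k), 3), "sparse" if sp.issparse(out) else "dense")
# 3 0.667 dense
# 40 0.093 sparse
```

If this is the cause, adding sparse_threshold=0 to the ColumnTransformer (or calling .toarray() on a sparse result) should make both datasets behave the same way.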

I uploaded the sample datasets to a public repository so you can reproduce the problem:

https://github.com/BjornPijpops/encoding_issue
