So currently, this is how I encode my categorical features:
# Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('weatherHistory_edited.csv')
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, 6].values
# Encode categorical features
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 5] = labelencoder_X.fit_transform(X[:,5])
onehotencoder = OneHotEncoder(categorical_features= [5])
X = onehotencoder.fit_transform(X).toarray()
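This works fine. The only problem is that I get a warning: "The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead."
So I switched the last code block to: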
# Encode categorical features
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
columntransformer = ColumnTransformer([("one_hot_encoder", OneHotEncoder(), [5])], remainder= "passthrough")
X = np.array(columntransformer.fit_transform(X))
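Now, when I use this code I don't get any error, but my X array is completely messed up and even turns into some weird tuple-like object.
Another weird part is that the same code does seem to work with a different dataset. Example: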
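One possible explanation, which I have not confirmed on the real file: when the one-hot encoded column has many distinct values, `ColumnTransformer` may return a SciPy sparse matrix (its `sparse_threshold` defaults to 0.3), and wrapping a sparse matrix in `np.array(...)` does not densify it. Here is a minimal sketch on synthetic data to check for that. The names `X_demo`, `ct`, and `ct_dense`, and the 20-label column, are made up for illustration and are not taken from weatherHistory_edited.csv.

```python
import numpy as np
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the real data (assumption): one numeric column
# and one categorical column with 20 distinct labels.
n_rows = 40
X_demo = np.empty((n_rows, 2), dtype=object)
X_demo[:, 0] = np.arange(n_rows, dtype=float)
X_demo[:, 1] = [f"label_{i % 20}" for i in range(n_rows)]

# Same pattern as in the question, applied to column 1 of the toy array.
ct = ColumnTransformer(
    [("one_hot_encoder", OneHotEncoder(), [1])],
    remainder="passthrough",
)
out = ct.fit_transform(X_demo)

# Diagnostic: is the result a sparse matrix rather than an ndarray?
print(type(out), sparse.issparse(out))

# np.array() only wraps a sparse matrix; it does not convert it.
wrapped = np.array(out)
print(wrapped.shape, wrapped.dtype)

# Option 1: densify explicitly when the result is sparse.
dense = out.toarray() if sparse.issparse(out) else np.asarray(out)

# Option 2: ask ColumnTransformer to always return a dense array.
ct_dense = ColumnTransformer(
    [("one_hot_encoder", OneHotEncoder(), [1])],
    remainder="passthrough",
    sparse_threshold=0,
)
dense_direct = ct_dense.fit_transform(X_demo)
print(dense.shape, dense_direct.shape)
```

If `sparse.issparse(out)` prints `True` on the real data as well, that would match what I am seeing. It would also explain why a column with only a few categories behaves differently, since the combined output would then be dense enough to stay a regular array.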
# Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, 4].values
# Encode categorical features
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
columntransformer = ColumnTransformer([("one_hot_encoder", OneHotEncoder(), [3])], remainder= "passthrough")
X = np.array(columntransformer.fit_transform(X))
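In this example, X ends up with the expected values.
I uploaded the sample datasets to a public repository so that you can reproduce the problem: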