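I am new to data science and I want to group records based on categorical data. I want to encode the data before running the K-means algorithm, but when I call fit_transform() I get "ValueError: bad input shape (2835, 18)" and I don't know how to fix it. I hope someone can help.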
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
#load my data
myData = pd.read_excel('panelForOneHot.xlsx')
myData = myData.dropna()
myData.reset_index(drop = True, inplace = True)
myData
values = np.array(myData)
print(values)
#integer encode
label_encoder = LabelEncoder()
#values is 2-D here (2835 rows x 18 columns), so this call raises the ValueError
integer_encoded = label_encoder.fit_transform(values)
Answer 0 (score: 1)
LabelEncoder() expects one-dimensional data. Pass the specific column you want to encode, as shown below.
# Import label encoder
from sklearn import preprocessing
# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()
# Encode labels in column 'species'.
df['species']= label_encoder.fit_transform(df['species'])
df['species'].unique()
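To make the difference between 1-D and 2-D input concrete, here is a minimal sketch. The DataFrame and its column names are hypothetical stand-ins for the asker's spreadsheet, and the exact wording of the error message may vary between scikit-learn versions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the asker's spreadsheet.
df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "bird"],
    "color": ["black", "white", "white", "black"],
})

encoder = LabelEncoder()

# Passing the whole 2-D table reproduces the kind of error in the question.
try:
    encoder.fit_transform(df.to_numpy())
    raised = False
except ValueError as exc:
    raised = True
    print("2-D input rejected:", exc)

# Passing a single column (1-D) works; classes are stored in sorted order.
codes = encoder.fit_transform(df["species"])
print(codes)
print(list(encoder.classes_))
```

The integer assigned to each label is simply its position in the sorted `classes_` array, so the codes carry no meaning beyond identity.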
If you intend to encode all columns:
df.apply(LabelEncoder().fit_transform)
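Note that the one-liner above reuses a single LabelEncoder, which is refit on each column in turn, so only the last column's mapping is retained. If you need to map codes back to labels later, one option is to keep one encoder per column. This is a sketch on hypothetical toy data, not the only way to do it.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; replace with your own DataFrame.
df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "bird"],
    "color": ["black", "white", "white", "black"],
})

encoders = {}          # column name -> fitted LabelEncoder
encoded = df.copy()
for column in df.columns:
    enc = LabelEncoder()
    encoded[column] = enc.fit_transform(df[column])
    encoders[column] = enc

# Each column's mapping can be reversed with its own encoder.
restored = encoders["species"].inverse_transform(encoded["species"])
print(encoded)
print(list(restored))
```

For input features (as opposed to target labels), scikit-learn also provides OrdinalEncoder, which accepts 2-D input directly.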
If you intend to encode several columns but not all of them:
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import RobustScaler
from sklearn.preprocessing import OneHotEncoder
categorical_columns = ['country', 'gender']
numerical_columns = ['age']
# Each tuple is (transformer, columns)
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown='ignore'), categorical_columns),
    (RobustScaler(), numerical_columns))
column_trans.fit_transform(df)
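Since the goal in the question is K-means, here is a hedged end-to-end sketch that feeds the transformed features into KMeans. The data, the column names, and the choice of two clusters are all assumptions made for illustration. One-hot encoding is used for the categorical columns because integer label codes would impose an arbitrary ordering on the distance computation.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Hypothetical toy data; the real columns come from your own file.
df = pd.DataFrame({
    "country": ["US", "FR", "US", "FR", "US", "FR"],
    "gender": ["F", "M", "M", "F", "F", "M"],
    "age": [22, 25, 47, 51, 33, 38],
})

categorical_columns = ["country", "gender"]
numerical_columns = ["age"]

# Each tuple is (transformer, columns).
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical_columns),
    (RobustScaler(), numerical_columns),
)

# 2 country columns + 2 gender columns + 1 scaled age column.
features = column_trans.fit_transform(df)

# n_clusters=2 is an arbitrary choice for this sketch.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)
print(features.shape)
print(labels)
```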