ValueError: bad input shape (2835, 18)

Time: 2019-12-23 10:54:14

Tags: python scikit-learn data-science

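I'm new to data science and want to run a classification on categorical data. I want to encode the data before applying the K-means algorithm, but calling fit_transform() raises "ValueError: bad input shape (2835, 18)" and I don't know how to fix it. I hope someone can help.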

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder

#load my data
myData = pd.read_excel('panelForOneHot.xlsx')
myData = myData.dropna()
myData.reset_index(drop = True, inplace = True)
myData

values = np.array(myData)
print(values)

#integer encode
label_encoder = LabelEncoder()
# The ValueError is raised here: values is 2-D (2835 rows x 18 columns)
integer_encoded = label_encoder.fit_transform(values)

1 Answer:

Answer 0 (score: 1)

LabelEncoder() expects one-dimensional data. Pass the specific column you want to encode, as shown below.

# Import label encoder
from sklearn import preprocessing

# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()

# Encode labels in column 'species'.
df['species'] = label_encoder.fit_transform(df['species'])

df['species'].unique()
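
To make the 1-D requirement concrete, here is a minimal sketch on a made-up frame. The toy data and its column names are assumptions for illustration, not the asker's spreadsheet. It passes a whole 2-D array to LabelEncoder to trigger the failure (the exact message wording depends on the scikit-learn version), then encodes a single column.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the real spreadsheet.
toy = pd.DataFrame({
    "species": ["cat", "dog", "cat", "bird"],
    "color": ["black", "white", "white", "green"],
})

# Passing the whole 2-D array reproduces the failure.
try:
    LabelEncoder().fit_transform(np.array(toy))
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# Passing one 1-D column works; classes are sorted alphabetically.
encoder = LabelEncoder()
species_codes = encoder.fit_transform(toy["species"])
print(species_codes)           # [1 2 1 0]
print(list(encoder.classes_))  # ['bird', 'cat', 'dog']
```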

If you intend to encode all columns,

df.apply(LabelEncoder().fit_transform)
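
Note that the one-liner refits a single LabelEncoder on each column in turn, so afterwards it only remembers the classes of the last column. If you might need to map codes back to the original labels, one possible variant is to keep a separate encoder per column. This is a sketch on hypothetical toy data; the `encoders` dict is an illustrative helper, not part of scikit-learn.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; replace with your own DataFrame.
toy = pd.DataFrame({
    "species": ["cat", "dog", "cat", "bird"],
    "color": ["black", "white", "white", "green"],
})

encoders = {}          # one fitted LabelEncoder per column
encoded = toy.copy()
for column in toy.columns:
    encoders[column] = LabelEncoder()
    encoded[column] = encoders[column].fit_transform(toy[column])

print(encoded)

# Each stored encoder can translate its own column back to labels.
restored = encoders["color"].inverse_transform(encoded["color"])
print(list(restored))  # ['black', 'white', 'white', 'green']
```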

If you intend to encode several columns but not all of them,

from sklearn.compose import make_column_transformer
from sklearn.preprocessing import RobustScaler
from sklearn.preprocessing import OneHotEncoder

categorical_columns = ['country', 'gender']
numerical_columns = ['age']
# Each tuple is (transformer, columns) in current scikit-learn releases.
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown='ignore'), categorical_columns),
    (RobustScaler(), numerical_columns))
column_trans.fit_transform(df)
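
For a version that runs end to end, here is a sketch of the column-transformer approach on a small made-up frame. The data values are assumptions for illustration. It passes each step as a (transformer, columns) tuple, which is the order recent scikit-learn releases expect.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Hypothetical toy data with two categorical columns and one numeric column.
toy = pd.DataFrame({
    "country": ["US", "FR", "US", "DE"],
    "gender": ["f", "m", "m", "f"],
    "age": [23, 35, 41, 29],
})

column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["country", "gender"]),
    (RobustScaler(), ["age"]),
)
features = column_trans.fit_transform(toy)

# The result may come back as a sparse matrix; densify it for inspection.
if hasattr(features, "toarray"):
    features = features.toarray()

# 3 country columns + 2 gender columns + 1 scaled age column.
print(features.shape)  # (4, 6)
```

The resulting numeric matrix is the kind of input a clustering step such as K-means can consume, since every categorical value has become its own 0/1 column.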