import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
#mydataset = pd.read_csv('AttributeDataset.csv')
names = ['Dress_ID', 'Style', 'Price', 'Rating', 'Size', 'Season', 'NeckLine',
         'SleeveLength', 'waiseline', 'Material', 'FabricType', 'Decoration',
         'Pattern Type', 'Recommendaation']
dataframe = pd.read_csv('AttributeDataset.csv',names=names)
print(dataframe.shape)
array = dataframe.values
X = array[:,:-1]
Y = array[:,-1]
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2)
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(X_train)
With this code, we cannot fit the data. Running it produces the following error:
Traceback (most recent call last):
  File "<ipython-input-29-3df12e017cba>", line 1, in <module>
    le.fit(X)
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py", line 95, in fit
    y = column_or_1d(y, warn=True)
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 614, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (501, 13)
Can anyone help me solve this problem, and explain how to preprocess the data and convert it from categorical values to numerical ones?
Answer 0 (score: 1)
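The fit method of a LabelEncoder object only accepts a one-dimensional array, but you are passing it a two-dimensional matrix (X_train, which has 13 columns). Find the columns in X_train that hold categorical values and pass them to the LabelEncoder one at a time, like this: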
le = le.fit(X_train[:, 0])  # fit the encoder on the first column
X_train[:, 0] = le.transform(X_train[:, 0])  # replace the column with its numeric codes
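To make the shape requirement concrete, here is a minimal sketch on a small made-up array. The values and the names X_demo and style_encoder are illustrative and are not taken from AttributeDataset.csv. The wording of the error message differs between scikit-learn versions, so the sketch only checks that a ValueError is raised.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up stand-in for X_train: two categorical columns, object dtype.
X_demo = np.array([["Sexy", "Low"],
                   ["Casual", "High"],
                   ["Sexy", "Average"]], dtype=object)

style_encoder = LabelEncoder()

# Passing the whole 2-D array fails, as in the question.
try:
    style_encoder.fit(X_demo)
    fit_2d_failed = False
except ValueError:
    fit_2d_failed = True

# Passing a single column (a 1-D array) works.
style_encoder.fit(X_demo[:, 0])
encoded_style = style_encoder.transform(X_demo[:, 0])

print(fit_2d_failed)           # True
print(style_encoder.classes_)  # the sorted unique labels: 'Casual', 'Sexy'
print(encoded_style)           # [1 0 1]
```

Each label is replaced by its index in classes_, which is sorted alphabetically, so the codes carry no meaning beyond identifying the category.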
You can also fit and transform in a single call:
X_train[:, 0] = le.fit_transform(X_train[:, 0])
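Extending this to the whole dataset could look like the sketch below, which keeps one encoder per categorical column so that the mapping learned on the training data can be reused on the test data. The arrays, the column indices, and the names X_train_demo, X_test_demo, and encoders are assumptions made for illustration; in the real dataset you would list the indices of the columns that actually contain strings.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up stand-ins for X_train and X_test: Style, Price, Rating.
X_train_demo = np.array([["Sexy", "Low", 4.6],
                         ["Casual", "High", 0.0],
                         ["Sexy", "Average", 4.5],
                         ["Vintage", "Low", 5.0]], dtype=object)
X_test_demo = np.array([["Casual", "Low", 4.0],
                        ["Vintage", "Low", 3.5]], dtype=object)

# Hypothetical choice: columns 0 and 1 hold categorical strings.
categorical_column_indices = [0, 1]

encoders = {}
for idx in categorical_column_indices:
    encoder = LabelEncoder()
    # Learn the mapping from the training column and apply it.
    X_train_demo[:, idx] = encoder.fit_transform(X_train_demo[:, idx])
    # Reuse the same mapping on the test column. transform raises a
    # ValueError if the test data contains a label not seen in training.
    X_test_demo[:, idx] = encoder.transform(X_test_demo[:, idx])
    encoders[idx] = encoder

# Every entry is numeric now, so the arrays can be cast to float.
X_train_numeric = X_train_demo.astype(float)
X_test_numeric = X_test_demo.astype(float)

print(X_train_numeric)
print(X_test_numeric)
```

Keeping the fitted encoders around also lets you call inverse_transform later to recover the original labels. Note that the scikit-learn documentation describes LabelEncoder as intended for target labels (Y); for feature columns the library also provides OrdinalEncoder and OneHotEncoder, which accept two-dimensional input directly. One-hot encoding is usually the safer choice when the categories have no natural order, because integer codes can be read by a model as a ranking.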