Preprocessing data with sklearn in Python

Time: 2018-03-01 13:05:52

Tags: python

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# mydataset = pd.read_csv('AttributeDataset.csv')
names = ['Dress_ID', 'Style', 'Price', 'Rating', 'Size', 'Season', 'NeckLine',
         'SleeveLength', 'waiseline', 'Material', 'FabricType', 'Decoration',
         'Pattern Type', 'Recommendaation']
dataframe = pd.read_csv('AttributeDataset.csv', names=names)
print(dataframe.shape)
array = dataframe.values
X = array[:,:-1]
Y = array[:,-1]

# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2)

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(X_train)

With this code we cannot fit the data. Running it raises the following error:

Traceback (most recent call last):

  File "<ipython-input-29-3df12e017cba>", line 1, in <module>
    le.fit(X)

  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py", line 95, in fit
    y = column_or_1d(y, warn=True)

  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 614, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))

ValueError: bad input shape (501, 13).

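Can anyone help me solve this problem? Please also explain how to preprocess the data and convert it from categorical values to numerical values.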

1 Answer:

Answer 0: (score: 1)

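The fit method of a LabelEncoder object accepts only a one-dimensional array, but you are passing it a two-dimensional matrix (X_train). Find the columns in X_train that hold categorical values and pass them to LabelEncoder one at a time, for example: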

le = le.fit(X_train[:, 0])  # learn the categories of the first column
X_train[:, 0] = le.transform(X_train[:, 0])  # replace them with integer codes
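
To encode more than one column, the same two calls can be repeated in a loop. The following is only a minimal sketch on made-up data: X_toy and its values are hypothetical stand-ins for X_train, and it assumes every column in the array is categorical.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for X_train (not taken from the real CSV).
X_toy = np.array(
    [["Sexy", "Low"],
     ["Casual", "High"],
     ["Sexy", "Average"],
     ["Brief", "Low"]],
    dtype=object,
)

encoders = {}  # one fitted encoder per column, so each mapping can be reused later
for col in range(X_toy.shape[1]):
    le = LabelEncoder()
    le.fit(X_toy[:, col])                       # a 1-D slice, as LabelEncoder requires
    X_toy[:, col] = le.transform(X_toy[:, col])  # overwrite strings with integer codes
    encoders[col] = le

print(X_toy)
print(encoders[0].classes_)  # categories of column 0, in sorted order
```

Keeping one encoder per column matters because each column has its own set of categories; a single shared LabelEncoder would be refitted, and its earlier mapping lost, on every iteration.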

You can also fit and transform in a single call:

X_train[:, 0] = le.fit_transform(X_train[:, 0])
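
As a side note, the scikit-learn documentation describes LabelEncoder as intended for target values (y). For feature matrices, OrdinalEncoder accepts a 2-D array directly. The sketch below is an illustration, not part of the original answer: the toy arrays are hypothetical, and the handle_unknown="use_encoded_value" option assumes scikit-learn 0.24 or newer. It fits on the training split only and then reuses the same mapping for the test split.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy splits standing in for X_train and X_test.
X_train_toy = np.array(
    [["Sexy", "Low"], ["Casual", "High"], ["Brief", "Low"]], dtype=object
)
X_test_toy = np.array(
    [["Casual", "Low"], ["Vintage", "High"]], dtype=object  # "Vintage" is unseen
)

# Categories not seen during fit are mapped to -1 instead of raising an error.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)

X_train_enc = enc.fit_transform(X_train_toy)  # learn the categories from training data
X_test_enc = enc.transform(X_test_toy)        # apply the same mapping to test data

print(X_train_enc)
print(X_test_enc)
```

Fitting only on the training data keeps the category-to-integer mapping identical across both splits, which refitting on the test set would not guarantee.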