ValueError in Spyder during data preprocessing

Asked: 2018-01-14 05:15:49

Tags: machine-learning

I get an error when I run the code below (ValueError: shape mismatch: value array of shape (10,) could not be broadcast to indexing result of shape (1,10))....

import pandas as pd
import numpy as np
import matplotlib as mpl

dataset=pd.read_csv("pre-process_datasample.csv")
features=dataset.iloc[:,[0,1,2]].values
label=dataset.iloc[:,[3]].values

from sklearn.preprocessing import Imputer

imputerNaN=Imputer(missing_values="NaN",strategy="mean",axis=0)
features[:,[1,2]]=imputerNaN.fit_transform(features[:,[1,2]])

from sklearn.preprocessing import LabelEncoder,OneHotEncoder

label_encoder=LabelEncoder()

features[:,[0]]=label_encoder.fit_transform(features[:,[0]])


But when I change the following line:
features[:,[0]]=label_encoder.fit_transform(features[:,[0]])
to
features[:,0]=label_encoder.fit_transform(features[:,0])
I don't get any error. Why? Please help.

1 Answer:

Answer 0 (score: 0)

Yes, there is a shape mismatch. features is a NumPy array, so features[:,[0]] gives a 2-D column such as array([[num_0],[num_1],[num_2],...,[num_9]]), which has shape (10,1).

But label_encoder.fit_transform(features[:,[0]]) returns a 1-D array of shape (10,), so:

  • features[:,[0]].shape = (10,1) while label_encoder.fit_transform(features[:,[0]]).shape = (10,), so the shapes do not match

  • with features[:,0], both features[:,0] and label_encoder.fit_transform(features[:,0]) have shape (10,)

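So it is simply a shape mismatch: NumPy cannot broadcast a (10,) array into a (10,1) target, so assigning a value of that shape is rejected.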
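The shape difference can be reproduced with NumPy alone. The following is a minimal sketch, not the asker's actual data: the country names are made up, and np.unique(..., return_inverse=True) is used only as a stand-in for LabelEncoder to produce 1-D integer codes.

```python
import numpy as np

# Hypothetical sample data standing in for the first column of the CSV.
countries = ["France", "Spain", "Germany", "Spain", "Germany",
             "France", "Spain", "France", "Germany", "France"]

# Object array with 10 rows and 3 columns, similar to dataset.iloc[...].values.
features = np.empty((10, 3), dtype=object)
features[:, 0] = countries
features[:, 1:] = 0

# Stand-in for LabelEncoder.fit_transform: 1-D integer codes, shape (10,).
_, codes = np.unique(np.array(countries), return_inverse=True)

shape_2d = features[:, [0]].shape  # list index keeps the column axis: (10, 1)
shape_1d = features[:, 0].shape    # integer index drops it: (10,)

# Assigning a (10,) value to a (10, 1) target cannot be broadcast.
try:
    features[:, [0]] = codes
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)

# Either of these assignments has matching shapes and succeeds.
features[:, 0] = codes                    # (10,) into (10,)
features[:, [0]] = codes.reshape(-1, 1)   # (10, 1) into (10, 1)

print(shape_2d, shape_1d)
print(mismatch)
```

If the 2-D form features[:,[0]] is preferred on the left-hand side, reshaping the encoded values with reshape(-1, 1) is one way to make the shapes agree.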