Why do I get this ValueError in my KNN model?

Time: 2018-06-26 16:10:27

Tags: python pandas scikit-learn

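I am applying a KNN model to the Wisconsin breast cancer data, but every time I run the code I get this error:

ValueError: Found input variables with inconsistent numbers of samples: [559, 140]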

import numpy as np
import pandas as pd
from sklearn import preprocessing,cross_validation,neighbors

df=pd.read_csv('breast-cancer-wisconsin.data.txt')
df.replace('?',-99999,inplace=True)
df.drop(['id'],1,inplace=True)

X=np.array(df.drop(['class'],1))
y=np.array(df['class'])

X_train, y_train, X_test, y_test = cross_validation.train_test_split(X, y, test_size=0.2)

clf = neighbors.KNeighborsClassifier()
clf.fit(X_train, y_train)
accuracy=clf.score(X_test, y_test)
print(accuracy)

example=np.array([4,2,1,1,1,2,3,2,1])
example=example.reshape(-1,1)

prediction=clf.predict(example)
print(prediction)

1 Answer:

Answer 0: (score: 1)

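According to the documentation, cross_validation.train_test_split returns its outputs in the order X_train, X_test, y_train, y_test. In your code the second variable, y_train, therefore actually receives X_test (140 rows), so clf.fit is given 559 samples against 140 labels, which is the mismatch reported in the error. Change that line in your code to: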

X_train,X_test,y_train,y_test=cross_validation.train_test_split(X,y,test_size=0.2)
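
For completeness, here is a minimal sketch of the whole flow with the corrected unpacking order. It makes a few assumptions that are not in the question: it imports train_test_split from sklearn.model_selection (the cross_validation module was removed in scikit-learn 0.20), it uses randomly generated stand-in data with the same shape as the dataset (699 rows, 9 features, classes 2 and 4) instead of reading the CSV file, and it fixes random_state for repeatability. It also reshapes the example with reshape(1, -1), because reshape(-1, 1) yields nine samples with one feature each, whereas the classifier is fit on nine features per sample.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): same shape as the question's dataset,
# 699 samples x 9 features, labels 2 (benign) or 4 (malignant).
rng = np.random.default_rng(0)
X = rng.integers(1, 11, size=(699, 9))
y = rng.choice([2, 4], size=699)

# Correct order: both X splits first, then both y splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 20% of 699 rounds up to 140 test rows, leaving 559 training rows.
print(X_train.shape, X_test.shape)  # (559, 9) (140, 9)

clf = KNeighborsClassifier()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(accuracy)

# One sample with nine features: shape (1, 9), not (9, 1).
example = np.array([4, 2, 1, 1, 1, 2, 3, 2, 1]).reshape(1, -1)
prediction = clf.predict(example)
print(prediction)
```

With the real file, replace the stand-in X and y with the arrays built from df as in the question; the rest should stay the same. Accuracy on the random stand-in data is meaningless and only shows that the calls run.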