Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_mldata
from sklearn import neighbors
from sklearn.model_selection import train_test_split
mnist = fetch_mldata('MNIST original')
sample = np.random.randint(70000, size=5000)
data = mnist.data[sample]
target = mnist.data[sample]
xtrain, xtest, ytrain, ytest = train_test_split(data, target, train_size=0.8)
knn = neighbors.KNeighborsClassifier(n_neighbors=3)
knn.fit(xtrain, ytrain)
error = 1 - knn.score(xtest, ytest)
print('Erreur: %f' % error)
When I run "python numb.py", I get this error message:
File "/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.py", line 88, in _check_targets
raise ValueError("{0} is not supported".format(y_type))
ValueError: multiclass-multioutput is not supported
Answer 0 (score: 3)
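This is a simple typo. ytest has the wrong shape because target is taken from mnist.data instead of mnist.target. You should have written: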
target = mnist.target[sample]
With that corrected, the script runs fine.
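To see why the typo produces that error, it may help to compare the shapes of the two candidate targets. The sketch below uses small random arrays as stand-ins for mnist.data and mnist.target (the names fake_data and fake_labels and the sizes are assumptions for illustration, not the real dataset). Indexing the image matrix gives a 2-D array, one row of 784 pixel values per sample, and a 2-D target with many distinct values is what scikit-learn treats as a multiclass-multioutput problem.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical stand-ins for mnist.data (images) and mnist.target (labels).
fake_data = rng.randint(0, 256, size=(100, 784))
fake_labels = rng.randint(0, 10, size=100)

sample = rng.randint(100, size=20)

# The typo: the target is built from the image matrix, so it is 2-D.
wrong_target = fake_data[sample]

# The fix: the target is built from the label vector, so it is 1-D.
right_target = fake_labels[sample]

print(wrong_target.shape)  # (20, 784)
print(right_target.shape)  # (20,)
```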
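Also, the way you build sample, it may contain duplicates, which means some images could end up in both the training and test sets. Consider drawing the indices with np.random.permutation instead, which shuffles them without repetition.
Also consider setting a seed before calling np.random, to get reproducible results (or, better, use check_random_state from sklearn).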
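A minimal sketch of that suggestion, using only NumPy: a seeded RandomState shuffles all 70000 indices and keeps the first 5000, so no index can appear twice. The seed value 42 and the variable names are arbitrary choices for illustration. With scikit-learn available, sklearn.utils.check_random_state(42) returns a RandomState seeded the same way and could replace the np.random.RandomState(42) call.

```python
import numpy as np

n_total = 70000   # number of images, as in the question
n_sample = 5000   # subsample size, as in the question

rng = np.random.RandomState(42)  # arbitrary seed, for reproducibility

# Approach from the question: independent draws, so indices can repeat.
with_dupes = rng.randint(n_total, size=n_sample)

# Suggested approach: shuffle all indices and keep the first n_sample.
sample = rng.permutation(n_total)[:n_sample]

print(np.unique(with_dupes).size)  # typically below n_sample
print(np.unique(sample).size)      # always equal to n_sample
```

The resulting sample can then be used exactly as before, for example mnist.data[sample] and mnist.target[sample].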