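I got surprising results with the Boston Housing dataset. When I apply cross-validation to the original Boston Housing dataset and to a randomly shuffled version of it, the following code produces very different results: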
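from sklearn.datasets import load_boston  # removed in scikit-learn 1.2
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score
from sklearn.utils import shuffle

boston = load_boston()
knn = KNeighborsRegressor(n_neighbors=1)

# Cross-validate on the data in its original row order.
# cv=3 matches the three scores in the output (the default in older scikit-learn releases).
print(cross_val_score(knn, boston.data, boston.target, cv=3))

# Cross-validate again after shuffling the rows.
X, y = shuffle(boston.data, boston.target, random_state=0)
print(cross_val_score(knn, X, y, cv=3))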
The output is:
[-1.07454938 -0.50761407 0.00351173]
[0.30715435 0.36369852 0.51817514]
Even granting that the original dataset is not in random order, why are the 1-nearest-neighbor predictions so poor? Thanks.
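To separate the effect of row order from anything specific to this dataset, here is a minimal sketch on made-up data. The arrays `X` and `y` are hypothetical toy values sorted by the feature, not the Boston data, and the splitters are passed explicitly as `KFold` objects. An integer `cv` for a regressor uses `KFold` without shuffling, so each test fold is one contiguous block of rows; whether that fully explains the Boston scores above is an assumption I have not verified.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy data, sorted by the feature (NOT the Boston data).
X = np.arange(90, dtype=float).reshape(-1, 1)
y = X.ravel()

knn = KNeighborsRegressor(n_neighbors=1)

# Unshuffled folds: every test fold is a contiguous block of rows,
# so 1-NN has to predict from training rows outside that block.
ordered_cv = KFold(n_splits=3)
first_test_fold = next(iter(ordered_cv.split(X)))[1]
print(first_test_fold[:5])
ordered_scores = cross_val_score(knn, X, y, cv=ordered_cv)

# Shuffled folds: test rows are interleaved with training rows,
# so each test point usually has a close neighbor in the training set.
shuffled_cv = KFold(n_splits=3, shuffle=True, random_state=0)
shuffled_scores = cross_val_score(knn, X, y, cv=shuffled_cv)

print(ordered_scores)   # R^2 per fold, original order
print(shuffled_scores)  # R^2 per fold, shuffled folds
```

Passing `KFold(shuffle=True, random_state=0)` as `cv` is an alternative to calling `shuffle` on the arrays first; it leaves the data untouched and only randomizes which rows land in each fold.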