Q: KNN in R - strange behavior

Time: 2016-08-11 15:48:41

Tags: r knn

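Does anyone know why the following KNN R code gives different predictions for different seeds? This is strange because K <- 5, so the majority is well defined. Also, the floating-point numbers are not small enough for this to be a data precision problem. (Note: I know the test point is very different from the training data. This is just a synthetic example created to demonstrate the strange KNN behavior.)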

library(class)

train <- rbind(
  c(0.0626015,  0.0530052,  0.0530052,  0.0496676,  0.0530052,  0.0626015),
  c(0.0565861,  0.0569546,  0.0569546,  0.0511377,  0.0569546,  0.0565861),
  c(0.0538332,  0.057786,   0.057786,   0.0506127,  0.057786,   0.0538332),
  c(0.059033,   0.0541484,  0.0541484,  0.0501926,  0.0541484,  0.059033),
  c(0.0587272,  0.0540445,  0.0540445,  0.0505076,  0.0540445,  0.0587272),
  c(0.0578095,  0.0564349,  0.0564349,  0.0505076,  0.0564349,  0.0578095)
)
trainLabels <- c(1,
                 1,
                 0,
                 0,
                 1,
                 0)
test  <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)

K <- 5

seed <- 494139
set.seed(seed)
pred <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 1, seed: 494139

seed <- 5371
set.seed(seed)
pred <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 0, seed: 5371

1 Answer:

Answer 0: (score: 0)

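The knn function calls an underlying C function named VR_knn (line 122), which includes a step that introduces "fuzz", a small tolerance (epsilon, EPS) used when comparing distances. It looks like your example values may be hitting the "fuzz" step: if the 5th and 6th nearest distances fall within that tolerance they are treated as tied, and the knn documentation says that all candidates tied for the kth nearest are included in the vote and that vote ties are broken at random. With all six points voting, the labels split 3 to 3, so the prediction would depend on the seed. As evidence, rounding the values to 4 digits gives consistent results. Like so: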

rm(list=ls())

library(class)
train <- rbind(
  c(0.0626015,  0.0530052,  0.0530052,  0.0496676,  0.0530052,  0.0626015),
  c(0.0565861,  0.0569546,  0.0569546,  0.0511377,  0.0569546,  0.0565861),
  c(0.0538332,  0.057786,   0.057786,   0.0506127,  0.057786,   0.0538332),
  c(0.059033,   0.0541484,  0.0541484,  0.0501926,  0.0541484,  0.059033),
  c(0.0587272,  0.0540445,  0.0540445,  0.0505076,  0.0540445,  0.0587272),
  c(0.0578095,  0.0564349,  0.0564349,  0.0505076,  0.0564349,  0.0578095)
)
trainLabels <- c(1,1,0,0,1,0)
test  <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)
K <- 5

train <- round(train,4)

seed <- 494139
set.seed(seed)
pred <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 0, seed: 494139

seed <- 5371
set.seed(seed)
pred <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 0, seed: 5371
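
To check the "fuzz" explanation directly, here is a rough sketch in base R that computes the distances by hand on the original (unrounded) data. It assumes that VR_knn compares squared Euclidean distances with a relative tolerance of 1e-4. That is my reading of the class package source, not something taken from its documentation, so verify it against your installed version. The names d2, nearest, eps, ratio and within_fuzz are just illustrative.

```r
# Training rows and test point copied from the question (unrounded).
train <- rbind(
  c(0.0626015,  0.0530052,  0.0530052,  0.0496676,  0.0530052,  0.0626015),
  c(0.0565861,  0.0569546,  0.0569546,  0.0511377,  0.0569546,  0.0565861),
  c(0.0538332,  0.057786,   0.057786,   0.0506127,  0.057786,   0.0538332),
  c(0.059033,   0.0541484,  0.0541484,  0.0501926,  0.0541484,  0.059033),
  c(0.0587272,  0.0540445,  0.0540445,  0.0505076,  0.0540445,  0.0587272),
  c(0.0578095,  0.0564349,  0.0564349,  0.0505076,  0.0564349,  0.0578095)
)
trainLabels <- c(1, 1, 0, 0, 1, 0)
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)

# Squared Euclidean distance from the test point to each training row.
d2 <- rowSums(sweep(train, 2, test)^2)

# Training row indices ordered from nearest to farthest.
nearest <- order(d2)

# ASSUMPTION: relative tolerance believed to be used by VR_knn.
eps <- 1e-4

# Ratio of the 6th-nearest to the 5th-nearest squared distance.
ratio <- d2[nearest[6]] / d2[nearest[5]]

# TRUE means the 6th point counts as tied with the 5th under this tolerance.
within_fuzz <- ratio <= 1 + eps

print(d2)
print(nearest)
print(ratio)
print(within_fuzz)

# Labels of all six points: if all of them vote, the classes split evenly.
print(table(trainLabels))
```

On this data the two farthest points (rows 5 and 3, labels 1 and 0) differ by only about 0.002% in squared distance, which is well inside a 1e-4 relative tolerance. Under the assumption above, that would pull the 6th point into the vote and leave a 3 to 3 tie for the random number generator to break.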