Q: KNN in R - strange behavior

Time: 2016-08-13 11:35:38

Tags: r knn

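Does anyone know why the following KNN code in R gives different predictions for different seeds? This is strange, because K <- 5, so the majority vote should be well defined. In addition, the floating-point values are large, so precision problems in the data should not arise (as in this post).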

library(class)

set.seed(642002713)
m = 20
n = 1000
from = -(2^30)
to = -(from)
train = matrix(runif(m*n, from, to), nrow=m, ncol=n)
trainLabels = sample.int(2, size = m, replace=T)-1
test = matrix(runif(n, from, to), nrow=1)

K <- 5

seed <- 544336746
set.seed(seed)
pred_1 <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred_1, ", seed: ", seed)
#predicted: 0, seed: 544336746

seed <- 621513172 
set.seed(seed)
pred_2 <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred_2, ", seed: ", seed)
#predicted: 1, seed: 621513172

Manual check:

euc.dist <- function(x1, x2) sqrt(sum((x1 - x2) ^ 2))
result = vector(mode="numeric", length=nrow(train))
for(i in 1:nrow(train)) {
  result[i] <- euc.dist(train[i,], test)
}
a <- data.frame(result, trainLabels)
names(a) = c("RSSE", "labels")
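# Order by the RSSE column. Note: decreasing = T puts the largest distances
# (the farthest points) first; nearest neighbours would need ascending order.
b <- a[with(a, order(RSSE, decreasing =T)), ]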
headK <- head(b, K)
message("Manual predicted K: ", paste(K," class:", names(which.max(table(headK[,2])))))
#Manual predicted K: 5  class: 1

This gives prediction 1, with the top K (= 5) RSSE values:

RSSE             labels
28479706980      1
28472893026      0
28063242772      1
27966740954      1
27927401005      1

So the majority is well defined, and there is no issue of small floating-point differences in the RSSE values.
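A possible further check, offered only as a sketch. The manual check above sorts with decreasing = T, so the five rows it lists are the largest distances, not the nearest neighbours. The documentation for knn() in the class package states that ties for the k-th nearest vector are all included in the vote and that vote ties are broken at random, so it may be worth looking at the nearest distances in ascending order and at how close together they are. Whether knn() actually treats any of these distances as tied is an assumption to verify, not something shown here. The names nearest, rel_gap, seeds and preds are made up for this illustration, and the extra seeds 1:20 are arbitrary.

```r
library(class)  # provides knn()

# Rebuild the data exactly as in the question.
set.seed(642002713)
m <- 20
n <- 1000
from <- -(2^30)
to <- -from
train <- matrix(runif(m * n, from, to), nrow = m, ncol = n)
trainLabels <- sample.int(2, size = m, replace = TRUE) - 1
test <- matrix(runif(n, from, to), nrow = 1)
K <- 5

# Euclidean distance from every training row to the single test row.
d <- sqrt(rowSums(sweep(train, 2, as.vector(test))^2))

# Sort ascending, so the first K rows are the nearest neighbours.
nearest <- data.frame(RSSE = d, labels = trainLabels)[order(d), ]

# Relative gap between consecutive sorted distances (NA for the last row).
nearest$rel_gap <- c(diff(nearest$RSSE), NA) / nearest$RSSE
print(head(nearest, K + 2))

# Repeat knn() under several seeds and tabulate the predictions.
# The two seeds from the question plus 1:20 (arbitrary, for illustration).
seeds <- c(544336746, 621513172, 1:20)
preds <- vapply(seeds, function(s) {
  set.seed(s)
  as.character(knn(train = train, test = test, cl = trainLabels, k = K))
}, character(1))
print(table(preds))
```

If the relative gaps around the K-th row turn out to be very small, or the table shows both classes, that would point toward tie handling rather than the data itself, but this is a hypothesis to test rather than a confirmed explanation.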

0 answers:

No answers