Consider a vector of positive values such as vec:
vec <- c(0.453, 0.864, 0.340, 0.941, 0.612, 0.899, 0.910, 0.238, 0.184, 0.803)
Suppose we want to find the pairs of elements that lie within epsilon of each other. One possible approach is:
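epsilon <- 0.1
cmb <- combn(length(vec), 2)           # all index pairs (i, j) with i < j, one per column
diff <- vec[cmb[1, ]] - vec[cmb[2, ]]  # difference for each pair
cmb[, abs(diff) <= epsilon]            # keep the pairs that are at most epsilon apart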
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,]    2    2    2    2    4    4    6    6    8
# [2,]    4    6    7   10    6    7    7   10    9
The usual question: can we do better?
Answer 0 (score: 1)
You haven't defined "better". Here is an alternative:
epsilon <- 0.1
d <- as.matrix(dist(vec))
which(d < epsilon & lower.tri(d), arr.ind = TRUE)
#    row col
# 4    4   2
# 6    6   2
# 7    7   2
# 10  10   2
# 6    6   4
# 7    7   4
# 7    7   6
# 10  10   6
# 9    9   8
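As a quick sanity check, the combn approach from the question and the dist approach above can be compared directly. The sketch below is only an illustration: the names pairs_combn, pairs_dist and same are made up for it, and it uses a strict < in both places so the two results are comparable (the question used <=, which makes no difference for this particular vec).

```r
vec <- c(0.453, 0.864, 0.340, 0.941, 0.612, 0.899, 0.910, 0.238, 0.184, 0.803)
epsilon <- 0.1

# Approach from the question: enumerate all index pairs with combn()
cmb <- combn(length(vec), 2)
near <- abs(vec[cmb[1, ]] - vec[cmb[2, ]]) < epsilon
pairs_combn <- t(cmb[, near, drop = FALSE])  # one pair per row: smaller index, larger index

# Approach from this answer: dense distance matrix, lower triangle only
d <- as.matrix(dist(vec))
pairs_dist <- which(d < epsilon & lower.tri(d), arr.ind = TRUE)

# In the lower triangle row > col, so reorder the columns to (col, row) before comparing
same <- all(pairs_combn == pairs_dist[, c("col", "row")])
same
```

Both enumerations run in the same order (by smaller index, then by larger index), which is why a plain element-wise comparison is enough here.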
For large input vectors, performance can be improved further by avoiding the coercion to a dense matrix:
d <- dist(vec)
n <- attr(d, "Size")
i <- which(d < epsilon)
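offsets <- c(0, cumsum((n-1):1))  # number of entries stored before each column of the lower triangle
cols <- findInterval(i, offsets, left.open = TRUE)
# indexing offsets[cols] also handles cols == 1; cumsum(...)[cols - 1] would silently drop those entries
rows <- i - offsets[cols] + cols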
rbind(cols, rows)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# cols    2    2    2    2    4    4    6    6    8
# rows    4    6    7   10    6    7    7   10    9
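The index arithmetic above relies on how a dist object stores its values: the lower triangle of the distance matrix, column by column. A minimal sketch of that mapping as a reusable function might look like the following. The name dist_index_to_pairs is hypothetical (it is not part of base R), and the function assumes n >= 2.

```r
# Hypothetical helper: convert linear indices k of a "dist" object of size n
# into (row, col) positions of the full distance matrix, with row > col.
dist_index_to_pairs <- function(k, n) {
  # offsets[j] is the number of entries stored before column j
  offsets <- c(0, cumsum((n - 1):1))
  # column j holds the linear indices in (offsets[j], offsets[j + 1]]
  col <- findInterval(k, offsets, left.open = TRUE)
  # within column j, the first stored entry belongs to row j + 1
  row <- k - offsets[col] + col
  cbind(row = row, col = col)
}

vec <- c(0.453, 0.864, 0.340, 0.941, 0.612, 0.899, 0.910, 0.238, 0.184, 0.803)
epsilon <- 0.1
d <- dist(vec)
pairs <- dist_index_to_pairs(which(as.vector(d) < epsilon), attr(d, "Size"))
pairs

# Small case that includes column 1: a 3-element dist stores (2,1), (3,1), (3,2)
small <- dist_index_to_pairs(1:3, 3)
small
```

Memory use is then dominated by the dist object itself, which holds n(n-1)/2 values instead of the n^2 values of the dense matrix.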