Question

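I want to find which value in a sorted vector is closest to a given value. If two or more values are equally close, I want NA instead. For example: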

> vec <- c(1, 2, 3, 4, 5)
> find.nearest(vec, 2.51)
[1] 3
> find.nearest(vec, 2.5)
[1] NA

Here is my implementation:

find.nearest <- function(vec, x) {    # here `vec` is sorted
  nearest.idx <- which.min(abs(vec - x))
  nearest <- vec[nearest.idx]
  if ((x*2 - nearest) %in% vec ||
      duplicated(vec, fromLast=TRUE)[nearest.idx]) {
    return(NA)
  }
  return(nearest)
}

It works in most cases. However, floating-point quirks cause a problem:

> vec <- c(0.1, 0.3)
> x <- 0.2
> find.nearest(vec, x)
[1] 0.3

Instead of NA, 0.3 is incorrectly returned, probably because 0.2 * 2 - 0.3 is not exactly 0.1 in floating-point arithmetic. How would you solve this?
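The suspicion about 0.2 * 2 - 0.3 can be checked directly at the console. The following is a minimal sketch using only base R; the variable names are illustrative and not part of the original code.

```r
# 0.2 * 2 - 0.3 is evaluated in binary floating point, so it need not equal 0.1
lhs <- 0.2 * 2 - 0.3

exact.match  <- lhs == 0.1                    # exact comparison
approx.match <- isTRUE(all.equal(lhs, 0.1))   # comparison within a tolerance
in.vec       <- lhs %in% c(0.1, 0.3)          # the membership test used in find.nearest

print(c(exact = exact.match, approx = approx.match, in.vec = in.vec))
```

The exact comparison and the %in% test are both FALSE, while the tolerance-based comparison is TRUE, which is why the tie at 0.2 goes undetected.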

Answer 1

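find.nearest <- function(vec, x, tol = sqrt(.Machine$double.eps)) {
  dist <- abs(vec - x)                        # distance of every element from x
  min.dist <- min(dist)                       # avoid shadowing base::min
  ind <- which(abs(dist - min.dist) < tol)    # all elements tied for nearest, within tol
  if (length(ind) == 1L) vec[ind] else as(NA, class(vec))
}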

vec <- c(1, 2, 3, 4, 5)

find.nearest(vec, 2.51)
#[1] 3
find.nearest(vec, 2.5)
#[1] NA

vec <- c(0.1, 0.3)

find.nearest(vec, 0.2)
#[1] NA

When comparing floating-point numbers, you always need to use a tolerance. Note that this function is not vectorized over x.
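If several query values need to be looked up, one simple option is to wrap the function in vapply. This is only a sketch: find.nearest.all is a hypothetical helper name, and it assumes vec is a numeric vector. The function from this answer is repeated so the snippet runs on its own.

```r
# Tolerance-based lookup, as in the answer above
find.nearest <- function(vec, x, tol = sqrt(.Machine$double.eps)) {
  dist <- abs(vec - x)
  ind <- which(abs(dist - min(dist)) < tol)
  if (length(ind) == 1L) vec[ind] else as(NA, class(vec))
}

# Hypothetical wrapper (an assumption, not part of the answer):
# applies find.nearest to each element of xs and returns a numeric vector
find.nearest.all <- function(vec, xs, ...) {
  vapply(xs, function(x) find.nearest(vec, x, ...), FUN.VALUE = numeric(1))
}

res <- find.nearest.all(c(1, 2, 3, 4, 5), c(2.51, 2.5, 0.2))
print(res)
```

Here res holds 3, NA and 1: the second query is exactly halfway between 2 and 3, so it yields NA.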

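PS: If your vector is large, it may make sense to exploit the fact that it is sorted to improve efficiency, but usually you don't need to bother. If this really is a problem, I would suggest Rcpp anyway.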
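For illustration, here is one way the sortedness could be exploited in plain R, using base R's findInterval to locate x and then comparing only its immediate neighbours. This is a sketch under stated assumptions, not the answer's method: find.nearest.sorted is a hypothetical name, and vec is assumed to be a non-empty numeric vector sorted in non-decreasing order with no NA values.

```r
# Hypothetical variant (assumption): vec is sorted ascending, non-empty, no NAs
find.nearest.sorted <- function(vec, x, tol = sqrt(.Machine$double.eps)) {
  n <- length(vec)
  i <- findInterval(x, vec)   # number of elements <= x (0 if x < vec[1])
  # The nearest element is at position i or i + 1. Any other element tied
  # with it must be adjacent, so positions i - 1 .. i + 2 are enough.
  cand <- seq.int(max(i - 1L, 1L), min(i + 2L, n))
  w <- vec[cand]
  dist <- abs(w - x)
  ind <- which(abs(dist - min(dist)) < tol)
  if (length(ind) == 1L) w[ind] else NA_real_
}

r1 <- find.nearest.sorted(c(1, 2, 3, 4, 5), 2.51)
r2 <- find.nearest.sorted(c(1, 2, 3, 4, 5), 2.5)
r3 <- find.nearest.sorted(c(0.1, 0.3), 0.2)
r4 <- find.nearest.sorted(c(1, 2, 2, 3), 2)
```

With these inputs r1 is 3, while r2, r3 and r4 are NA (an exact midpoint, a midpoint detected through the tolerance, and a duplicated nearest value). Only a window of at most four elements is compared, rather than the whole vector.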

Answer 2

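I don't understand some parts of your code (specifically this part: (x*2 - nearest) %in% vec). However, I think I have fixed the part you are interested in. I guess you can add the remaining check back in the commented-out section of the code.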

find.nearest <- function(vec, x) {    # here `vec` is sorted

  nearest_idx1 <- which.min(abs(vec - x - .Machine$double.eps))
  nearest_idx2 <- which.min(abs(vec - x + .Machine$double.eps))

  if (nearest_idx1 != nearest_idx2)
    return(NA)

  nearest <- vec[nearest_idx1]

  #check for anything else
  #  if ((x*2 - nearest) %in% vec ||

  if (duplicated(vec, fromLast=TRUE)[nearest_idx1]) {
    return(NA)
  }

  return(nearest)
}

Tests

vec <- c(0.1, 0.3)
x <- 0.2
find.nearest(vec, x)
#[1] NA

find.nearest(vec, 0.1)
#[1] 0.1
find.nearest(vec, 0.4)
#[1] 0.3
find.nearest(vec, 0)
#[1] 0.1

Finding the nearest element in a sorted vector in R