The solutions here (e.g. which.min(abs(w - x)), which(abs(w-x)==min(abs(w-x))), etc.) are all O(n) rather than O(log n). (I'm assuming w is already sorted.)
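To make the O(n) point concrete, here is a C sketch of what which.min(abs(w - x)) computes: a single pass that touches every element, sorted or not. The function name nearest_linear is hypothetical, and indices are 0-based, unlike R.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute difference |a - b|. */
static double abs_diff(double a, double b)
{
    return a > b ? a - b : b - a;
}

/* Linear-scan nearest neighbour, the C analogue of which.min(abs(w - x)).
 * Returns the 0-based index of the first element of w[0..n-1] closest to x.
 * Every element is visited once, so the cost is O(n). */
size_t nearest_linear(const double *w, size_t n, double x)
{
    assert(n > 0);
    size_t best = 0;
    double best_dist = abs_diff(w[0], x);
    for (size_t i = 1; i < n; i++) {
        double d = abs_diff(w[i], x);
        if (d < best_dist) {   /* strict '<' keeps the first minimum, like which.min */
            best = i;
            best_dist = d;
        }
    }
    return best;
}
```

The binary-search answers below replace this full scan with O(log n) probes, which only works because w is sorted.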

library(data.table)
dt = data.table(w, val = w) # you'll see why val is needed in a sec
setattr(dt, "sorted", "w")  # let data.table know that w is sorted

Note that if column w is not already sorted, you must use setkey(dt, w) instead of setattr(.).
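The distinction matters because, as far as I know, setattr(.) only records that the column is sorted without checking it, whereas setkey(dt, w) actually sorts. A rough C analogue of "check, and sort only when needed" is sketched below; the helper names are made up for illustration, and NaN values are assumed absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* qsort comparator for doubles; avoids the truncation of "return a - b". */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* O(n) check that w[0..n-1] is in ascending order (assumes no NaN). */
bool is_sorted_asc(const double *w, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (w[i - 1] > w[i])
            return false;
    return true;
}

/* Sort in place only if needed, loosely like setkey(dt, w).
 * Returns true if a sort was performed. */
bool ensure_sorted(double *w, size_t n)
{
    if (is_sorted_asc(w, n))
        return false;
    qsort(w, n, sizeof *w, cmp_double);
    assert(is_sorted_asc(w, n));
    return true;
}
```

Claiming sortedness without checking saves the O(n) pass, but a wrong claim makes every subsequent binary search silently return wrong answers.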

# binary search and "roll" to the nearest neighbour
dt[J(x), roll = "nearest"]
#     w val
#1: 4.5   4


# or to get the index as Josh points out
# (and then you don't need the val column):
dt[J(x), .I, roll = "nearest", by = .EACHI]
#     w .I
#1: 4.5  3

# or to get the index alone
dt[J(x), roll = "nearest", which = TRUE]
#[1] 3
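Conceptually, the rolling join is a binary search followed by a look at the two neighbours of the insertion point. A minimal C sketch of that idea, assuming w is sorted ascending (names hypothetical, 0-based indices); its tie-breaking rule (ties go left) is my own choice and may not match data.table's roll = "nearest":

```c
#include <assert.h>
#include <stddef.h>

/* Index of the first element >= x, or n if there is none (lower bound), O(log n). */
static size_t lower_bound(const double *w, size_t n, double x)
{
    size_t lo = 0, hi = n;           /* the answer always lies in [lo, hi] */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (w[mid] < x)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* 0-based index of the element of sorted w[0..n-1] nearest to x.
 * Only the neighbours on either side of the insertion point can be closest.
 * On a tie, the left (smaller) neighbour wins. */
size_t nearest_sorted(const double *w, size_t n, double x)
{
    assert(n > 0);
    size_t i = lower_bound(w, n, x);
    if (i == 0)
        return 0;                    /* x is at or below the first element */
    if (i == n)
        return n - 1;                /* x is above the last element */
    return (x - w[i - 1] <= w[i] - x) ? i - 1 : i;
}
```

With w = {1, 2, 4, 6, 7} and x = 4.5 this returns 2, i.e. R's index 3.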

R> findInterval(4.5, c(1,2,4,5,6))
[1] 3
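Keep in mind that findInterval() returns the index of the last element that is <= x, i.e. the left neighbour, which is not necessarily the nearest one: findInterval(5.9, c(1,2,4,5,6)) is 4 even though 6 is closer. Its core is an upper-bound binary search; here is a C sketch for a single x with findInterval's default arguments (the name find_interval is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Analogue of R's findInterval(x, vec) for one x with default arguments:
 * returns how many elements of sorted vec[0..n-1] are <= x, which is also the
 * 1-based index of the last such element (0 if x < vec[0]). O(log n). */
size_t find_interval(const double *vec, size_t n, double x)
{
    assert(vec != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (vec[mid] <= x)
            lo = mid + 1;            /* vec[mid] counts; look further right */
        else
            hi = mid;
    }
    return lo;
}
```

To turn this into a nearest-value search you still have to compare the element at that index with the next one, which is what the get_closest_index() answer further down does.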


> library(MALDIquant)
> match.closest(x, w)
[1] 3
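A common refinement of a closest-match lookup is to reject matches that are too far away. The C sketch below adds a tolerance on top of the binary search; it is my own illustration, not MALDIquant's implementation, and NO_MATCH and match_closest_tol are hypothetical names (NO_MATCH plays the role of NA).

```c
#include <assert.h>
#include <stddef.h>

#define NO_MATCH ((size_t)-1)   /* hypothetical sentinel standing in for NA */

/* 0-based index of the element of sorted table[0..n-1] nearest to x, or
 * NO_MATCH if the table is empty or the nearest element is farther than tol.
 * Pass a huge tol (e.g. 1e300) to always accept the nearest element. */
size_t match_closest_tol(const double *table, size_t n, double x, double tol)
{
    assert(tol >= 0);
    if (n == 0)
        return NO_MATCH;
    size_t lo = 0, hi = n;                 /* lower bound: first table[i] >= x */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid] < x) lo = mid + 1; else hi = mid;
    }
    size_t best;
    if (lo == 0)
        best = 0;
    else if (lo == n)
        best = n - 1;
    else                                   /* compare the two neighbours; ties go left */
        best = (x - table[lo - 1] <= table[lo] - x) ? lo - 1 : lo;
    double d = table[best] > x ? table[best] - x : x - table[best];
    return d <= tol ? best : NO_MATCH;
}
```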

To do this on a character vector, Martin Morgan suggested this function on R-help:

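bsearch7 <-
     function(val, tab, L=1L, H=length(tab))
{
     ## tab must be sorted; for each element of val, returns the number of
     ## elements of tab that are <= it (the position findInterval() would give)
     b <- cbind(L=rep(L, length(val)), H=rep(H, length(val)))
     i0 <- seq_along(val)
     repeat {
         updt <- M <- b[i0,"L"] + (b[i0,"H"] - b[i0,"L"]) %/% 2L
         tabM <- tab[M]
         val0 <- val[i0]
         i <- tabM <= val0        # move the lower bound up (also on ties, so the loop ends)
         updt[i] <- M[i] + 1L
         i <- tabM > val0         # move the upper bound down
         updt[i] <- M[i] - 1L
         b[i0 + i * length(val)] <- updt   # column "L" if i is FALSE, "H" if TRUE
         i0 <- which(b[, "H"] >= b[, "L"]) # rows (of all val) still being searched
         if (!length(i0)) break;
     }
     b[,"L"] - 1L
}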
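The same idea in C, as a sketch: an upper-bound binary search over a sorted array of strings, plus a loop over several keys in the spirit of bsearch7(val, tab). The function names are hypothetical. One caveat: strcmp() compares bytes, whereas R compares character vectors using the current locale's collation, so the two orders can differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* For a sorted (by strcmp) array of C strings, return how many entries
 * compare <= key, i.e. a findInterval-style position. O(log n) comparisons. */
size_t str_find_interval(const char *const *tab, size_t n, const char *key)
{
    assert(key != NULL);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(tab[mid], key) <= 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Look up m keys at once, writing one position per key into out[0..m-1]. */
void str_find_interval_all(const char *const *tab, size_t n,
                           const char *const *keys, size_t m, size_t *out)
{
    for (size_t j = 0; j < m; j++)
        out[j] = str_find_interval(tab, n, keys[j]);
}
```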

x = 4.5
w = c(1,2,4,6,7)

closestLoc = which.min(abs(w - x))
closestVal = w[closestLoc]

x = 4.5
w = c(1,2,4,6,7)

sdev = sapply(w, function(v, x) abs(v - x), x = x)
closestLoc = which.min(sdev)

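For insanely long vectors (millions of rows!), you can parallelize the search. Warning: for data that isn't very, very, very large, this will actually be slower.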


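library(foreach)
library(doParallel)
registerDoParallel()  # register a parallel backend, otherwise %dopar% runs sequentially

closestLoc = which.min(foreach(i = w, .combine = c) %dopar% {
  abs(i - x)
})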


NearestValueSearch = function(x, w){
  ## A simple binary search algo
  ## Assume the w vector is sorted so we can use binary search
  left = 1
  right = length(w)
  while(right - left > 1){
    middle = floor((left + right) / 2)
    if(x < w[middle]){
      right = middle
    } else {
      left = middle
    }
  }
  ## x now lies between w[left] and w[right]; pick the closer one
  if(abs(x - w[right]) < abs(x - w[left])){
    return(right)
  } else {
    return(left)
  }
}

x = 4.5
w = c(1,2,4,6,7)
NearestValueSearch(x, w) # returns 3

Building on @neal-fultz's answer, here is a simple function that uses findInterval():

get_closest_index <- function(x, vec){
  # vec must be sorted
  iv <- findInterval(x, vec)
  dist_left <- x - vec[ifelse(iv == 0, NA, iv)]
  dist_right <- vec[iv + 1] - x
  ifelse(! is.na(dist_left) & (is.na(dist_right) | dist_left < dist_right), iv, iv + 1)
}
values <- c(-15, -0.01, 3.1, 6, 10, 100)
grid <- c(-2, -0.1, 0.1, 3, 7)
get_closest_index(values, grid)
#> [1] 1 2 4 5 5 5
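For comparison, a C port of get_closest_index(), written as a sketch: it keeps R's 1-based results and the same tie rule (the strict "<" sends ties to the right neighbour, iv + 1). The NA handling at the two ends becomes explicit branches. The C function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements of sorted vec[0..n-1] that are <= x (like findInterval). */
static size_t find_interval(const double *vec, size_t n, double x)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (vec[mid] <= x) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* For each x[j], write the 1-based index of the nearest element of sorted
 * vec[0..n-1] into out[j], mirroring the R function above. */
void get_closest_index(const double *x, size_t m,
                       const double *vec, size_t n, size_t *out)
{
    assert(n > 0);
    for (size_t j = 0; j < m; j++) {
        size_t iv = find_interval(vec, n, x[j]);
        if (iv == 0) {                /* left of the grid: dist_left is NA */
            out[j] = 1;
        } else if (iv == n) {         /* right of the grid: dist_right is NA */
            out[j] = n;
        } else {
            double dist_left = x[j] - vec[iv - 1];  /* vec[iv] in R */
            double dist_right = vec[iv] - x[j];     /* vec[iv + 1] in R */
            out[j] = dist_left < dist_right ? iv : iv + 1;
        }
    }
}
```

Called with the same values and grid as the reprex, it should fill out with 1 2 4 5 5 5.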

Created on 2020-05-29 by the reprex package (v0.3.0)

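You can always implement a custom binary search algorithm to find the closest value. Alternatively, you can leverage the standard libc implementation of bsearch(). You could also use some other binary search implementation, but that doesn't change the fact that you have to implement the comparison function carefully to find the closest element in the array. The problem with standard binary search implementations is that they are meant for exact comparison. This means your improvised comparison function needs to do some kind of "exactification" to figure out whether an element of the array is close enough. To achieve this, the comparison function needs to know about the other elements of the array, in particular:

  • the position of the current element (the one being compared with the key).
  • its distance from the key, and how that compares with its neighbours (the previous or next element).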



#include <stdio.h>
#include <stdlib.h>

struct key {
        int key_val;
        int *array_head;
        int array_size;
};

int compar(const void *k, const void *e) {
        const struct key *key = (const struct key *)k;
        const int *elem = (const int *)e;
        const int *arr_first = key->array_head;
        const int *arr_last = key->array_head + key->array_size - 1;
        int kv = key->key_val;
        int dist_left;
        int dist_right;

        if (kv == *elem) {
                /* easy case: if both same, got to be closest */
                return 0;
        } else if (key->array_size == 1) {
                /* easy case: only element got to be closest */
                return 0;
        } else if (elem == arr_first) {
                /* element is the first in array */
                if (kv < *elem) {
                        /* if keyval is less than the first element then
                         * first elem is closest.
                         */
                        return 0;
                } else {
                        /* check distance between first and 2nd elem.
                         * if distance with first elem is smaller, it is closest.
                         */
                        dist_left = kv - *elem;
                        dist_right = *(elem+1) - kv;
                        return (dist_left <= dist_right) ? 0:1;
                }
        } else if (elem == arr_last) {
                /* element is the last in array */
                if (kv > *elem) {
                        /* if keyval is larger than the last element then
                         * last elem is closest.
                         */
                        return 0;
                } else {
                        /* check distance between last and last-but-one.
                         * if distance with last elem is smaller, it is closest.
                         */
                        dist_left = kv - *(elem-1);
                        dist_right = *elem - kv;
                        return (dist_right <= dist_left) ? 0:-1;
                }
        }

        /* condition for remaining cases (other cases are handled already):
         * - elem is neither first nor last in the array
         * - array has at least three elements.
         */

        if (kv < *elem) {
                /* keyval is smaller than elem */

                if (kv <= *(elem -1)) {
                        /* keyval is not larger than previous (of "elem") either.
                         * hence, elem cannot be closest.
                         */
                        return -1;
                } else {
                        /* check distance between elem and elem-prev.
                         * if distance with elem is smaller, it is closest.
                         */
                        dist_left = kv - *(elem -1);
                        dist_right = *elem - kv;
                        return (dist_right <= dist_left) ? 0:-1;
                }
        }

        /* remaining case: (keyval > *elem) */

        if (kv >= *(elem+1)) {
                /* keyval is not smaller than next (of "elem") either.
                 * hence, elem cannot be closest.
                 */
                return 1;
        }

        /* check distance between elem and elem-next.
         * if distance with elem is smaller, it is closest.
         */
        dist_right = *(elem+1) - kv;
        dist_left = kv - *elem;
        return (dist_left <= dist_right) ? 0:1;
}

int main(int argc, char **argv) {
        int arr[] = {10, 20, 30, 40, 50, 60, 70};
        int *found;
        struct key k;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <value>\n", argv[0]);
                return 1;
        }

        k.key_val = atoi(argv[1]);
        k.array_head = arr;
        k.array_size = sizeof(arr)/sizeof(int);

        /* bsearch passes the key as the first argument of compar */
        found = (int*)bsearch(&k, arr, sizeof(arr)/sizeof(int), sizeof(int),
                              compar);

        if(found) {
                printf("found closest: %d\n", *found);
        } else {
                printf("closest not found. absurd! \n");
        }

        return 0;
}

