Question

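I have a time-dependent variable represented as two vectors: a (sorted) vector of times and a vector of the values at those times. I want to resample this variable at different times, given by a different sorted vector of times.

In another language, I would walk through the two sorted time vectors simultaneously: do a linear search from the start of the old time vector until I find the time closest to the first element of the new time vector, then continue from that point in the old vector to find the time closest to the second element of the new vector, and so on. This gives an O(n) solution.

The key here is that the two time vectors are not the same length and their elements are not paired one-to-one, so something like map2 or walk2 is not what I want.

I can implement the simultaneous traversal with a for loop (see the code below), and it works, but it is slow. I also have another solution that is more idiomatic R, but it is O(n^2), so it is slow too. Is there an R way to get the O(n) solution using R's internal implementations?

Alternatively, is there an R function that could replace my get_closest() with a binary search, so that it is at least O(n log n)?

From my searching, I suspect the answer will be "write a C function and call it from R", but I am new to R, so I wanted to check that I am not missing anything.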

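Edit:

I should make clear that the values in new_times may not exist in old_times. I want to find the index in old_times whose time is closest to each entry in new_times. In my real application I would then do linear interpolation, but this question is only about finding the nearest neighbor.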

library(tidyverse)

# input values given
old_times  <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
old_values <- c(3, 7, 6, 7,  8,  9,  7,  6,  4,  6)
new_times  <- c(4.1, 9.6, 12.3, 17.8)

The desired output is

new_values <- c(7, 8, 9, 4)

My attempt

new_values <- rep(NA, length(new_times))
old_index  <- 1

for (new_index in seq_along(new_times)) {
  while (old_index < length(old_times) &&
         old_times[old_index] < new_times[new_index]) {
    old_index <- old_index + 1
  }

  # I could now do interpolation if the value of new_times is in between
  # two values in old_times.  The key is I have a correspondence that
  # new_times[new_index] is close in time to old_times[old_index].
  new_values[new_index] <- old_values[old_index]
}


# Here's an alternative way to do it that uses more R internals,
# but winds up being O(n^2).

# Get the index in old_times closest to new_time.
# This is O(n).
get_closest <- function(new_time, old_times) {
  return(which.min(abs(new_time - old_times)))
}

# Call get_closest on each element of new_times.
# This is O(n^2).
new_indices <- map_int(new_times, get_closest, old_times)

# Slice the list of old values to get new values.
new_values2 <- old_values[new_indices]
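
A minimal sketch of the nearest-neighbor variant of the simultaneous traversal, assuming both time vectors are sorted in increasing order and old_times is non-empty. The loop above stops at the first old time that is not less than the new time, so for 4.1 it lands on 6 rather than the nearer 4. The helper nearest_index_merge below is hypothetical (it is not part of the original post) and adds a step back to the previous element when that one is at least as close. It is still an R-level loop, so it illustrates the O(n) logic rather than a fast implementation.

```r
# Hypothetical helper (not from the original post): for each element of
# new_times, return the index of the nearest element of old_times.
# Assumes both vectors are sorted increasingly and old_times is non-empty.
nearest_index_merge <- function(new_times, old_times) {
  n_old <- length(old_times)
  out <- integer(length(new_times))
  j <- 1L  # position in old_times; it only ever moves forward

  for (i in seq_along(new_times)) {
    # Advance to the first old time that is >= the current new time.
    while (j < n_old && old_times[j] < new_times[i]) {
      j <- j + 1L
    }
    # Step back one position if the previous old time is at least as close
    # (ties go to the lower index, like which.min).
    if (j > 1L &&
        new_times[i] - old_times[j - 1L] <= old_times[j] - new_times[i]) {
      out[i] <- j - 1L
    } else {
      out[i] <- j
    }
  }
  out
}

old_times  <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
old_values <- c(3, 7, 6, 7,  8,  9,  7,  6,  4,  6)
new_times  <- c(4.1, 9.6, 12.3, 17.8)

idx <- nearest_index_merge(new_times, old_times)
new_values <- old_values[idx]
```

Because j is never reset, the total work is proportional to length(old_times) + length(new_times).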

Answer 1

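We can use match, provided every value of new_times occurs exactly in old_times. The output below assumes exact matches such as new_times <- c(4, 10, 12, 18); with the non-matching times from the question's edit, match returns NA for every entry.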

old_values[match(new_times, old_times)]
# [1] 7 8 9 4

match(new_times, old_times) returns "a vector of the positions of (first) matches of its first argument in its second", i.e.

# [1] 2 5 6 9

We can then use this result to extract the desired values from old_values with [.

We can also use %in%, which returns a logical vector

old_values[old_times %in% new_times]

Thanks to @Andrew
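
When new_times contains values that are absent from old_times, a binary-search lookup can be sketched in base R with findInterval, which returns for each query the index of the last element of a sorted vector that is less than or equal to it. The helper nearest_index_bsearch below is hypothetical (not from the original answers) and assumes old_times is sorted increasingly and has at least two elements.

```r
# Hypothetical helper (not from the original post): nearest-neighbor
# lookup built on base R's findInterval.
# Assumes old_times is sorted increasingly with length >= 2.
nearest_index_bsearch <- function(new_times, old_times) {
  n_old <- length(old_times)
  # Index of the last old time <= each new time (0 if there is none).
  lo <- findInterval(new_times, old_times)
  # Clamp so that both lo and lo + 1 are valid indices.
  lo <- pmin(pmax(lo, 1L), n_old - 1L)
  hi <- lo + 1L
  # Pick whichever neighbor is closer; ties go to the lower index.
  ifelse(new_times - old_times[lo] <= old_times[hi] - new_times, lo, hi)
}

old_times  <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
old_values <- c(3, 7, 6, 7,  8,  9,  7,  6,  4,  6)
new_times  <- c(4.1, 9.6, 12.3, 17.8)

idx <- nearest_index_bsearch(new_times, old_times)
new_values <- old_values[idx]
```

Everything here is vectorized, so there is no R-level loop. If linear interpolation is the eventual goal, base R's approx(old_times, old_values, xout = new_times)$y may be worth a look as well.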

Answer 2

It looks like the best approach is to use data.table. I found this in another question:

Find closest value in a vector with binary search

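data.table could probably be optimized to do an O(n) search instead of O(n log n) if it knew that both the search vector and the searched vector were sorted, but data.table is already very fast in my application.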
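
The answer does not show its code, so the following is only a sketch of how a data.table rolling join to the nearest key might look for the question's data. The column names time and value are assumptions chosen for illustration, and the data.table package must be installed.

```r
library(data.table)

old_times  <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
old_values <- c(3, 7, 6, 7,  8,  9,  7,  6,  4,  6)
new_times  <- c(4.1, 9.6, 12.3, 17.8)

# Column names "time" and "value" are illustrative assumptions.
dt <- data.table(time = old_times, value = old_values)

# Rolling join: for each new time, take the row of dt whose time is nearest.
res <- dt[.(new_times), on = "time", roll = "nearest"]
new_values <- res$value
```

In the result, the time column holds the query times from new_times and value holds the values carried over from the nearest rows of dt.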

Simultaneous traversal of vectors in R
