Vectorizing a linear interpolation function for use in mutate

Time: 2019-01-14 16:00:11

Tags: r dplyr vectorization interpolation

I have a data frame that looks like this:

library(dplyr)  # provides %>%, mutate() and filter()

# Set RNG
set.seed(33550336)

# Create toy data frame
df <- expand.grid(day = 1:10, dist = seq(0, 100, by = 10))

df1 <- df %>% mutate(region = "Here") 
df2 <- df %>% mutate(region = "There") 
df3 <- df %>% mutate(region = "Everywhere") 

df_ref <- do.call(rbind, list(df1, df2, df3))

df_ref$value <- runif(nrow(df_ref))

# > head(df_ref)
#   day dist region      value
# 1   1    0   Here 0.39413117
# 2   2    0   Here 0.44224203
# 3   3    0   Here 0.44207487
# 4   4    0   Here 0.08007335
# 5   5    0   Here 0.02836093
# 6   6    0   Here 0.94475814

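This represents a reference data frame that I want to compare my observations to. Each observation is made on a specific day (day is an integer from 1 to 10) in a region that also appears in the reference data frame (Here, There, or Everywhere), but its distance (dist) is not necessarily an integer; it can be any value between 0 and 100. For example, my observations data frame (df_obs) might look like this: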

# Observations
df_obs <- data.frame(day = sample(1:10, 3, replace = TRUE),
                     region = sample(c("Here", "There", "Everywhere")),
                     dist = runif(3, 0, 100))

#   day     region     dist
# 1   6 Everywhere 68.77991
# 2   7      There 57.78280
# 3  10       Here 85.71628

Because dist is not an integer, I can't look up the value that corresponds to an observation in df_ref like this:

df_ref %>% filter(day == 6, region == "Everywhere", dist == 68.77991)

So I created a lookup function that uses the linear interpolation function approx:

lookup <- function(re, di, da){
  # Filter to day and region
  df_tmp <- df_ref %>% filter(region == re, day == da)

  # Approximate answer from distance
  approx(unlist(df_tmp$dist), unlist(df_tmp$value), xout = di)$y
}

Applying this to my first observation,

lookup("Everywhere", 68.77991, 6)
#[1] 0.8037013

However, when I apply the function with mutate, I get different answers:

df_obs %>% mutate(ref = lookup(region, dist, day))

#   day     region     dist       ref
# 1   6 Everywhere 68.77991 0.1881132
# 2   7      There 57.78280 0.1755198
# 3  10       Here 85.71628 0.1730285

I suspect this is because lookup is not vectorized properly. Why do I get different answers, and how can I fix my lookup function to avoid this?
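A minimal sketch of what appears to go wrong, using a small hypothetical reference table ref (regions A and B, days 1 and 2) and a base R stand-in for lookup that subsets with [ and & instead of filter(), which selects the same rows when no NAs are involved. Inside mutate(), lookup receives whole columns rather than single values, so region == re and day == da compare every reference row against a short vector that R recycles. With 330 reference rows and 3 observations the lengths divide evenly, so R gives no recycling warning. The subset therefore pools rows from several region/day groups, and approx() interpolates over that pool:

```r
# Hypothetical toy reference table (not the question's df_ref):
# 2 days x 3 distances x 2 regions = 12 rows, day varying fastest.
ref <- expand.grid(day = 1:2, dist = c(0, 50, 100), region = c("A", "B"),
                   stringsAsFactors = FALSE)
# Deterministic values so interpolated results are easy to check by hand.
ref$value <- ref$dist + 100 * ref$day + 1000 * (ref$region == "B")

# Base R stand-in for the question's lookup(): subset, then interpolate.
lookup <- function(re, di, da) {
  df_tmp <- ref[ref$region == re & ref$day == da, ]
  approx(df_tmp$dist, df_tmp$value, xout = di)$y
}

# Scalar calls interpolate within a single region/day group.
one_a <- lookup("A", 25, 1)  # region A, day 1
one_b <- lookup("B", 75, 2)  # region B, day 2

# Vector call, as mutate() makes it: c("A", "B") and c(1, 2) are recycled
# along all 12 rows, so the subset contains rows from both groups with
# duplicated dist values. approx() averages the ties before interpolating
# (its "collapsing to unique 'x' values" warning is suppressed here).
both <- suppressWarnings(lookup(c("A", "B"), c(25, 75), c(1, 2)))
```

Here the two scalar calls give 125 and 1275, whereas the vector call returns 675 and 725, because the values of both groups are averaged at each dist before interpolation.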

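One possible fix, sketched with a hypothetical reference table ref and a base R stand-in for lookup (not the question's dplyr version): keep lookup scalar and wrap it with base R's Vectorize(), which uses mapply() to call it once per element of its arguments, so each observation is interpolated within its own region/day group.

```r
# Hypothetical toy reference table (not the question's df_ref).
ref <- expand.grid(day = 1:2, dist = c(0, 50, 100), region = c("A", "B"),
                   stringsAsFactors = FALSE)
ref$value <- ref$dist + 100 * ref$day + 1000 * (ref$region == "B")

# Scalar lookup: expects one region, one distance and one day per call.
lookup <- function(re, di, da) {
  df_tmp <- ref[ref$region == re & ref$day == da, ]
  approx(df_tmp$dist, df_tmp$value, xout = di)$y
}

# Vectorize() wraps lookup in mapply(), calling it once per observation.
lookup_v <- Vectorize(lookup)

# Hypothetical observations.
obs <- data.frame(day = c(1, 2), region = c("A", "B"), dist = c(25, 75),
                  stringsAsFactors = FALSE)
# mapply() names the result after the region values; unname() drops them.
obs$ref <- unname(lookup_v(obs$region, obs$dist, obs$day))
```

With the question's objects this would presumably become lookup_v <- Vectorize(lookup) followed by df_obs %>% mutate(ref = lookup_v(region, dist, day)). Adding rowwise() before mutate() is another way to get one lookup call per observation.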