I have a data frame that looks like this:
library(dplyr)  # provides %>%, mutate() and filter()
# Set RNG seed for reproducibility
set.seed(33550336)
# Create toy data frame
df <- expand.grid(day = 1:10, dist = seq(0, 100, by = 10))
df1 <- df %>% mutate(region = "Here")
df2 <- df %>% mutate(region = "There")
df3 <- df %>% mutate(region = "Everywhere")
df_ref <- do.call(rbind, list(df1, df2, df3))
df_ref$value <- runif(nrow(df_ref))
# > head(df_ref)
#   day dist region      value
# 1   1    0   Here 0.39413117
# 2   2    0   Here 0.44224203
# 3   3    0   Here 0.44207487
# 4   4    0   Here 0.08007335
# 5   5    0   Here 0.02836093
# 6   6    0   Here 0.94475814
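This represents a reference data frame against which I want to compare observations. My observations are made on particular days (i.e. day is an integer from 1 to 10) in regions that also appear in the reference data frame (i.e. "Here", "There", or "Everywhere"), but the distance (dist) is not necessarily an integer; it can be any number between 0 and 100. For example, my observations data frame (df_obs) might look like this:
# Observations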
df_obs <- data.frame(day = sample(1:10, 3, replace = TRUE),
region = sample(c("Here", "There", "Everywhere")),
dist = runif(3, 0, 100))
#   day     region     dist
# 1   6 Everywhere 68.77991
# 2   7      There 57.78280
# 3  10       Here 85.71628
Since dist is not an integer, I cannot simply look up the value in df_ref that corresponds to an observation, like this:
df_ref %>% filter(day == 6, region == "Everywhere", dist == 68.77991)
So I wrote a lookup function that uses the linear interpolation function approx:
lookup <- function(re, di, da){
  # Filter to day and region
  df_tmp <- df_ref %>% filter(region == re, day == da)
  # Approximate answer from distance
  approx(unlist(df_tmp$dist), unlist(df_tmp$value), xout = di)$y
}
Applying this to my first observation gives:
lookup("Everywhere", 68.77991, 6)
#[1] 0.8037013
However, when I apply the function using mutate, I get a different answer:
df_obs %>% mutate(ref = lookup(region, dist, day))
#   day     region     dist       ref
# 1   6 Everywhere 68.77991 0.1881132
# 2   7      There 57.78280 0.1755198
# 3  10       Here 85.71628 0.1730285
I suspect this is because lookup is not vectorised correctly. Why do I get different answers, and how can I fix my lookup function to avoid this?
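One likely cause, offered as a sketch rather than a definitive diagnosis: inside mutate, region, dist and day are passed to lookup as whole columns, so comparisons such as region == re recycle the three-element vector against all 330 reference rows instead of matching one observation at a time. The filtered subset then mixes rows from several days and regions, and approx interpolates over the wrong data. The base R example below calls lookup once per observation using mapply. The name lookup_vec is hypothetical, the observations are typed in by hand from the printed output above, and the reference data is rebuilt without dplyr so the block runs on its own.

```r
# Rebuild the toy reference data in base R (same layout as df_ref above)
set.seed(33550336)
grid <- expand.grid(day = 1:10, dist = seq(0, 100, by = 10))
df_ref <- do.call(rbind, lapply(c("Here", "There", "Everywhere"),
                                function(r) cbind(grid, region = r)))
df_ref$value <- runif(nrow(df_ref))

# Observations copied by hand from the printed df_obs (an assumption:
# the real df_obs comes from sample() and runif())
df_obs <- data.frame(day = c(6, 7, 10),
                     region = c("Everywhere", "There", "Here"),
                     dist = c(68.77991, 57.78280, 85.71628),
                     stringsAsFactors = FALSE)

# Scalar lookup: only meaningful when re, di and da each have length 1
lookup <- function(re, di, da) {
  # Keep the reference rows for one region and one day
  df_tmp <- df_ref[df_ref$region == re & df_ref$day == da, ]
  # Linearly interpolate value at distance di
  approx(df_tmp$dist, df_tmp$value, xout = di)$y
}

# Hypothetical wrapper: calls lookup() once per observation
lookup_vec <- function(re, di, da) {
  mapply(lookup, re, di, da, USE.NAMES = FALSE)
}

df_obs$ref <- lookup_vec(df_obs$region, df_obs$dist, df_obs$day)
print(df_obs)
```

With a wrapper like this, the first element of ref should agree with the single call lookup("Everywhere", 68.77991, 6). The same wrapper should also work inside a dplyr pipeline, e.g. df_obs %>% mutate(ref = lookup_vec(region, dist, day)); wrapping the original function with Vectorize(lookup) or grouping with rowwise() before mutate are alternative ways to get the same per-row behaviour.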