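I have two data frames, df and df2. I want to create a new column in df by filtering df2, where the filter condition depends on the values in df. A for loop does the job, but it is quite slow for large data frames... Please have a look at the code. Thanks in advance,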
Fabian
# load dplyr for %>% and filter()
library(dplyr)
# generating dummy data frames
df <- data.frame("var1" = 0:15)
df2 <- data.frame("var2" = 11:20, "var3" = 21:30)
# the following command unfortunately does not work (commented out so the rest runs):
# df$new_column <- df2 %>% filter(var2 > df$var1) %>% mean(var3)
# that's the output I want - but without a for-loop
df$new_column <- NA_real_
for (i in seq_along(df$var1)) {
  h <- df2 %>% filter(var2 > df$var1[i])
  df$new_column[i] <- mean(h$var3)
}
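A small base-R sketch of why the one-liner fails (cmp and one_filter are illustrative names, not from the question). Inside filter(), var2 > df$var1 is a single element-wise comparison between a length-10 and a length-16 vector, so it yields one recycled logical vector of length 16 rather than one filter per row of df; recent dplyr versions reject it because its length does not match the 10 rows of df2. Separately, %>% mean(var3) passes the whole filtered data frame to mean() instead of the var3 column.

```r
df <- data.frame(var1 = 0:15)
df2 <- data.frame(var2 = 11:20, var3 = 21:30)

# One element-wise comparison: 10 values vs 16 values, the shorter one is
# recycled (R warns about it), giving a single logical vector of length 16.
cmp <- suppressWarnings(df2$var2 > df$var1)
print(length(cmp))   # [1] 16, but a filter on df2 needs length 10 (or 1)

# What the loop builds instead: a separate filter for each value of var1.
one_filter <- df2$var2 > df$var1[12]   # the filter for var1 == 11
print(mean(df2$var3[one_filter]))      # [1] 26
```

Each row of df needs its own filter, which is why the answers below evaluate the condition once per value of var1.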
Answer (score: 1)
We can use sapply: for each value of var1, subset the var3 values of df2 whose var2 is greater than that value, and take their mean.
df$new_column <- sapply(df$var1, function(x)
mean(df2$var3[df2$var2 > x], na.rm = TRUE))
df
#   var1 new_column
#1     0       25.5
#2     1       25.5
#3     2       25.5
#4     3       25.5
#5     4       25.5
#6     5       25.5
#7     6       25.5
#8     7       25.5
#9     8       25.5
#10    9       25.5
#11   10       25.5
#12   11       26.0
#13   12       26.5
#14   13       27.0
#15   14       27.5
#16   15       28.0
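sapply still calls the anonymous function once per row of df. A loop-free base-R alternative (not part of the original answer; a sketch assuming df2 has no missing values, since unlike the answer it does not mimic na.rm = TRUE) builds every filter at once as a logical matrix with outer() and averages each column. It trades memory for speed: the matrix has nrow(df2) times nrow(df) cells, so it suits moderately sized data.

```r
df <- data.frame(var1 = 0:15)
df2 <- data.frame(var2 = 11:20, var3 = 21:30)

# m[j, i] is TRUE when df2$var2[j] > df$var1[i]:
# column i of m is exactly the filter the loop builds for row i of df.
m <- outer(df2$var2, df$var1, ">")

# m is 10 x 16, so df2$var3 (length 10) is recycled down each column.
# Selected sum divided by selected count; 0/0 gives NaN, like mean(numeric(0)).
df$new_column <- colSums(m * df2$var3) / colSums(m)
df
```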
Or the same with map_dbl from purrr:
df$new_column <- purrr::map_dbl(df$var1,
~mean(df2$var3[df2$var2 > .x], na.rm = TRUE))
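The sapply and map_dbl versions both scan all of df2 once per row of df. If that is still too slow for large data, one possible alternative (not part of the original answer; a sketch assuming no missing values in var1, var2 or var3) is to sort df2 by var2 once, precompute suffix sums of var3, and use findInterval() to count, for every var1 at once, how many var2 values are not greater than it. The mean then becomes a lookup and a division.

```r
df <- data.frame(var1 = 0:15)
df2 <- data.frame(var2 = 11:20, var3 = 21:30)

# Sort df2 once by var2 (ascending); findInterval() needs a sorted vector.
o  <- order(df2$var2)
v2 <- df2$var2[o]
v3 <- as.numeric(df2$var3[o])   # double, so cumsum() cannot overflow integers
n  <- length(v2)

# suffix[j] = sum(v3[j:n]); the trailing 0 covers "no row of df2 selected".
suffix <- c(rev(cumsum(rev(v3))), 0)

# k = number of var2 values <= var1 (binary search in the sorted v2),
# so positions k+1..n are exactly the rows with var2 > var1.
k <- findInterval(df$var1, v2)

# Mean = selected sum / selected count; 0/0 gives NaN, like mean(numeric(0)).
df$new_column <- suffix[k + 1] / (n - k)
df
```

Sorting costs O(n log n) for n = nrow(df2), and findInterval() does one binary search per row of df, so this should scale better than the per-row scan; the price is that the NA handling of na.rm = TRUE is not reproduced.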