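I am trying to put an arithmetic condition into a join in dplyr, similar to a SQL where clause. I think these are called filtering joins. In my example, I want to join the sdata and fdata tables on val and id, but I only want to join rows where val < id.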
# DATASET GENERATION
library(dplyr)

id <- c(1,1,1,1,
        2,2,2,2,2,2,
        3,3,3,3,3,3,
        5,5,5,5,
        8,8,8,8,
        13,13,13)
fyear <- c(1998,1999,2000,2001,1998,1999,2000,2001,2002,2003,
           1998,1999,2000,2001,2002,2003,1998,1999,2000,2001,
           1998,1999,2000,2001,1998,1999,2000)
byear <- c(1990,1995,2000,2005)
eyear <- c(1995,2000,2005,2010)
val <- c(3,1,5,6)
# tibble() replaces the deprecated tbl_df(data.frame(...))
sdata <- tibble(byear, eyear, val)
fdata <- tibble(id, fyear)
# PSEUDO CODE FOR RESULT I AM TRYING TO ACHIEVE
inner_join(sdata, fdata, by=c("val"<"id"))
Any help would be greatly appreciated.
Answer 0: (score: 2)
You can use the fuzzyjoin package:
library(fuzzyjoin)
# match_fun is applied as val < id for each candidate pair of rows
fuzzy_inner_join(sdata, fdata, by = c("val" = "id"), match_fun = `<`)
# A tibble: 48 x 5
#    byear eyear   val    id fyear
#    <dbl> <dbl> <dbl> <dbl> <dbl>
#  1  1990  1995     3     5  1998
#  2  1990  1995     3     5  1999
#  3  1990  1995     3     5  2000
#  4  1990  1995     3     5  2001
#  5  1990  1995     3     8  1998
#  6  1990  1995     3     8  1999
#  7  1990  1995     3     8  2000
#  8  1990  1995     3     8  2001
#  9  1990  1995     3    13  1998
# 10  1990  1995     3    13  1999
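For what it's worth, newer dplyr releases (1.1.0 and later, as far as I know) accept inequality conditions directly through join_by(), so an extra package may not be needed. A minimal sketch, assuming dplyr >= 1.1.0 is installed; the data are rebuilt from the question so the snippet runs on its own, and res is just an illustrative name:

```r
library(dplyr)

# Same data as in the question, written more compactly
sdata <- tibble(byear = c(1990, 1995, 2000, 2005),
                eyear = c(1995, 2000, 2005, 2010),
                val   = c(3, 1, 5, 6))
fdata <- tibble(id    = rep(c(1, 2, 3, 5, 8, 13), times = c(4, 6, 6, 4, 4, 3)),
                fyear = c(1998:2001, 1998:2003, 1998:2003,
                          1998:2001, 1998:2001, 1998:2000))

# Non-equi join: keep every (sdata, fdata) pair of rows where val < id
res <- inner_join(sdata, fdata, by = join_by(val < id))
nrow(res)  # 48, the same row count as the fuzzyjoin result
```

Because the condition is an inequality, both key columns (val and id) are kept in the result.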
Answer 1: (score: 1)
A tidyverse solution:
library(tidyverse)
id <- c(1,1,1,1,2,2,2,2,2,2,
3,3,3,3,3,3,5,5,5,5,
8,8,8,8,13,13,13)
fyear <- c(1998,1999,2000,2001,1998,
1999,2000,2001,2002,2003,
1998,1999,2000,2001,2002,
2003,1998,1999,2000,2001,
1998,1999,2000,2001,1998,1999,2000)
byear <- c(1990,1995,2000,2005)
eyear <- c(1995,2000,2005,2010)
val <- c(3,1,5,6)
# tibble() replaces the deprecated tbl_df(data.frame(...))
sdata <- tibble(byear, eyear, val)
# A tibble: 4 x 3
#   byear eyear   val
#   <dbl> <dbl> <dbl>
# 1  1990  1995     3
# 2  1995  2000     1
# 3  2000  2005     5
# 4  2005  2010     6
fdata <- tibble(id, fyear)
# A tibble: 27 x 2
#       id fyear
#    <dbl> <dbl>
#  1     1  1998
#  2     1  1999
#  3     1  2000
#  4     1  2001
#  5     2  1998
#  6     2  1999
#  7     2  2000
#  8     2  2001
#  9     2  2002
# 10     2  2003
# ... with 17 more rows
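# sdata and fdata share no column names, so merge() returns every pairing
# of their rows; filter() then keeps only the pairs where val < id
result <- merge(sdata, fdata) %>% filter(val < id)
#    byear eyear val id fyear
# 1   1995  2000   1  2  1998
# 2   1995  2000   1  2  1999
# 3   1995  2000   1  2  2000
# 4   1995  2000   1  2  2001
# 5   1995  2000   1  2  2002
# 6   1995  2000   1  2  2003
# 7   1995  2000   1  3  1998
# 8   1995  2000   1  3  1999
# 9   1995  2000   1  3  2000
# 10  1995  2000   1  3  2001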
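To see why this works: when two data frames have no columns in common, merge() builds the full Cartesian product, and the filter step discards the pairs that fail the condition. A small sketch with stand-in tables (x_tbl, y_tbl, all_pairs and kept are hypothetical names, not from the question):

```r
library(dplyr)

x_tbl <- data.frame(val = c(3, 1))
y_tbl <- data.frame(id = c(1, 2, 5))

# No shared column names, so merge() pairs every row of x_tbl
# with every row of y_tbl
all_pairs <- merge(x_tbl, y_tbl)
nrow(all_pairs)  # 6, i.e. 2 * 3

# Keep only the pairs that satisfy the join condition
kept <- all_pairs %>% filter(val < id)
nrow(kept)       # 3: (val, id) = (1, 2), (1, 5), (3, 5)
```

The intermediate table has nrow(x) * nrow(y) rows, so this approach is fine for small inputs like the ones here but can get expensive for large tables.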