dplyr joins: arithmetic in the where clause

Time: 2019-02-12 21:02:20

Tags: r dplyr

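I am trying to add arithmetic to the where clause of a dplyr join. I think these are called filtering joins. In my example, I want to join the sdata and fdata tables on val and id, but I only want to join the rows where val < id.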

    # DATASET GENERATION
    library(dplyr)  # provides tbl_df() and inner_join()
    id <- c(1,1,1,1,
            2,2,2,2,2,2,
            3,3,3,3,3,3,
            5,5,5,5,
            8,8,8,8,
            13,13,13)

    fyear <- c(1998,1999,2000,2001,1998,1999,2000,2001,2002,2003,
               1998,1999,2000,2001,2002,2003,1998,1999,2000,2001,
               1998,1999,2000,2001,1998,1999,2000)

    byear <- c(1990,1995,2000,2005)
    eyear <- c(1995,2000,2005,2010)
    val <- c(3,1,5,6)

    sdata <- tbl_df(data.frame(byear, eyear, val))
    fdata <- tbl_df(data.frame(id, fyear))

    # PSEUDO CODE FOR RESULT I AM TRYING TO ACHIEVE (not valid dplyr syntax)
    inner_join(sdata, fdata, by=c("val"<"id"))

Any help would be greatly appreciated.

2 Answers:

Answer 0: (score: 2)

You can use the fuzzyjoin package:

library(fuzzyjoin)
# match_fun = `<` keeps the row pairs where sdata$val < fdata$id
fuzzy_inner_join(sdata, fdata, by = c("val" = "id"), match_fun = `<`)
# A tibble: 48 x 5
#    byear eyear   val    id fyear
#    <dbl> <dbl> <dbl> <dbl> <dbl>
#  1  1990  1995     3     5  1998
#  2  1990  1995     3     5  1999
#  3  1990  1995     3     5  2000
#  4  1990  1995     3     5  2001
#  5  1990  1995     3     8  1998
#  6  1990  1995     3     8  1999
#  7  1990  1995     3     8  2000
#  8  1990  1995     3     8  2001
#  9  1990  1995     3    13  1998
# 10  1990  1995     3    13  1999

Answer 1: (score: 1)

A tidyverse solution:

library(tidyverse)

id <- c(1,1,1,1,2,2,2,2,2,2,
        3,3,3,3,3,3,5,5,5,5,
        8,8,8,8,13,13,13)

fyear <- c(1998,1999,2000,2001,1998,
           1999,2000,2001,2002,2003,      
           1998,1999,2000,2001,2002,
           2003,1998,1999,2000,2001,
           1998,1999,2000,2001,1998,1999,2000)

byear <- c(1990,1995,2000,2005)
eyear <- c(1995,2000,2005,2010)
val <- c(3,1,5,6)

sdata <- tbl_df(data.frame(byear, eyear, val))
# A tibble: 4 x 3
#  byear eyear   val
#  <dbl> <dbl> <dbl>
#1  1990  1995     3
#2  1995  2000     1
#3  2000  2005     5
#4  2005  2010     6

fdata <- tbl_df(data.frame(id, fyear))
# A tibble: 27 x 2
#      id fyear
#   <dbl> <dbl>
# 1     1  1998
# 2     1  1999
# 3     1  2000
# 4     1  2001
# 5     2  1998
# 6     2  1999
# 7     2  2000
# 8     2  2001
# 9     2  2002
#10     2  2003
# ... with 17 more rows

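# sdata and fdata share no column names, so merge() returns the full
# cross join (4 x 27 = 108 rows); filter() then keeps the rows with val < id
result <- merge(sdata, fdata) %>% filter(val < id)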
#   byear eyear val id fyear
#1   1995  2000   1  2  1998
#2   1995  2000   1  2  1999
#3   1995  2000   1  2  2000
#4   1995  2000   1  2  2001
#5   1995  2000   1  2  2002
#6   1995  2000   1  2  2003
#7   1995  2000   1  3  1998
#8   1995  2000   1  3  1999
#9   1995  2000   1  3  2000
#10  1995  2000   1  3  2001
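
A variant that stays within dplyr and spells out the cross join, instead of relying on how `merge()` behaves when the tables have no column names in common, might look like the sketch below. It assumes dplyr 1.1.0 or later for `cross_join()`, which is newer than this thread, and it uses `tibble()` rather than `tbl_df()`.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0, which introduced cross_join()

# Same values as in the question, written more compactly
sdata <- tibble(byear = c(1990, 1995, 2000, 2005),
                eyear = c(1995, 2000, 2005, 2010),
                val   = c(3, 1, 5, 6))
fdata <- tibble(
  id    = c(rep(1, 4), rep(2, 6), rep(3, 6), rep(5, 4), rep(8, 4), rep(13, 3)),
  fyear = as.numeric(c(1998:2001, 1998:2003, 1998:2003,
                       1998:2001, 1998:2001, 1998:2000))
)

# Build all 4 x 27 = 108 row pairs, then keep only those with val < id
result <- cross_join(sdata, fdata) %>%
  filter(val < id)

nrow(result)  # 48
```

Because every row pair is materialized before filtering, this is fine for small tables like these but can become costly as either table grows.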