A specific join of two data frames

Time: 2019-02-07 14:41:52

Tags: r dataframe join dplyr

I have two data frames, df1 and df2:

> df1

     ID  Gender      age      cd       evnt     scr     test_dt
1 C0004    MALE       22       1          1      82    7/3/2014
2 C0004    MALE       22       1          2      76    7/3/2014
3 C0005    MALE       22       1          3    1514    7/3/2014
4 C0005    MALE       23       2          1      81   11/3/2014
5 C0006    MALE       23       2          2      75   11/3/2014
6 C0006    MALE       23       2          3     878   11/3/2014

> df2

     ID    hgt    wt     phys_dt
1 C0004     70   147   6/29/2015
2 C0004     70   157   6/27/2016
3 C0005     67   175   6/27/2016
4 C0005     65   171    7/2/2014
5 C0006     69   160   6/29/2015
6 C0006     64   143    7/2/2014

I want to join df1 and df2 in a way that produces the following data frame; call it df3:

> df3

     ID   Gender      age      cd       evnt     scr     hgt     wt
1 C0004     MALE       22       1          1      82      70    147
2 C0004     MALE       22       1          2      76      70    157
3 C0005     MALE       22       1          3    1514      67    175
4 C0005     MALE       23       2          1      81      65    171
5 C0006     MALE       23       2          2      75      69    160
6 C0006     MALE       23       2          3     878      64    143

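I'm trying to add df2$hgt and df2$wt to the rows with the matching ID. The tricky part is that, within each ID, I want hgt and wt to come from the df2 row whose df2$phys_dt is closest to df1$test_dt. I thought I could first sort both data frames by ID, then by their respective dates, and then try to join? I'm not quite sure how to approach this. Thanks.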
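A minimal sketch of the nearest-date join described above, offered as one possible approach rather than the asker's method. It assumes the dplyr package (version 1.0.0 or later, for slice_min()) and that both date columns are month/day/year strings. Instead of sorting first, it joins on ID to form every candidate pair, computes the gap in days between test_dt and phys_dt, and keeps the closest df2 row for each df1 row. The helper columns row_id and gap are hypothetical names introduced only for this illustration. Recent dplyr versions may warn about a many-to-many relationship in the join; that is expected here, because all candidate pairs are generated on purpose.

```r
library(dplyr)

# The question's data, with dates kept as month/day/year strings
df1 <- data.frame(
  ID      = c("C0004", "C0004", "C0005", "C0005", "C0006", "C0006"),
  Gender  = "MALE",
  age     = c(22, 22, 22, 23, 23, 23),
  cd      = c(1, 1, 1, 2, 2, 2),
  evnt    = c(1, 2, 3, 1, 2, 3),
  scr     = c(82, 76, 1514, 81, 75, 878),
  test_dt = c("7/3/2014", "7/3/2014", "7/3/2014",
              "11/3/2014", "11/3/2014", "11/3/2014"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID      = c("C0004", "C0004", "C0005", "C0005", "C0006", "C0006"),
  hgt     = c(70, 70, 67, 65, 69, 64),
  wt      = c(147, 157, 175, 171, 160, 143),
  phys_dt = c("6/29/2015", "6/27/2016", "6/27/2016",
              "7/2/2014", "6/29/2015", "7/2/2014"),
  stringsAsFactors = FALSE
)

df3 <- df1 %>%
  # Hypothetical helper: remember each original df1 row
  mutate(row_id = row_number(),
         test_dt = as.Date(test_dt, format = "%m/%d/%Y")) %>%
  # Pair each df1 row with every df2 row that has the same ID
  left_join(df2 %>% mutate(phys_dt = as.Date(phys_dt, format = "%m/%d/%Y")),
            by = "ID") %>%
  # Hypothetical helper: absolute distance between the two dates, in days
  mutate(gap = abs(as.numeric(test_dt - phys_dt))) %>%
  # Keep the closest df2 row per df1 row (first one wins on ties)
  group_by(row_id) %>%
  slice_min(gap, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(row_id) %>%
  select(ID, Gender, age, cd, evnt, scr, hgt, wt)

df3
```

Reading the dates as month/day/year, this picks wt values 147, 147, 171, 171, 143, 143 for the sample data, which is not the df3 shown in the question; that sample df3 appears to pair the rows of df1 and df2 by position rather than by nearest date, so the intended rule may be worth double-checking.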

1 answer:

Answer 0 (score: 0)

If you only want to match df1$ID with df2$ID, this should do it:

df3 <- left_join(df1, df2, by = c("ID" = "ID"))  
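To see what the ID-only join does with the sample data, here is a small sketch using a subset of the question's columns (assuming dplyr is installed). Because each ID appears twice in both data frames, every df1 row is paired with both df2 rows for its ID rather than with a single one. Recent dplyr versions may also warn about a many-to-many relationship here.

```r
library(dplyr)

# Reduced copies of the question's data (only the columns needed here)
df1 <- data.frame(
  ID      = c("C0004", "C0004", "C0005", "C0005", "C0006", "C0006"),
  evnt    = c(1, 2, 3, 1, 2, 3),
  test_dt = c("7/3/2014", "7/3/2014", "7/3/2014",
              "11/3/2014", "11/3/2014", "11/3/2014"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID      = c("C0004", "C0004", "C0005", "C0005", "C0006", "C0006"),
  hgt     = c(70, 70, 67, 65, 69, 64),
  wt      = c(147, 157, 175, 171, 160, 143),
  phys_dt = c("6/29/2015", "6/27/2016", "6/27/2016",
              "7/2/2014", "6/29/2015", "7/2/2014"),
  stringsAsFactors = FALSE
)

# by = "ID" is shorthand for by = c("ID" = "ID")
df3 <- left_join(df1, df2, by = "ID")

# Each of the 6 df1 rows matches 2 df2 rows, so the result has 12 rows
nrow(df3)  # 12
```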

If both the date and the ID should match, you can try:

df3 <- left_join(df1, df2, by = c("ID" = "ID", "test_dt" = "phys_dt")) 
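A similar sketch for the ID-plus-date join, again on a subset of the question's columns and assuming dplyr. The dates are compared as exact strings, and in the sample data no test_dt equals a phys_dt for the same ID, so every df1 row is kept once with hgt and wt set to NA. Matching the closest date rather than an identical one needs an extra step, such as computing the date gap after joining on ID alone.

```r
library(dplyr)

# Reduced copies of the question's data (only the columns needed here)
df1 <- data.frame(
  ID      = c("C0004", "C0004", "C0005", "C0005", "C0006", "C0006"),
  evnt    = c(1, 2, 3, 1, 2, 3),
  test_dt = c("7/3/2014", "7/3/2014", "7/3/2014",
              "11/3/2014", "11/3/2014", "11/3/2014"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID      = c("C0004", "C0004", "C0005", "C0005", "C0006", "C0006"),
  hgt     = c(70, 70, 67, 65, 69, 64),
  wt      = c(147, 157, 175, 171, 160, 143),
  phys_dt = c("6/29/2015", "6/27/2016", "6/27/2016",
              "7/2/2014", "6/29/2015", "7/2/2014"),
  stringsAsFactors = FALSE
)

# Both keys must be equal; dates are compared as plain strings
df3 <- left_join(df1, df2, by = c("ID" = "ID", "test_dt" = "phys_dt"))

nrow(df3)             # 6: unmatched df1 rows are kept once
sum(!is.na(df3$hgt))  # 0: no exact date match for any ID
```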

left_join() is in the dplyr package: library(dplyr).