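One-to-one matching of two data frames by ID

Time: 2019-03-15 15:02:30

Tags: r dplyr

I want to combine these two tables by ID and match the corresponding amounts (a one-to-one match).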

df1 <- data.frame(id=c("101","101","101", "102","102","102","102"),
               authno=c("A", "B", "C","A", "B", "C", "D"),
               amount=c(1083, 1329, 1083, 1330, 1330, 1330, 140))

df2 <- data.frame(id=c("101","101","101","102", "102","102","102"),
               amount=c(1329, 833, 1083, 1330, 1330, 1700, 120))

This is the result I want:

id  authno amount
101  A  1083
101  B  1329
101  C  NA
102  A  1330
102  B  1330
102  C  NA
102  D  NA

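Note that because df2 contains only one row with id == 101 & amount == 1083, only the first row that matches it (authno == A) gets matched, and the second instance (authno == C) gets NA. Because df2 contains two rows with id == 102 & amount == 1330, authno A and B can match, but C cannot.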

1 Answer:

Answer 0 (score: 2)

I imagine there is a more efficient way to do this, but dplyr can handle it:

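library(dplyr)

df1 %>%
  # Number the rows within each (id, amount) group, in authno order
  group_by(id, amount) %>%
  arrange(authno) %>%
  mutate(row = row_number()) %>%
  # Do the same for df2 and flag its rows so that matches can be detected
  left_join(df2 %>%
              group_by(id, amount) %>%
              mutate(row = row_number(),
                     present_in_both = TRUE),
            by = c("id", "amount", "row")) %>%
  ungroup() %>%
  # Blank out amount where df2 had no matching row
  mutate(amount = if_else(is.na(present_in_both),
                          NA_real_,
                          amount)) %>%
  # Drop the two helper columns
  select(-present_in_both, -row)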

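As you can see, I group each data frame by id and amount, then add a dummy within-group index called row. The left_join then matches on all of id, amount, and row. In your case, id == 101 & amount == 1083 occurs twice in df1 but only once in df2, so this setup allows it to match only once!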
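To make the row index concrete, here is a minimal sketch that builds it for df1 only and looks at the duplicated pair. The name df1_indexed is just an illustrative name, not part of the code above.

```r
library(dplyr)

df1 <- data.frame(id = c("101", "101", "101", "102", "102", "102", "102"),
                  authno = c("A", "B", "C", "A", "B", "C", "D"),
                  amount = c(1083, 1329, 1083, 1330, 1330, 1330, 140))

# Number each row within its (id, amount) group, following authno order.
df1_indexed <- df1 %>%
  group_by(id, amount) %>%
  arrange(authno) %>%
  mutate(row = row_number()) %>%
  ungroup()

# The two 101/1083 rows get different row values (A first, then C),
# so at most one of them can line up with the single 101/1083 row in df2.
df1_indexed %>%
  filter(id == "101", amount == 1083)
```

Because row becomes part of the join key, a duplicated (id, amount) pair in df1 can only consume as many rows of df2 as actually exist there.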

Next, I set amount to NA wherever the left_join found no match (flagged by present_in_both). Finally, we drop the two dummy variables, row and present_in_both.
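If you want to see the flag before it is cleaned up, the sketch below stops right after the join and keeps only the unmatched rows. add_row_index is a hypothetical helper I introduce here to avoid repeating the grouping code; it is not a dplyr function.

```r
library(dplyr)

df1 <- data.frame(id = c("101", "101", "101", "102", "102", "102", "102"),
                  authno = c("A", "B", "C", "A", "B", "C", "D"),
                  amount = c(1083, 1329, 1083, 1330, 1330, 1330, 140))

df2 <- data.frame(id = c("101", "101", "101", "102", "102", "102", "102"),
                  amount = c(1329, 833, 1083, 1330, 1330, 1700, 120))

# Hypothetical helper: add a within-group counter for each (id, amount) pair.
add_row_index <- function(df) {
  df %>%
    group_by(id, amount) %>%
    mutate(row = row_number()) %>%
    ungroup()
}

# Join on id, amount, and row, keeping the helper columns for inspection.
joined <- df1 %>%
  arrange(authno) %>%
  add_row_index() %>%
  left_join(df2 %>%
              add_row_index() %>%
              mutate(present_in_both = TRUE),
            by = c("id", "amount", "row"))

# Rows of df1 with no partner in df2 carry present_in_both == NA.
unmatched <- joined %>%
  filter(is.na(present_in_both))
unmatched
```

These unmatched rows are exactly the ones whose amount is replaced by NA in the final step.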

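This gives the amounts requested in the question: amount is NA for 101 C, 102 C, and 102 D, and unchanged for the other rows.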