How do I do a conditional join in R using dplyr?

Time: 2018-10-04 13:56:25

Tags: r join dplyr data-manipulation

For example, df1 looks like this:

X1      X2       X3      X4        X5
Apple   Belgium  Red     Purchase  100
Guava   Germany  Green   Sale      200
Grape   Italy    Purple  Purchase  500
Orange  India    Orange  Sale      2000

df2 looks like this:

X1      X2       X3      X4        X5
Apple   Belgium  Red     Purchase  10000
Guava   Germany  Green   Sale      20000
Grape   Italy    Purple  Purchase
Orange  India    Orange  Sale      2000

My output should look like this:

X1      X2       X3      X4        X5.x  X5.y
Apple   Belgium  Red     Purchase  100   10000
Guava   Germany  Green   Sale      200   20000
Grape   Italy    Purple  Purchase  500   NA

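There are multiple operations involved here:

  1. Select the rows that are present in one table and not in the other, and vice versa.

  2. When the first 4 columns match, select the mismatches in the X5 column (X5 is my target column).

  3. I do not want the matches.

I tried a combination of inner_join, full_join and anti_join on the two tables to get part 1. How do I perform the second part? Is there a conditional join available in R that selects only the mismatches and ignores rows where the target column is the same?

I don't want to use sqldf. I know this can be achieved in SQL. I want to do this in dplyr. Any help is much appreciated.

TIA.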

2 Answers:

Answer 0 (score: 1)

left_join(df1, df2, by = c("X1", "X2", "X3", "X4")) %>%
  filter(X5.x != X5.y | is.na(X5.x) | is.na(X5.y))
#      X1      X2     X3       X4 X5.x  X5.y
# 1 Apple Belgium    Red Purchase  100 10000
# 2 Guava Germany  Green     Sale  200 20000
# 3 Grape   Italy Purple Purchase  500    NA
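
"Is there a conditional join available in R that selects only the mismatches and ignores rows where the target column is the same?"

Yes, I think you could do this with a non-equi join in data.table, or with sqldf as you mentioned.

"I want to do this in dplyr."

dplyr joins only on equality (dplyr 1.1.0 and later also accept inequalities such as < and >= via join_by(), but not !=), so you join and then filter.


Using this data: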

df1 = read.table(text = "X1         X2     X3     X4         X5
Apple   Belgium   Red   Purchase   100 
Guava   Germany   Green Sale       200
Grape   Italy     Purple Purchase   500
Orange India   Orange   Sale       2000", header = TRUE)

df2 = read.table(text = "X1         X2     X3     X4         X5
Apple   Belgium   Red   Purchase   10000 
Guava   Germany   Green Sale       20000
Grape   Italy     Purple Purchase   NA
Orange India   Orange   Sale       2000", header = TRUE)
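
The question also asks for rows that exist in only one of the two tables ("and vice versa"), which left_join alone does not cover for rows found only in df2. A minimal sketch of the same join-then-filter idea using full_join instead is below. The data mirrors the question, except that the Mango row is a hypothetical addition so that there is a row present only in df2. Note the assumption baked into the filter: a missing X5 on either side counts as a mismatch, so a row that is absent from one table and a row whose X5 is simply NA look the same in the result.

```r
library(dplyr)

keys <- c("X1", "X2", "X3", "X4")

# Same shape as the question's data, plus one hypothetical row (Mango)
# that exists only in df2, to exercise the "vice versa" case.
df1 <- data.frame(
  X1 = c("Apple", "Guava", "Grape", "Orange"),
  X2 = c("Belgium", "Germany", "Italy", "India"),
  X3 = c("Red", "Green", "Purple", "Orange"),
  X4 = c("Purchase", "Sale", "Purchase", "Sale"),
  X5 = c(100, 200, 500, 2000),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  X1 = c("Apple", "Guava", "Grape", "Orange", "Mango"),
  X2 = c("Belgium", "Germany", "Italy", "India", "India"),
  X3 = c("Red", "Green", "Purple", "Orange", "Yellow"),
  X4 = c("Purchase", "Sale", "Purchase", "Sale", "Sale"),
  X5 = c(10000, 20000, NA, 2000, 300),
  stringsAsFactors = FALSE
)

# full_join keeps unmatched rows from both sides; the filter then drops
# only the rows whose X5 values are present on both sides and equal.
mismatches <- full_join(df1, df2, by = keys) %>%
  filter(X5.x != X5.y | is.na(X5.x) | is.na(X5.y))

print(mismatches)
```

With this data the Orange row (2000 on both sides) should be dropped, and the Mango row should come through with NA in X5.x.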

Answer 1 (score: 1)

df1 %>%
  anti_join(df2, by = c("X1", "X2", "X3", "X4", "X5")) %>%
  left_join(df2, by = c("X1", "X2", "X3", "X4"))
#      X1      X2     X3       X4 X5.x  X5.y
# 1 Apple Belgium    Red Purchase  100 10000
# 2 Guava Germany  Green     Sale  200 20000
# 3 Grape   Italy Purple Purchase  500    NA
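
One difference between the two answers is worth checking before picking one: how they treat a row whose X5 is NA in both tables. The sketch below uses hypothetical data (not the question's) in which Grape has NA on both sides. It relies on dplyr's default NA handling in joins (na_matches = "na"), under which two NA values are treated as equal.

```r
library(dplyr)

keys <- c("X1", "X2", "X3", "X4")

# Hypothetical data: Grape has NA in X5 in both tables.
df1 <- data.frame(
  X1 = c("Apple", "Grape", "Orange"),
  X2 = c("Belgium", "Italy", "India"),
  X3 = c("Red", "Purple", "Orange"),
  X4 = c("Purchase", "Purchase", "Sale"),
  X5 = c(100, NA, 2000),
  stringsAsFactors = FALSE
)
df2 <- df1
df2$X5 <- c(10000, NA, 2000)

# Answer 0: join, then filter. The both-NA row passes the is.na() tests,
# so it is kept as a mismatch.
via_filter <- left_join(df1, df2, by = keys) %>%
  filter(X5.x != X5.y | is.na(X5.x) | is.na(X5.y))

# Answer 1: anti_join on all five columns, then left_join.
# With default NA matching, the both-NA row counts as a match and is removed.
via_anti <- df1 %>%
  anti_join(df2, by = c(keys, "X5")) %>%
  left_join(df2, by = keys)

print(via_filter$X1)
print(via_anti$X1)
```

So the filter version should keep Apple and Grape, while the anti_join version should keep only Apple. Which behaviour is right depends on whether two missing values should count as a mismatch in your data.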