For example, df1 looks like this:
X1 X2 X3 X4 X5
Apple Belgium Red Purchase 100
Guava Germany Green Sale 200
Grape Italy Purple Purchase 500
Orange India Orange Sale 2000
df2 looks like this:
X1 X2 X3 X4 X5
Apple Belgium Red Purchase 10000
Guava Germany Green Sale 20000
Grape Italy Purple Purchase
Orange India Orange Sale 2000
My output should look like this:
X1 X2 X3 X4 X5.x X5.y
Apple Belgium Red Purchase 100 10000
Guava Germany Green Sale 200 20000
Grape Italy Purple Purchase 500 NA
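There are several operations involved here:
Select the rows that are present in one data frame but not in the other, and vice versa.
When the first four columns match, select the mismatches in column X5 (X5 is my target column).
I do not want the rows where everything matches.
I tried combining inner_join, full_join, and anti_join on the two data frames to get part 1. How do I do the second part? Is there a conditional join in R that selects only the mismatches and ignores rows where the target column is the same?
I don't want to use sqldf. I know this can be done in SQL, but I want to do it in dplyr. Any help is much appreciated.
TIA.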
Answer 0 (score: 1)
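library(dplyr)
# Join on the four key columns, then keep rows whose X5 values differ or are missing
left_join(df1, df2, by = c("X1", "X2", "X3", "X4")) %>%
  filter(X5.x != X5.y | is.na(X5.x) | is.na(X5.y))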
# X1 X2 X3 X4 X5.x X5.y
# 1 Apple Belgium Red Purchase 100 10000
# 2 Guava Germany Green Sale 200 20000
# 3 Grape Italy Purple Purchase 500 NA
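"Is there a conditional join in R that selects only the mismatches and ignores rows where the target column is the same?"
Yes, I think you can do this with a non-equi join in data.table, or with sqldf, as you mentioned.
"I want to do this in dplyr."
dplyr only joins on equality, so you join first and then filter.
Using this data: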
df1 <- read.table(text = "X1 X2 X3 X4 X5
Apple Belgium Red Purchase 100
Guava Germany Green Sale 200
Grape Italy Purple Purchase 500
Orange India Orange Sale 2000", header = TRUE)
df2 <- read.table(text = "X1 X2 X3 X4 X5
Apple Belgium Red Purchase 10000
Guava Germany Green Sale 20000
Grape Italy Purple Purchase NA
Orange India Orange Sale 2000", header = TRUE)
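The question also asks for rows that exist in only one of the two data frames, in either direction. As a rough sketch, and assuming a full_join is acceptable for that part, the same join-then-filter pattern can cover it. The Mango row below is a made-up addition to df2 for illustration only and is not part of the original data; the names keys and result are likewise placeholders.

```r
library(dplyr)

df1 <- read.table(text = "X1 X2 X3 X4 X5
Apple Belgium Red Purchase 100
Guava Germany Green Sale 200
Grape Italy Purple Purchase 500
Orange India Orange Sale 2000", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical variant of df2: the Mango row is invented to show a df2-only row
df2 <- read.table(text = "X1 X2 X3 X4 X5
Apple Belgium Red Purchase 10000
Guava Germany Green Sale 20000
Grape Italy Purple Purchase NA
Orange India Orange Sale 2000
Mango India Yellow Sale 300", header = TRUE, stringsAsFactors = FALSE)

keys <- c("X1", "X2", "X3", "X4")

# full_join keeps unmatched rows from both sides; the filter then drops
# rows whose X5 values are equal. Note that a row with NA on both sides
# would also be kept by this condition.
result <- full_join(df1, df2, by = keys) %>%
  filter(X5.x != X5.y | is.na(X5.x) | is.na(X5.y))

print(result)
```

With this illustrative data, Orange is dropped because its X5 values agree, while Mango is kept with a missing X5.x because it appears only in df2.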
Answer 1 (score: 1)
df1 %>%
  anti_join(df2, by = c("X1", "X2", "X3", "X4", "X5")) %>%
  left_join(df2, by = c("X1", "X2", "X3", "X4"))
X1 X2 X3 X4 X5.x X5.y
1 Apple Belgium Red Purchase 100 10000
2 Guava Germany Green Sale 200 20000
3 Grape Italy Purple Purchase 500 NA
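To see how this two-step approach arrives at the result, it may help to look at the intermediate data frame. The sketch below rebuilds the sample data with data.frame() and splits the pipeline into two named steps; only_mismatched and result are illustrative names, not part of the original answer.

```r
library(dplyr)

df1 <- data.frame(
  X1 = c("Apple", "Guava", "Grape", "Orange"),
  X2 = c("Belgium", "Germany", "Italy", "India"),
  X3 = c("Red", "Green", "Purple", "Orange"),
  X4 = c("Purchase", "Sale", "Purchase", "Sale"),
  X5 = c(100, 200, 500, 2000),
  stringsAsFactors = FALSE
)
df2 <- df1
df2$X5 <- c(10000, 20000, NA, 2000)

# Step 1: drop every df1 row that has an exact five-column match in df2.
# Only Orange matches on all five columns, so it is removed here.
only_mismatched <- anti_join(df1, df2, by = c("X1", "X2", "X3", "X4", "X5"))

# Step 2: bring in df2's X5 for the remaining rows, matching on the four keys.
result <- left_join(only_mismatched, df2, by = c("X1", "X2", "X3", "X4"))

print(only_mismatched)
print(result)
```

One difference from the join-then-filter approach is worth keeping in mind: by default dplyr joins treat NA as matching NA, so a row whose X5 is missing in both data frames would be removed by the anti_join step, whereas the filter with is.na() would keep it.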