Compare two columns of two data frames and find the mismatches

Time: 2021-06-23 07:51:47

Tags: r

I have two different data frames, as follows:

df1 <- data.frame(state=letters[1:3],district=letters[4:6])

 state district
1     a        d
2     b        e
3     c        f

and df2:

df2 <- data.frame(state=letters[1:3], district= c("e","d","f"))

  state district
1     a        e
2     b        d
3     c        f

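I want to check whether the districts in df2 exist in df1; if not, select the state and district. If a district from df1 exists in df2, does it belong to exactly the same state as in df1? For example, district "d" belongs to state "a" in df1 but to state "b" in df2, which is a mismatch. What I am trying is: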

library(dplyr)

'%noin%' <- Negate('%in%')

#creating unique id for df1
df1$uuid <- tolower(paste0(df1$state,"_",df1$district)) 

#creating unique id for df2
df2$uuid <- tolower(paste0(df2$state,"_",df2$district)) 

# keep the rows of df1 whose state_district id does not occur in df2
df_result <- df1 %>% filter(uuid %noin% df2$uuid) %>% 
               select(state,district)

   state district
1     a        d
2     b        e

How can I also get the state that each of these districts belongs to in df2? My expected output is:


expected_output <- data.frame(state=c("a","b"), district=c("d","e"),state_in_df_2=c("b","a"))

 state   district   state_in_df_2
1     a        d             b
2     b        e             a

Thanks in advance.

2 answers:

Answer 0 (score: 1)

Using anti_join and left_join, you can do:

library(dplyr)

df1 <- data.frame(state=letters[1:3],district=letters[4:6])
df2 <- data.frame(state=letters[1:3], district= c("e","d","f"))

df1 %>% 
  anti_join(df2, by = c("state", "district")) %>% 
  left_join(df2, by = c("district"), suffix = c("", "_in_df2"))
#>   state district state_in_df2
#> 1     a        d            b
#> 2     b        e            a
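For readers without dplyr, the same two steps (drop rows whose exact state/district pair appears in df2, then look up each remaining district's state in df2) can be sketched in base R with `%in%` and `match()`. This is a hypothetical variant, not part of the original answer; it assumes each district appears at most once in df2 (`match()` returns only the first hit) and that state names contain no `_`, since `_` is used as the key separator.

```r
# Hypothetical base-R variant of the anti_join + left_join answer.
df1 <- data.frame(state = letters[1:3], district = letters[4:6])
df2 <- data.frame(state = letters[1:3], district = c("e", "d", "f"))

# Step 1 (like anti_join): keep df1 rows whose state_district key is absent from df2
key1 <- paste(df1$state, df1$district, sep = "_")
key2 <- paste(df2$state, df2$district, sep = "_")
mismatch <- df1[!(key1 %in% key2), ]

# Step 2 (like left_join on district): look up that district's state in df2;
# match() gives the first matching row, or NA if the district is not in df2
mismatch$state_in_df2 <- df2$state[match(mismatch$district, df2$district)]
rownames(mismatch) <- NULL
mismatch
#>   state district state_in_df2
#> 1     a        d            b
#> 2     b        e            a
```

As with the left_join version, a df1 district that is missing from df2 entirely would be kept with `state_in_df2` set to NA.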

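Answer 1 (score: 1)

Not sure whether this works in your case, but you could try:

library(dplyr)  # provides filter(); base R's stats::filter() is an unrelated time-series function

filter(merge(df1, df2, by = 'district'), state.x != state.y)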

#  district state.x state.y
#1        d       a       b
#2        e       b       a
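Note that `merge()` performs an inner join by default, so a df1 district that does not occur in df2 at all silently disappears from this result. A possible extension, sketched below under the assumption that such districts should also be reported, uses `all.x = TRUE` and tests for NA explicitly, because `state.x != state.y` evaluates to NA for those rows. The modified df2 (without district "f") is hypothetical, chosen only to trigger the unmatched case, and base-R row subsetting is used in place of `dplyr::filter()`.

```r
# Hypothetical df2 that lacks district "f", to show the unmatched case
df1 <- data.frame(state = letters[1:3], district = letters[4:6])
df2 <- data.frame(state = c("a", "b"), district = c("e", "d"))

# all.x = TRUE keeps every df1 district; state.y is NA where df2 has no match
merged <- merge(df1, df2, by = "district", all.x = TRUE)

# Keep rows whose district is missing from df2 or whose states disagree;
# without the is.na() test, NA != "c" would be NA and the row would be lost
result <- merged[is.na(merged$state.y) | merged$state.x != merged$state.y, ]
result
```

Here districts "d" and "e" are reported with the conflicting states, and "f" is reported with `state.y` set to NA.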