Question

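I have two data frames. The first is a set of addresses, including city and state. The second comes from the zipcode data. I am trying to find all rows in the first data frame where the state and zip code combination is invalid.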

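I tried merging the two data frames. That worked, and I can tell which rows match, but what I really need is the opposite: the rows that do not match.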

Answer 1

Credit goes to @ericOss: anti_join is the simplest approach.

Sample data
Next time, please provide your data (or build a small example set, as I did):

library(dplyr)     # provides anti_join() and %>%
library(zipcode)
data(zipcode)

# Data
df1 <- head(zipcode)
df2  <- head(zipcode)

# Introduce some errors
df2[2, 1] <- 0000    # wrong zip (the numeric 0000 is stored as "0")
df2[4, 3] <- 'FOO'   # wrong state

df1

    zip       city state latitude longitude
1 00210 Portsmouth    NH  43.0059  -71.0132
2 00211 Portsmouth    NH  43.0059  -71.0132
3 00212 Portsmouth    NH  43.0059  -71.0132
4 00213 Portsmouth    NH  43.0059  -71.0132
5 00214 Portsmouth    NH  43.0059  -71.0132
6 00215 Portsmouth    NH  43.0059  -71.0132

df2

    zip       city state latitude longitude
1 00210 Portsmouth    NH  43.0059  -71.0132
2     0 Portsmouth    NH  43.0059  -71.0132
3 00212 Portsmouth    NH  43.0059  -71.0132
4 00213 Portsmouth   FOO  43.0059  -71.0132
5 00214 Portsmouth    NH  43.0059  -71.0132
6 00215 Portsmouth    NH  43.0059  -71.0132

Anti-join
You can then use print(df2 %>% anti_join(df1)), which gives you:

    zip       city state latitude longitude
1     0 Portsmouth    NH  43.0059  -71.0132
2 00213 Portsmouth   FOO  43.0059  -71.0132

anti_join() returns all rows from x that have no matching values in y, keeping only the columns from x.

(anti_join comes with dplyr; if you have not installed it yet, run install.packages("dplyr").)
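
For the original question, where only the zip and state columns matter, here is a minimal sketch that does not depend on the zipcode package. The table names addresses and zip_ref and their contents are made up for illustration. Passing by explicitly limits the comparison to those two columns, rather than joining on every shared column:

```r
library(dplyr)

# Hypothetical reference table of valid zip/state pairs
zip_ref <- data.frame(
  zip   = c("00210", "00211", "00212"),
  state = c("NH", "NH", "NH"),
  stringsAsFactors = FALSE
)

# Hypothetical address table to validate
addresses <- data.frame(
  city  = c("Portsmouth", "Portsmouth", "Portsmouth"),
  state = c("NH", "FOO", "NH"),
  zip   = c("00210", "00211", "99999"),
  stringsAsFactors = FALSE
)

# Keep the rows of addresses whose (zip, state) pair has no match in zip_ref
invalid <- anti_join(addresses, zip_ref, by = c("zip", "state"))
print(invalid)
```

With this toy data, the second address (wrong state) and the third (unknown zip) are the rows returned. Keeping zip as a character column in both tables avoids losing leading zeros and keeps the join key types consistent.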
