Why am I getting NA values when joining a non-spatial object to geometry data/polygons?

Date: 2015-05-01 17:26:46

Tags: r join geometry na shapefile

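I'm trying to join a non-spatial object (Merged_Census2011) to a shapefile of polygons (LDN_wards) by "ward.name". It seemed to work fine until I looked at the newly created object and saw that all the data had turned into NA. Here is what I did.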

#Join Merged_Census2011 data to LDN_wards shapefile
library(rgdal)  # readOGR()
library(dplyr)  # left_join()
LDN_wards <- readOGR(dsn = "data", layer = "LDN_wards")
head(LDN_wards@data)
#Explore the object
plot(LDN_wards)
summary(LDN_wards)
names(Merged_Census2011)
names(LDN_wards)
names(LDN_wards) <- c("Code", "ward.name") #rename the LDN_wards name column to ward.name so it can be matched later

#Join datasets
LDN_wards@data <- left_join(LDN_wards@data, Merged_Census2011)
head(LDN_wards@data)

I got:

> LDN_wards@data <- left_join(LDN_wards@data, Merged_Census2011)
Joining by: "ward.name"
Warning message:
In left_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector
> head(LDN_wards@data)
   Code    ward.name ward.code.x electorate votescast ward.code.y per.owner per.white per.noquals per.degree per.couple
1 E05000001   Aldersgate        <NA>         NA        NA        <NA>        NA        NA          NA         NA         NA
2 E05000002      Aldgate        <NA>         NA        NA        <NA>        NA        NA          NA         NA       
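A side note on reading this output: head() prints only the first six rows, so a screen of NA values does not by itself show that every row failed to match. Below is a minimal sketch using small made-up stand-ins for LDN_wards@data and Merged_Census2011 (the data frames `wards` and `census` and all their values are hypothetical). It counts the matched and unmatched rows after a left join:

```r
# Hypothetical stand-ins for LDN_wards@data and Merged_Census2011
wards <- data.frame(
  Code      = c("E05000001", "E05000002", "E05000026", "E05000027"),
  ward.name = c("Aldersgate", "Aldgate", "Abbey", "Alibon"),
  stringsAsFactors = FALSE
)
census <- data.frame(
  ward.name = c("Abbey", "Alibon"),
  per.owner = c(32.7, 45.1),
  stringsAsFactors = FALSE
)

# Left join: every ward is kept; wards with no census row get NA
joined <- merge(wards, census, by = "ward.name", all.x = TRUE)

# Count the NA rows instead of judging from the first few printed rows
n_unmatched <- sum(is.na(joined$per.owner))
n_matched   <- sum(!is.na(joined$per.owner))
print(c(matched = n_matched, unmatched = n_unmatched))
```

With the real objects, a comparable check after the join might be `sum(is.na(LDN_wards@data$per.owner))`.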

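My hunch is that this happens because the two datasets have different numbers of rows. Could that be the problem? Is it impossible to join datasets with different numbers of rows, where the rows in one that have no counterpart in the other simply stay unmatched? I compared the two datasets as follows: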

#Compare the two datasets
nrow(LDN_wards)
nrow(Merged_Census2011)
LDN_wards$ward.name %in% Merged_Census2011$ward.name
summary(LDN_wards$ward.name %in% Merged_Census2011$ward.name)
> nrow(LDN_wards)
[1] 787
> nrow(Merged_Census2011)
[1] 668
> LDN_wards$ward.name %in% Merged_Census2011$ward.name
  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [21] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
  [41]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE ETC...
> summary(LDN_wards$ward.name %in% Merged_Census2011$ward.name) 
   Mode   FALSE    TRUE    NA's 
logical      24     763       0 
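Different row counts alone do not prevent a join. A left join keeps every row of the left table, and rows without a partner just get NA in the new columns. The useful question is *which* names fail the `%in%` test, because those often reveal spelling or formatting differences between the two sources. Here is a sketch on hypothetical toy data (`left` and `right` are made up) that lists the unmatched names and confirms that the left join still returns one row per left row:

```r
# Hypothetical toy data: the left table has more rows than the right one
left  <- data.frame(ward.name = c("Aldersgate", "Aldgate", "Abbey", "Alibon", "Becontree"),
                    stringsAsFactors = FALSE)
right <- data.frame(ward.name = c("Abbey", "Alibon", "Becontree"),
                    per.owner = c(32.7, 45.1, 46.7),
                    stringsAsFactors = FALSE)

# Which left-hand names have a partner on the right?
matched <- left$ward.name %in% right$ward.name
print(table(matched))

# The names behind the FALSE values
unmatched_names <- left$ward.name[!matched]
print(unmatched_names)

# The left join keeps all rows of `left`; unmatched ones get NA in per.owner
joined <- merge(left, right, by = "ward.name", all.x = TRUE)
print(nrow(joined))
```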

Could it be because of the 24 FALSE values? If so, how can I remove those rows?
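One way to drop the unmatched rows is to subset with the logical vector itself. For a SpatialPolygonsDataFrame, row indexing such as `LDN_wards[keep, ]` drops the polygons together with their attribute rows, so geometry and data stay aligned. Below is a minimal base-R sketch on a plain data frame. The data frame `wards` and the vector `census_names` are hypothetical stand-ins:

```r
# Hypothetical stand-ins for LDN_wards@data and Merged_Census2011$ward.name
wards <- data.frame(
  Code      = c("E05000001", "E05000002", "E05000026"),
  ward.name = c("Aldersgate", "Aldgate", "Abbey"),
  stringsAsFactors = FALSE
)
census_names <- c("Abbey", "Alibon")

# TRUE where the ward has a census match; FALSE marks the rows to drop
keep <- wards$ward.name %in% census_names
wards_matched <- wards[keep, , drop = FALSE]
print(wards_matched)

# With the real spatial object (not run here), the same idea would be:
# LDN_wards <- LDN_wards[LDN_wards$ward.name %in% Merged_Census2011$ward.name, ]
```

Dropping rows only hides the mismatch, though. Check the unmatched names first, in case they are spelling differences that should be fixed instead.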

Apologies if this sounds obvious, I'm fairly new :)

Thanks for your help!

1 answer:

Answer 0 (score: 0)

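I just tried the (newly discovered) inner_join function, and it seems to work. If I understand correctly, inner_join only merges the rows that match, so I thought it would work better. Indeed, I no longer get NA values. Strangely, though, I now get duplicated observations, so if anyone has a better suggestion, please share it. See the procedure below.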

#Join datasets
LDN_wards@data <- inner_join(LDN_wards@data, Merged_Census2011)
head(LDN_wards@data, n=10)

> #Join datasets
> LDN_wards@data <- inner_join(LDN_wards@data, Merged_Census2011)
Joining by: c("ward.name", "ward.code.x", "electorate", "votescast", "ward.code.y", "per.owner", "per.white", "per.noquals", "per.degree", "per.couple", "per.higher.managerial", "per.christian", "per.no.car", "per.limill", "per.goodhealth", "per.males", "per.aged60plus")
Warning message:
In inner_join_impl(x, y, by$x, by$y) :
joining character vector and factor, coercing into character vector
> head(LDN_wards@data, n=10)
    Code      ward.name ward.code.x electorate votescast ward.code.y per.owner per.white per.noquals per.degree per.couple
1  E05000007         Bridge   E05000497       8677      5654   E05000497      69.8      71.9        19.9       29.9       55.3
2  E05000026          Abbey   E05000026       8110      4712   E05000026      32.7      28.1        16.4       34.5       47.2
3  E05000026          Abbey   E05000026       8110      4712   E05000455      48.5      73.4        10.1       55.4       52.4
4  E05000026          Abbey   E05000455       7250      4808   E05000026      32.7      28.1        16.4       34.5       47.2
5  E05000026          Abbey   E05000455       7250      4808   E05000455      48.5      73.4        10.1       55.4       52.4
6  E05000027         Alibon   E05000027       6971      4127   E05000027      45.1      70.1        31.2       16.7       49.2
7  E05000028      Becontree   E05000028       7535      4538   E05000028      46.7      58.8        28.0       20.6
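The duplicated rows are consistent with ward names that are not unique. "Abbey" appears with two different codes (E05000026 and E05000455) in the output above, and the "Bridge" row pairs Code E05000007 with ward.code.x E05000497. When a key occurs k times on one side and m times on the other, a join returns k × m rows for that key. The presence of ward.code.x and ward.code.y in Merged_Census2011 also suggests that the table was itself built by a name-based merge. The sketch below uses hypothetical toy data (`wards`, `census`, and a `ward.code` column are assumptions modelled on the output above). It checks keys for duplicates and then joins on the ward code instead of the name, assuming the code is unique per ward:

```r
# Hypothetical toy data: two different wards share the name "Abbey"
wards <- data.frame(
  Code      = c("E05000026", "E05000455", "E05000027"),
  ward.name = c("Abbey", "Abbey", "Alibon"),
  stringsAsFactors = FALSE
)
census <- data.frame(
  ward.code = c("E05000026", "E05000455", "E05000027"),
  ward.name = c("Abbey", "Abbey", "Alibon"),
  per.owner = c(32.7, 48.5, 45.1),
  stringsAsFactors = FALSE
)

# Joining on a non-unique name multiplies rows: 2 x 2 "Abbey" rows + 1 "Alibon"
by_name <- merge(wards, census, by = "ward.name")
print(nrow(by_name))

# Check candidate keys for duplicates before trusting a join
print(any(duplicated(census$ward.name)))  # duplicated names
print(any(duplicated(census$ward.code)))  # no duplicated codes

# Join on the unique code instead: one output row per ward
by_code <- merge(wards, census[, c("ward.code", "per.owner")],
                 by.x = "Code", by.y = "ward.code")
print(nrow(by_code))
```

Base `merge()` sorts its result by the key by default. If the joined table is written back into `LDN_wards@data`, reorder it to the polygon order first (for example with `match()` on the code column) so that attribute rows stay attached to the right polygons.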