How do I aggregate geocoded data by shapefile polygons for visualisation in R?

Time: 2016-03-01 01:34:12

Tags: r ggplot2 gis sp

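I have a geocoded dataset that I am trying to aggregate into polygons so that I can plot the results as a series of choropleth maps at different levels (e.g. suburb, local government area and so on).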

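To do this, I am following an approach (shown here) that uses the over function from the sp package to join the data to the spatial object and find which polygons my coordinates (which come from a separate file) fall into. I then fortify the spatial object with ggplot2.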
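For reference, here is a minimal sketch of that point-in-polygon step. It assumes two made-up unit-square "suburbs" in place of the ABS shapefile, and the helper `sq`, the suburb codes and the sighting counts are all hypothetical. It only illustrates what over returns: one row of polygon attributes per point, with NA where no polygon contains the point.

```r
library(sp)

# Hypothetical helper: a unit square with its lower-left corner at (x0, 0).
# The ring is listed clockwise so sp treats it as an outer ring, not a hole.
sq <- function(x0, id) {
  ring <- cbind(c(x0, x0, x0 + 1, x0 + 1, x0),
                c(0, 1, 1, 0, 0))
  Polygons(list(Polygon(ring)), ID = id)
}

# Two adjacent made-up "suburbs" with an attribute table keyed by polygon ID
polys <- SpatialPolygons(list(sq(0, "a"), sq(1, "b")))
suburbs <- SpatialPolygonsDataFrame(
  polys,
  data.frame(SSC_CODE = c(10001, 10002), row.names = c("a", "b"))
)

# Made-up geocoded points; the last one lies outside both squares
pts <- data.frame(Longitude = c(0.5, 1.5, 1.2, 5),
                  Latitude = c(0.5, 0.5, 0.8, 5),
                  DropBearSightings = c(2, 1, 4, 7))
coordinates(pts) <- c("Longitude", "Latitude")

# One row per point, holding the attributes of the polygon it falls in
hits <- over(pts, suburbs)

# Attach the suburb code to each point, then total the sightings per suburb;
# the formula interface of aggregate() drops the unmatched (NA) point
joined <- cbind(pts@data, SSC_CODE = hits$SSC_CODE)
totals <- aggregate(DropBearSightings ~ SSC_CODE, data = joined, FUN = sum)
totals
```

Because over returns its rows in the same order as the input points, the result can be bound to the point data column-wise without an intermediate join on row names.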

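Overall I seem to have most of the process working but, as the resulting plot shows, I am clearly doing something wrong when I match the coordinates to the polygons. The polygons (which represent suburbs) should be whole shapes, and I can't work out which part of my workflow is causing the mess.

[Image: Dodgy polygons]

Can anyone tell me where I might be going wrong? Is there a better way than over to solve the point-in-polygon problem?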

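The shapefile can be downloaded from the Australian Bureau of Statistics website here (file: "State Suburbs ASGS Non ABS Structures Ed 2011 Digital Boundaries in ESRI Shapefile Format"). I have saved some sample geocoded data in a Google Sheet, which can be accessed by running the code below.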

My initial attempt is in the code below:

## LOAD REQUIRED PACKAGES

library(googlesheets)
library(dplyr)
library(ggplot2)
library(sp)
library(rgdal)
library(maptools)

## READ DATA FROM GOOGLE SHEETS FILE

googleDocKey <- "1IyXSC0dtOCh1xGFiBG38nKzK2nO8wKUECRCEhvtZVS0"
geoCodedData <- googleDocKey %>% gs_key()
geoData <- geoCodedData %>% gs_read(ws = "geoData", range = cell_limits()) 
suburbList <- geoCodedData %>% gs_read(ws = "suburbList", range = cell_limits())

## SET COORDINATES FROM GEOCODED DATA

geoData <- as.data.frame(geoData)
coordinates(geoData) <- c("Longitude","Latitude")

## LOAD AUSTRALIA SHAPEFILE AND SUBSET FOR NSW 
## YOU WILL NEED TO DOWNLOAD THIS FILE FROM THE ABS MANUALLY (LINK ABOVE)

ausSuburbs <- readOGR(dsn ="02 - Shapefiles", layer="SSC_2011_AUST")
suburbList$SSC_CODE_2011 <- as.numeric(suburbList$SSC_CODE_2011)
nswSuburbList <- suburbList %>%
        filter(SSC_CODE_2011 < 20000) %>%
        filter(SSC_CODE_2011 > 9999) %>%
        select(SSC_CODE_2011)
nswSuburbs <- ausSuburbs[ausSuburbs$SSC_CODE %in% nswSuburbList$SSC_CODE_2011, ]   
nswSuburbs <- nswSuburbs[!nswSuburbs$SSC_CODE %in% 11408,] # exclude Lord Howe Island

## TELL R THAT THE COORDINATES IN THE SHAPEFILE MATCH THOSE IN THE SPATIAL POINTS DATA FRAME

proj4string(geoData) <- proj4string(nswSuburbs)

## ASSIGN UNIQUE IDENTIFIER TO EACH SPATIAL OBJECT

nswSuburbs@data$id <- rownames(nswSuburbs@data)

nswSuburbs@data <- mutate(nswSuburbs@data, id_poly = as.numeric(rownames(nswSuburbs@data)))

geoData@data <- mutate(geoData@data, id_shape = as.numeric(rownames(geoData@data)))

## GET THE SUBURB THAT THE POINT IS LOCATED IN

gpsSuburb <- over(geoData, nswSuburbs)

## ADD 'id_shape' TO THE DATA FRAME

gpsSuburbID <- mutate(gpsSuburb, id_shape = as.numeric(rownames(gpsSuburb)))

## AGGREGATE DROP BEAR DATA BY SUBURB

gpsSuburbJoin <- left_join(geoData@data, gpsSuburbID, by = c("id_shape" = "id_shape"))
gpsSuburbData <- gpsSuburbJoin %>%
        group_by(SSC_CODE) %>%
        summarise(DropBearSightings = sum(DropBearSightings))
gpsSuburbData <- as.data.frame(gpsSuburbData)

## CONVERT SHAPEFILE TO DATA FRAME TO ALLOW DATA TO BE JOINED TO IT

nswPoints <- fortify(nswSuburbs, region="id")
nswData <- merge(nswPoints, nswSuburbs, by="id", stringsAsFactors=FALSE)
nswData$id <- as.numeric(nswData$id)

nswSuburbMapData <- merge(nswData, gpsSuburbData, by="SSC_CODE", stringsAsFactors=FALSE)
nswSuburbMapData <- nswSuburbMapData[order(nswSuburbMapData$id,     nswSuburbMapData$id),]

## SET THEME FOR GGPLOT

theme_clean <- function(base_size = 12) {
        require(grid)
        theme_grey(base_size) %+replace%
                    theme(
                                axis.title = element_blank(),
                                axis.text = element_blank(),
                                panel.background = element_blank(),
                                panel.grid = element_blank(),
                                axis.ticks.length = unit(0,"cm"), 
                                axis.ticks.margin = unit(0,"cm"),
                                panel.margin = unit(0,"lines"),
                                plot.margin = unit(c(0, 0, 0, 0), "lines"),
                                complete = TRUE
                    )}

## PLOT TEST MAP USING GGPLOT

dropBearMap <- ggplot(nswSuburbMapData) +
        aes(long, lat, group=group, fill=DropBearSightings) +
        geom_polygon() +
        coord_map(projection = "mercator", xlim = c(140.0, 154.0), ylim = c(-38.0, -27.0)) +
        theme_clean()
dropBearMap
#ggsave("dropBearMap.png", type = "cairo-png")

I would really appreciate any suggestions on how to fix this. Cheers!

1 Answer:

Answer 0: (score: 1)

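Well, my first answer was simplistic... I don't have much experience with dplyr and was overly nervous about the edits to the data slot. The problem is much simpler. The merge function scrambles the row order of the fortified shapefile, and that order needs to be restored before plotting, so this: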

nswSuburbMapData <- nswSuburbMapData[order(nswSuburbMapData$id, nswSuburbMapData$id),]

needs to become this:

nswSuburbMapData <- nswSuburbMapData[order(nswSuburbMapData$order),]

which produces this when plotted:

[Image: the resulting map]

You may want to make some other changes to the map to make it more useful, but this should at least be the data represented correctly.
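To see why the row order matters, here is a toy sketch in base R. It assumes a hand-made data frame that mimics fortify output (two polygons, vertices numbered by an `order` column); the ids, coordinates and suburb codes are made up. geom_polygon connects vertices in row order, so once merge has re-sorted the rows by its key, each outline is drawn in the wrong sequence until the rows are sorted by `order` again.

```r
# Hypothetical stand-in for fortify() output: each row is one vertex, and
# `order` records the sequence in which the vertices must be drawn
fortified <- data.frame(
  id    = rep(c("2", "10"), each = 4),
  order = 1:8,
  long  = c(0, 0, 1, 1, 1, 1, 2, 2),
  lat   = c(0, 1, 1, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical attribute table to join on, like the suburb data
attrs <- data.frame(id = c("2", "10"),
                    SSC_CODE = c(10002, 10010),
                    stringsAsFactors = FALSE)

# merge() sorts the result by the key column by default, so the vertex
# sequence of the fortified data is not preserved
merged <- merge(fortified, attrs, by = "id")
merged$order

# Sorting by `order` restores the original vertex sequence
restored <- merged[order(merged$order), ]
restored$order
```

Sorting by `id` alone, as in the original line, groups the vertices by polygon but says nothing about their sequence within each polygon, which is why the outlines still came out tangled.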