Find the largest point in each polygon for a set of polygons in R

Date: 2014-11-22 19:32:49

Tags: r gis spatial maptools sp

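I'm sure this has been answered elsewhere, but I haven't been able to find it by searching.

I have a set of points representing the cities within a country, along with the population of each city. I also have a polygon file of counties. I want to find the location of the largest city within each county.

How can this be done?

Here is some data: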

structure(list(
  Country = c("us", "us", "us", "us", "us", "us", "us", "us", "us", "us",
    "us", "us", "us", "us", "us", "us", "us", "us", "us", "us",
    "us", "us", "us", "us", "us"),
  City = c("cabarrus", "cox store", "cal-vel", "briarwood townhouses",
    "barker heights", "davie crossroads", "crab point village", "azalea",
    "chesterfield", "charlesmont", "connor", "clover garden",
    "corriher heights", "callisons", "crestview acres", "clegg",
    "canaan park", "chantilly", "belgrade", "brices crossroads", "bluff",
    "butner", "bottom", "bandy", "bostian heights"),
  AccentCity = c("Cabarrus", "Cox Store", "Cal-Vel", "Briarwood Townhouses",
    "Barker Heights", "Davie Crossroads", "Crab Point Village", "Azalea",
    "Chesterfield", "Charlesmont", "Connor", "Clover Garden",
    "Corriher Heights", "Callisons", "Crestview Acres", "Clegg",
    "Canaan Park", "Chantilly", "Belgrade", "Brices Crossroads", "Bluff",
    "Butner", "Bottom", "Bandy", "Bostian Heights"),
  Region = c("NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC",
    "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC",
    "NC", "NC", "NC", "NC", "NC"),
  Population = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_),
  Latitude = c(35.2369444, 35.275, 36.4291667, 35.295, 35.3111111,
    35.8319444, 34.7602778, 35.58, 35.81, 35.9341667,
    35.7419444, 36.1883333, 35.5605556, 35.0841667, 35.0213889,
    35.8655556, 36.2761111, 36.3016667, 34.88, 34.8186111,
    35.8377778, 36.1319444, 36.4747222, 35.6419444, 35.7544444),
  Longitude = c(-80.5419444, -82.0352778, -78.9694444, -81.5238889, -82.4441667,
    -80.535, -76.7305556, -82.4713889, -81.6611111, -81.5127778,
    -78.1486111, -79.4630556, -80.635, -76.7255556, -80.5427778,
    -78.8497222, -79.7852778, -76.1711111, -77.2352778, -78.1016667,
    -82.8580556, -78.7569444, -80.7741667, -81.09, -80.9294444)),
  .Names = c("Country", "City", "AccentCity", "Region", "Population",
    "Latitude", "Longitude"),
  row.names = c(544L, 889L, 551L, 434L, 190L, 975L, 894L, 147L, 717L, 700L,
    831L, 773L, 862L, 559L, 915L, 753L, 584L, 695L, 262L, 437L,
    372L, 537L, 406L, 178L, 02L),
  class = "data.frame")

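And some code for the North Carolina polygons:

library(maptools)  # for readShapePoly(...); loads package sp as well
xx <- readShapePoly(system.file("shapes/sids.shp", package="maptools")[1],
                IDvar="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))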

plot(xx)

I want to find the most populous city within each county. Sorry that I don't have a reproducible example. If I did, I would have the answer!

1 Answer:

Answer 0: (score: 0)

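The short answer is that you should use gContains(...) in the rgeos package.

Here is the long answer.

In the code below we grab a high-resolution shapefile of North Carolina counties from the GADM database, and a geocoded dataset of North Carolina cities from the US Geological Survey database. The latter already has county information, but we ignore it. Then we map cities to the appropriate counties using gContains(...), add that information to the cities data frame, and use the data.table package to identify the largest city in each county. Most of the work is done in the 4 lines of code near the end.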

library(raster)   # for getData(...);   you may not need this
library(foreign)  # for read.dbf(...);  you may not need this
library(rgeos)    # for gContains(...); loads package sp as well

setwd("< directory for downloaded data >")
# get North Carolina Counties shapefile from GADM database
USA         <- getData("GADM",country="USA",level=2)   # level 2 is counties
NC.counties <- USA[USA$NAME_1=="North Carolina",]      # North Carolina Counties
# get North Carolina Cities data from USGS database
url <- "http://dds.cr.usgs.gov/pub/data/nationalatlas/citiesx010g_shp_nt00962.tar.gz"
download.file(url,"cities.tar.gz")
untar("cities.tar.gz")
data      <- read.dbf("citiesx010g.dbf",as.is=TRUE)
NC.data   <- data[data$STATE=="NC",c("NAME","COUNTY","LATITUDE","LONGITUDE","POP_2010")]
## --- everything up to here is just to set up the example

# convert cities data.frame to SpatialPointsDataFrame
NC.cities <- SpatialPointsDataFrame(NC.data[,c("LONGITUDE","LATITUDE")],
                                    data=NC.data,
                                    proj4string=CRS(proj4string(NC.counties)))
# map cities to counties
city.cnty   <- gContains(NC.counties,NC.cities,byid=TRUE)
# add county information to cities data
NC.data$county <- apply(city.cnty,1,function(cnty)ifelse(any(cnty),NC.counties@data[cnty,]$NAME_2,NA))
# identify largest city in each county
library(data.table)
result <- setDT(NC.data)[,.SD[which.max(POP_2010)],by="county"]
head(result)
#      county             NAME   COUNTY LATITUDE LONGITUDE POP_2010
# 1:  Jackson        Cullowhee  Jackson 35.31371 -83.17653     6228
# 2:   Graham     Robbinsville   Graham 35.32287 -83.80740      620
# 3:   Wilkes North Wilkesboro   Wilkes 36.15847 -81.14758     4245
# 4:    Rowan        Salisbury    Rowan 35.67097 -80.47423    33662
# 5: Buncombe        Asheville Buncombe 35.60095 -82.55402    83393
# 6:    Wayne        Goldsboro    Wayne 35.38488 -77.99277    36437
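
The grouping step can be tried on its own without any spatial packages. The following is a minimal base R sketch on made-up data (the data frame `cities` and its values are hypothetical, not taken from the USGS file). It also shows a behaviour worth knowing for the sample data in the question, where every population is NA: `which.max()` returns `integer(0)` for an all-NA group, so that county simply drops out of the result.

```r
# Hypothetical toy data: county "Z" has no known population at all
cities <- data.frame(
  name   = c("a", "b", "c", "d", "e"),
  county = c("X", "X", "Y", "Y", "Z"),
  pop    = c(100L, 250L, NA, 40L, NA),
  stringsAsFactors = FALSE
)

# For each county keep the row with the largest population.
# which.max() skips NA values; for an all-NA group it returns integer(0),
# which selects zero rows, so that county is absent from the result.
largest <- do.call(rbind, lapply(split(cities, cities$county), function(g) {
  g[which.max(g$pop), ]
}))

print(largest)
```

If counties without any population data should still appear in the output, they would need to be handled separately, for example by merging the result back onto the full list of county names.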

The workhorse here is:

city.cnty   <- gContains(NC.counties,NC.cities,byid=TRUE)

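which compares every point in the SpatialPointsDataFrame NC.cities to every polygon in the SpatialPolygonsDataFrame NC.counties and returns a logical matrix in which the rows represent cities and the columns represent counties. The [i,j] element is TRUE if city i is in county j, and FALSE otherwise. We process the matrix row-wise in the next statement: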

NC.data$county <- apply(city.cnty,1,function(cnty)ifelse(any(cnty),NC.counties@data[cnty,]$NAME_2,NA))

which uses each row in turn to index the attribute table of NC.counties and extract the county name.
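
To see what the row-wise step does, here is a small sketch that replaces the gContains(...) result with a hand-built logical matrix. The matrix and the vector `county.names` are hypothetical stand-ins for the real result and for the NAME_2 column of the attribute table. It uses a plain `if`/`else` rather than `ifelse`, and takes the first match in case a point ever matches more than one polygon.

```r
# Toy stand-in for gContains(..., byid = TRUE): rows = cities, columns = counties
city.cnty <- matrix(
  c(TRUE,  FALSE,
    FALSE, TRUE,
    FALSE, FALSE),   # city3 falls inside no county
  nrow = 3, byrow = TRUE,
  dimnames = list(c("city1", "city2", "city3"), c("cntyA", "cntyB"))
)
county.names <- c("Alpha", "Beta")   # hypothetical attribute-table column

# Each row is a logical vector over counties; use it to pick the county name,
# or NA when the city is not contained in any polygon
county <- apply(city.cnty, 1, function(cnty) {
  if (any(cnty)) county.names[which(cnty)[1]] else NA_character_
})

print(county)
```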

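The data you provided in your question has some problems, but it is still instructive. First, the NC shapefile in the maptools package is relatively low resolution. In particular, this means that some of the coastal islands are missing completely, so any city on one of those islands will not map to a county. Your real data may have the same problem, so watch out for it.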

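Second, comparing the COUNTY column from the original USGS dataset with the county column we added, there are 3 rows (out of 865) where the county does not agree. It turns out that in these cases the USGS database is wrong (or out of date). You might run into the same problem, so watch out for that as well.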

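Third, another three cities did not map to any county. These are all coastal cities, and this probably reflects small inaccuracies in the North Carolina shapefile. You might have this problem as well.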
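
One possible way to deal with unmapped points is to assign them to the nearest polygon. This is not part of the answer above; it is a sketch under stated assumptions. It uses the sf package instead of rgeos and maptools (both of which have since been archived on CRAN), toy planar squares in place of real county polygons, and made-up city data. Whether "nearest county" is the right rule for your data is a judgment call.

```r
library(sf)

# Two adjacent unit squares as hypothetical "counties" (planar, no CRS)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
counties <- st_sf(county = c("A", "B"), geometry = st_sfc(sq(0), sq(1)))

# Hypothetical cities; "s" lies just outside county B, like an offshore town
cities <- st_as_sf(
  data.frame(name = c("p", "q", "r", "s"),
             pop  = c(10, 30, 20, 5),
             x    = c(0.2, 0.7, 1.5, 2.3),
             y    = c(0.5, 0.5, 0.5, 0.5)),
  coords = c("x", "y")
)

# Point-in-polygon join: county is NA for points inside no polygon
joined <- st_join(cities, counties, join = st_within)

# Assumption: fall back to the nearest county for unmapped points
unmapped <- is.na(joined$county)
joined$county[unmapped] <-
  counties$county[st_nearest_feature(joined[unmapped, ], counties)]

# Largest city per county, using plain data frames
df <- st_drop_geometry(joined)
largest <- do.call(rbind, lapply(split(df, df$county), function(g) {
  g[which.max(g$pop), ]
}))
print(largest)
```

With real longitude/latitude data both layers would need the same CRS, and it is worth checking how far each reassigned point actually is from its nearest polygon before trusting the result.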