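Long-time reader, first-time poster.
I'm trying to run gIntersection() on two very large SpatialPolygonsDataFrame objects. The first is all US counties; the second is a 240-row x 279-column grid, stored as a series of 66,960 polygons.
I ran this successfully using Pennsylvania and the grid cells that overlap PA: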
gIntersection(PA, grid, byid=TRUE)
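I tried running it for the whole US overnight, and this morning it was still going, with a 10 GB (!) swap file on my hard drive and no sign of progress. Am I doing something wrong, or is this normal behavior and I should loop state by state instead?
Thanks!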
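Answer 0 (score: 4)
A bit later than I'd hoped, but here is the function I ended up using for a related task. It may be useful for other applications too.
@mdsumner was right: discarding the non-intersecting areas in a high-level step first speeds this up enormously. Hope this is useful!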
library("sp")
library("rgeos")
library("plyr")

# I originally wrote this function to total the population that lies within each polygon of a SpatialPolygons object.
# AdminBounds: SpatialPolygons for whatever administrative areas you're working with. Its feature IDs must be the
#   administrative ID codes (I set those up with spChFID()).
# poly: the SpatialPolygons you want to total population (or any other variable) across.
# Admindf: data frame with one row per feature of AdminBounds, with columns adminID, pop and admin_area.
# Both layers should use the same projected (ideally equal-area) CRS, otherwise the areas below are not meaningful.
ApportionPopulation <- function(AdminBounds, poly, Admindf) {
  # start by trimming out areas that don't intersect
  AdminBounds.sub <- gIntersects(AdminBounds, poly, byid=TRUE) # logical matrix; one column per AdminBounds feature
  AdminBounds.sub2 <- apply(AdminBounds.sub, 2, sum) # number of polygons in poly that each administrative area intersects
  AdminBounds.sub3 <- AdminBounds[AdminBounds.sub2 > 0] # keep only the ones that actually intersect
  # perform the intersection; this is the slow step, which is why we trimmed out irrelevant areas first
  int <- gIntersection(AdminBounds.sub3, poly, byid=TRUE)
  # with byid=TRUE, each output feature ID has the form "<adminID> <polyID>"
  intdf <- data.frame(intname=as.character(names(int)), stringsAsFactors=FALSE)
  splitid <- do.call("rbind", strsplit(intdf$intname, " ", fixed=TRUE)) # two-column character matrix
  colnames(splitid) <- c("adminID", "donutshpid") # administrative area ID and polygon ID (in my case, half-circles)
  intdf <- data.frame(intdf, splitid, stringsAsFactors=FALSE) # one row per piece of the int SpatialPolygons
  # area of each intersected piece; gArea() subtracts holes, which the @area slot of a Polygons object does not
  intdf$polyarea <- gArea(int, byid=TRUE)
  intdf2 <- join(intdf, Admindf, by="adminID") # attach each piece's administrative data
  # share of the administrative area's population that falls in this piece
  # (assumes population is evenly distributed within the administrative area)
  intdf2$popinpoly <- intdf2$pop * (intdf2$polyarea / intdf2$admin_area)
  intpop <- ddply(intdf2, .(donutshpid), summarize, popinpoly=sum(popinpoly)) # total population within each polygon
  # maybe do other final processing to get the output in the form you want
  return(intpop) # done!
}
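Everything after the gIntersection() call is plain data-frame bookkeeping: split each "<adminID> <polyID>" name, attach the administrative data, and weight the population by area share. The sketch below runs that part on hand-made numbers so it needs no spatial packages; the IDs, areas and populations are hypothetical, the piece areas stand in for what gArea(int, byid = TRUE) would return, and base R's merge()/aggregate() stand in for plyr's join()/ddply().

```r
# Hypothetical output of gIntersection(..., byid = TRUE): one "<adminID> <polyID>" ID
# per intersected piece, plus that piece's area.
piece_ids  <- c("42001 1", "42001 2", "42003 2")
piece_area <- c(30, 20, 25)

# Hypothetical administrative data: total population and total area per county.
Admindf <- data.frame(adminID = c("42001", "42003"),
                      pop = c(1000, 400),
                      admin_area = c(50, 100),
                      stringsAsFactors = FALSE)

# Split each ID into its administrative part and its polygon part.
parts <- do.call(rbind, strsplit(piece_ids, " ", fixed = TRUE))
intdf <- data.frame(adminID = parts[, 1], donutshpid = parts[, 2],
                    polyarea = piece_area, stringsAsFactors = FALSE)

# Attach county data and apportion population by area share
# (assumes population is spread evenly within each county).
intdf2 <- merge(intdf, Admindf, by = "adminID")
intdf2$popinpoly <- intdf2$pop * intdf2$polyarea / intdf2$admin_area

# Total apportioned population per polygon (base-R stand-in for ddply()).
intpop <- aggregate(popinpoly ~ donutshpid, data = intdf2, FUN = sum)
print(intpop)  # donutshpid 1 -> 600, donutshpid 2 -> 500
```

Polygon 2 gets 400 people from county 42001 (20 of its 50 area units) plus 100 from county 42003 (25 of 100), so a polygon straddling a county boundary simply sums its pieces.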
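Answer 1 (score: 0)
You can get an answer faster by using rasterize from the raster package, with the grid set up as a raster. It has an argument that returns how much of each cell is covered by the polygons.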
?rasterize
getCover: logical. If ‘TRUE’, the fraction of each grid cell that is
covered by the polygons is returned (and the values of
‘field, fun, mask’, and ‘update’ are ignored. The fraction
covered is estimated by dividing each cell into 100 subcells
and determining presence/absence of the polygon in the center
of each subcell
It doesn't look like you can control the number of subcells, though exposing that as an option probably wouldn't be hard.
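The subcell estimate described in that help text is easy to reproduce for a single cell, and doing it by hand also lets you pick the number of subcells. This is a base-R sketch of the idea, not the raster package's actual implementation: point_in_polygon() and cell_cover() are hypothetical helpers, the polygon is assumed to be a single ring without holes, and n = 10 gives the 100 subcells mentioned above.

```r
# Ray-casting point-in-polygon test for a simple polygon with vertex vectors px, py.
point_in_polygon <- function(x, y, px, py) {
  inside <- FALSE
  n <- length(px)
  j <- n
  for (i in seq_len(n)) {
    # Toggle when edge (j -> i) straddles the horizontal line through y and
    # crosses it to the right of x; horizontal edges fail the first test,
    # so the division below never sees a zero denominator.
    if (((py[i] > y) != (py[j] > y)) &&
        (x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i])) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Estimate the fraction of the cell [xmin, xmax] x [ymin, ymax] covered by the polygon
# by testing the centres of an n x n set of subcells.
cell_cover <- function(xmin, xmax, ymin, ymax, px, py, n = 10) {
  dx <- (xmax - xmin) / n
  dy <- (ymax - ymin) / n
  cx <- xmin + dx * (seq_len(n) - 0.5)  # subcell centre x coordinates
  cy <- ymin + dy * (seq_len(n) - 0.5)  # subcell centre y coordinates
  centres <- expand.grid(x = cx, y = cy)
  hits <- mapply(point_in_polygon, centres$x, centres$y,
                 MoreArgs = list(px = px, py = py))
  mean(hits)  # share of subcell centres that fall inside the polygon
}

# Usage: a hypothetical polygon covering the left 30% of the unit cell [0, 1] x [0, 1].
px <- c(-1, 0.3, 0.3, -1)
py <- c(-1, -1, 2, 2)
cell_cover(0, 1, 0, 1, px, py)  # 0.3 with the default 10 x 10 subcells
```

Because only subcell centres are tested, the estimate comes in steps of 1/n^2; a larger n gives a finer estimate at the cost of n^2 point tests per cell.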
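Answer 2 (score: 0)
I found the sf package to be far better for this:
out <- st_intersection(grid, polygons)
gIntersection locked up my computer for as long as I let it run, so it needed trimming or a loop over the individual polygons, whereas st_intersection from the sf package ran on my data in seconds. sf also automatically merges the attribute data of both inputs.
Thanks to Grant Williamson of the University of Tasmania for the walkthrough: https://atriplex.info/blog/index.php/2017/05/24/polygon-intersection-and-summary-with-sf/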
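A minimal sf sketch of the same area-weighted apportioning on toy data, to show how the merged attributes make the weighting a simple column calculation. It is an illustration, not the code from the linked post: the two side-by-side "counties", their populations and the 1 x 1 grid are hypothetical, and the coordinates are planar units with no CRS set (with real data, project to an equal-area CRS first). Cells that only touch a county along an edge may come back as zero-area line pieces, which add nothing to the totals, and sf may warn that attribute variables are assumed to be spatially constant; that is harmless here because the population is rescaled by area afterwards.

```r
library(sf)

# Build an axis-aligned rectangle as an sf POLYGON (first vertex repeated to close the ring).
rect <- function(x0, y0, x1, y1) {
  st_polygon(list(matrix(c(x0, y0, x1, y0, x1, y1, x0, y1, x0, y0),
                         ncol = 2, byrow = TRUE)))
}

# Two hypothetical side-by-side "counties" with total population and area.
county_geom <- st_sfc(rect(0, 0, 2, 2), rect(2, 0, 4, 2))
counties <- st_sf(county_id = c("A", "B"),
                  pop = c(100, 300),
                  county_area = as.numeric(st_area(county_geom)),
                  geometry = county_geom)

# A 4 x 2 grid of 1 x 1 cells over the counties' bounding box.
cells <- st_make_grid(counties, cellsize = 1)
grid <- st_sf(cell_id = seq_along(cells), geometry = cells)

# One row per (cell, county) piece, carrying the attributes of both inputs.
pieces <- st_intersection(grid, counties)

# Apportion each county's population by the area share of each piece
# (assumes population is spread evenly within a county).
pieces$popinpoly <- pieces$pop * as.numeric(st_area(pieces)) / pieces$county_area

# Total population per grid cell.
cell_pop <- aggregate(popinpoly ~ cell_id, data = st_drop_geometry(pieces), FUN = sum)
print(cell_pop)
```

Each cell over county A should get 25 people and each cell over county B 75, so the grid total matches the 400 people in the two counties.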