Question

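I have 500+ points in a SpatialPointsDataFrame object and a 1.7 GB raster object (200,000 rows x 200,000 cols). For each of the 500+ points, I want a list of the raster cell values inside a buffer around that point.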

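I have managed to do this with the code below (I drew a lot of inspiration from here). However, it runs slowly and I would like it to be faster. For "small" buffer widths, say 5 km or even 15 km (about 1 million cells), it actually works fine, but when the buffer grows to 100 km (about 42 million cells) it becomes very slow.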

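I could easily improve the loop below with something from the apply family and/or a parallel loop. But I suspect it is slow because the raster package writes a 400 MB+ temporary file on every iteration of the loop.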

# packages
library(rgeos)
library(raster)
library(rgdal)

myPoints = readOGR(points_path, 'myLayer')
myRaster = raster(raster_path)

myFunction = function(polygon_obj, raster_obj) {
    # this function returns a tabulation of the values of the raster cells
    # inside a polygon (buffer)

    # crop to extent of polygon
    clip1 = crop(raster_obj, extent(polygon_obj))

    # crops to polygon edge & converts to raster
    clip2 = rasterize(polygon_obj, clip1, mask = TRUE)

    # much faster than extract
    ext = getValues(clip2) 

    # tabulates the values of the raster in the polygon
    tab = table(ext)

    return(tab)
}

# loop over the points
ids = unique(myPoints$ID)
for (id in ids) {

    # select point
    myPoint = myPoints[myPoints$ID == id, ]

    # create buffer
    myPolygon = gBuffer(spgeom = myPoint, byid = FALSE, width = myWidth)

    # extract the data I want (projections, etc are fine)
    tab = myFunction(myPolygon, myRaster)

    # do stuff with tab ...
}
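The three steps inside myFunction (crop to the buffer's bounding box, mask to the buffer itself, tabulate) can be illustrated in base R on a toy in-memory matrix. This is a minimal sketch only, not the poster's code: tabulate_buffer is a hypothetical helper, the radius is measured in cells rather than map units, cells are assumed to be square, and nothing here uses the raster package or temporary files.

```r
# Hypothetical helper: tabulate the values of an in-memory matrix `mat`
# inside a circular buffer of `radius` cells around cell (row, col).
tabulate_buffer <- function(mat, row, col, radius) {
  # "crop": keep only the circle's bounding box, clamped to the grid edges
  rows <- max(1, row - radius):min(nrow(mat), row + radius)
  cols <- max(1, col - radius):min(ncol(mat), col + radius)
  block <- mat[rows, cols, drop = FALSE]

  # "mask": squared distance (in cells) of every block cell from the point;
  # keep the cells whose centre lies within the radius
  d2 <- outer((rows - row)^2, (cols - col)^2, "+")
  inside <- block[d2 <= radius^2]

  # "tabulate": table() drops NA values by default
  table(inside)
}

# Toy 10 x 10 grid with integer values 1..4
set.seed(1)
toy <- matrix(sample(1:4, 100, replace = TRUE), nrow = 10)
tab <- tabulate_buffer(toy, row = 5, col = 5, radius = 2)
print(tab)
```

As with table(ext) in myFunction, cells that are NA (masked out or missing in the data) simply do not appear in the counts.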

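My questions:

1. Am I right to partly blame the write operations? Would this code run faster if I managed to avoid all of them? I have access to a machine with 32 GB of RAM, so I suppose it is safe to assume I could load the raster into memory without writing temporary files?
2. What else can I do to make this code more efficient?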
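The figures given in the question allow a rough check of question 1. The sketch below is back-of-envelope arithmetic under a stated assumption: each cell is held in RAM as an 8-byte R double. The 1.7 GB file on disk is evidently compressed (40 billion cells would need 40 GB even at one byte per cell), so the in-memory size is what matters here.

```r
# Back-of-envelope memory estimates, ASSUMING each cell is held in RAM as an
# 8-byte R double; the dimensions and cell counts come from the question.
bytes_per_cell <- 8

# Whole raster: 200,000 rows x 200,000 columns
full_cells <- 200000 * 200000
full_gb <- full_cells * bytes_per_cell / 1e9

# One 100 km buffer: about 42 million cells inside the circle. crop() to the
# buffer's extent keeps the square bounding box, whose area is 4/pi times
# that of the circle (4 r^2 versus pi r^2).
circle_cells <- 42e6
bbox_cells <- circle_cells * 4 / pi
bbox_mb <- bbox_cells * bytes_per_cell / 1e6

cat(sprintf("full raster: %.0f GB\n", full_gb))
cat(sprintf("one 100 km crop: %.0f MB\n", bbox_mb))
```

If these assumptions hold, one 100 km crop comes to about 428 MB, in line with the 400 MB+ temporary files observed, while the full raster would need roughly 320 GB, well beyond 32 GB of RAM.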

Answer 1

I think you should approach it like this:

library(raster)
library(rgdal)
myPoints <- readOGR(points_path, 'myLayer')
myRaster <- raster(raster_path)
e <- extract(myRaster, myPoints, buffer=myWidth)

and then something like

etab <- sapply(e, table)
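One caveat worth checking with this step (an addition, not part of the original answer): sapply() simplifies its result when every per-point table has the same length, and it then takes the row names from the first table only, so counts can end up under the wrong value labels. lapply() always returns a list with one table per point. A minimal base-R illustration, where the toy list e stands in for the list of per-point cell values that extract(..., buffer = ...) returns (the values are made up):

```r
# Toy stand-in for the list returned by extract(myRaster, myPoints, buffer = ...):
# one vector of cell values per point (made-up values).
e <- list(c(1, 2, 3, 3), c(1, 2, 4, 4))

# sapply() sees two tables of length 3 and binds them into a 3 x 2 matrix,
# taking the row names ("1", "2", "3") from the first table only
s <- sapply(e, table)
print(s)  # the count for value 4 at point 2 is shown under "3"

# lapply() keeps one correctly labelled table per point
etab <- lapply(e, table)
print(etab[[2]])
```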

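Your question #1 is hard to answer because we know so little about your data (we don't know how many cells a "100 km" buffer covers). But you can use the rasterOptions function to set options that control when raster writes to file. Based on the post you linked, you note that getValues is faster than extract, but I think that is wrong, or at least not very important. The combination of crop, rasterize and getValues should perform similarly to extract (which does almost exactly the same thing under the hood). In any case, if you go this route, you should pass an empty RasterLayer, created with raster(myRaster), to speed up the cropping.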
