Question

我有两个 shp 文件。

我只想知道那些完全 (100%) 落入 shpfile_2 的 shpfile_1。

library(sf)
shp_1   <- st_read(system.file("shape/nc.shp", package="sf"))
shp_1   <- st_transform(shp_1,2154);shp_1
shp_1   <-  st_union(shp_1)

nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc,2154)
shp_2   <- nc[1,]
shp_2   <-st_make_grid(st_as_sfc(st_bbox(shp_2)),cellsize=5000)

within <- st_within(shp_2 , shp_1)

问题是 st_within 为 600 万个多边形花费了太多时间。

有什么办法可以加快这个过程吗？

Answer 1

我不确定它会有所帮助，但可能值得研究将多边形 (shp_1) 更改为线。如果您在非常大的多边形中有很多小多边形，那可能值得。

所以你的方法：

library(sf)
shp_1   <- st_read(system.file("shape/nc.shp", package="sf"))
shp_1   <- st_transform(shp_1,2154);shp_1
shp_1   <-  st_union(shp_1)

nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc,2154)
shp_2   <- nc[1,]
shp_2   <-st_make_grid(st_as_sfc(st_bbox(shp_2)),cellsize=1000)
within <- st_within(shp_2 , shp_1)
within_TF <- sapply(within, function(xx) length(xx)!=0)

这个想法是将 shp_a 多边形转换为线形，而不是使用 st_intersects 找到所有接触该线的较小的 shp_2 多边形并将它们从原始形状中删除。现在，如果您的 shp_2 大于 shp_1，您必须首先提取所有不相交的较小多边形。这是通过 st_intersects 完成的，实际上可能会违背目的，因为它可能很慢......无论如何，在这个例子中它会更好一些。

inter <- st_intersects(shp_2, shp_1)   ## find the intersects
inter_TF <- sapply(inter, function(xx) length(xx)!=0)  ## prepare the filter
shp_2_touches <- shp_2[inter_TF,]   ## apply the filter

shp_1_line <- st_cast(shp_1, "MULTILINESTRING")  ## convert to lines
inter_line <- st_intersects(shp_2_touches, shp_1_line)  ## find the polygons that touches the line
inter_line_TF <- sapply(inter_line, function(xx) length(xx)!=0)  ## prepare the filter

完成后，您可以看到两个提取是相同的：

identical(shp_2[within_TF,], shp_2_touches[!inter_line_TF,])
[1] TRUE

就速度而言，这不是一个超级增益，但在我糟糕的计算机上，它以某种方式更快......

microbenchmark::microbenchmark(within = st_within(shp_2 , shp_1),
                               multiline = {
                                 inter <- st_intersects(shp_2, shp_1)
                                 inter_TF <- sapply(inter, function(xx) length(xx)!=0)
                                 shp_2_touches <- shp_2[inter_TF,]
                                 
                                 shp_1_line <- st_cast(shp_1, "MULTILINESTRING")
                                 inter_line <- st_intersects(shp_2_touches, shp_1_line)
                               })
Unit: milliseconds
      expr      min       lq     mean   median       uq      max neval
    within 405.6318 466.0014 760.2502 617.4705 817.4921 2177.999   100
 multiline 197.9589 258.3050 523.7598 399.6880 659.9645 2239.103   100

st_intersects 没有太多办法。它通常表现得很好，但如果在你的情况下它很慢，那么解决它的方法并不多。在大多数情况下，您会希望通过执行多个较小的交集和数据缩减来减小问题的规模。

R：st_within 计算空间对象花费的时间太长

1 个答案: