Question

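I work with spatial data in R and wonder whether I should rely on packages/functions built on the old Spatial format (package sp) or on the newer package sf. I ran the following test, based on code I found here.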

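The idea is to identify, for every single point in a spatial point dataset, all points that lie within a maximum distance of xx metres of it.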
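For reference, the same query can also be written without building buffers at all. This is only a minimal sketch and is not part of the timing comparison: it assumes `st_is_within_distance()` from sf, and the smaller toy data, the seed and the `*_demo` names are made up for illustration. Because it uses exact distances rather than polygon approximations of circles, its results may differ slightly from the buffer approach for points near the boundary.

```r
library(sf)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
maxdist_demo <- 500
df_demo <- data.frame(x = runif(1000, 0, 100000),
                      y = runif(1000, 0, 100000),
                      id = 1:1000)

# build an sf point object (no CRS, so distances are in coordinate units)
pts_demo <- st_as_sf(df_demo, coords = c("x", "y"))

# for each point, the row indices of all points within maxdist_demo of it
# (each point is always within distance of itself)
nb <- st_is_within_distance(pts_demo, pts_demo, dist = maxdist_demo)

# number of neighbours found per point, including the point itself
head(lengths(nb))
```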

library(tictoc)

# define a buffer distance and a toy dataset
maxdist <- 500
df <- data.frame(x = runif(10000, 0, 100000), y = runif(10000, 0, 100000), id = 1:10000)

# doing the analysis using sf
library(sf)
tic("sf")
pts     <- st_as_sf(df, coords = c("x", "y"))  # no pipe, so no extra package is needed
pts_buf <- st_buffer(pts, maxdist,nQuadSegs = 5)
int     <- st_intersects(pts_buf, pts)
toc()

# doing the analysis using sp
library(sp)
library(rgeos)
tic("sp")
pts2      <- SpatialPointsDataFrame(df[,1:2],as.data.frame(df[,3]))
pts_buf2  <- gBuffer(pts2, byid=TRUE,width=maxdist)
int2      <- over(pts_buf2, pts2,returnList = TRUE)
toc()

# size of the objects
object.size(pts)<object.size(pts2)
object.size(pts_buf)<object.size(pts_buf2)

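Using sf seems much better (about 0.53 vs 2.1 seconds on my machine) and it needs less memory, with one exception: why is the object pts so much larger than pts2? Is sf less efficient at storing point vectors?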

Answer 1

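One reason I can think of:

The pts (sf) object keeps an attribute that specifies the geometry "type" of each row. This is because each row in an sf object can potentially hold a different kind of geometry.

pts2 (sp) is an object of class SpatialPointsDataFrame. It can only hold points, so it does not need to keep extra attributes for each "row".

Taking the first two rows of the sf object as an example, these are the attributes associated with those geometries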

lapply(1:2, function(x) attributes(st_geometry(pts[x, ])))
# [[1]]
# [[1]]$names
# [1] "1"
# 
# [[1]]$class
# [1] "sfc_POINT" "sfc"      
# 
# [[1]]$precision
# [1] 0
# 
# [[1]]$bbox
# xmin     ymin     xmax     ymax 
# 81647.16 72283.90 81647.16 72283.90 
# 
# [[1]]$crs
# Coordinate Reference System: NA
# 
# [[1]]$n_empty
# [1] 0
# 
# 
# [[2]]
# [[2]]$names
# [1] "2"
# 
# [[2]]$class
# [1] "sfc_POINT" "sfc"      
# 
# [[2]]$precision
# [1] 0
# 
# [[2]]$bbox
# xmin      ymin      xmax      ymax 
# 5591.116 38967.060  5591.116 38967.060 
# 
# [[2]]$crs
# Coordinate Reference System: NA
# 
# [[2]]$n_empty
# [1] 0

Every row of this sf data.frame has similar attributes.

If you strip the attributes from the geometry (leaving just a data.frame)

pts_coords <- st_coordinates(pts)
pts_striped <- pts
st_geometry(pts_striped) <- NULL
pts_striped <- cbind(pts_striped, pts_coords)

and then compare the object sizes

object.size(pts_striped)
# 200896
object.size(pts2)
# 203624

the sizes of the two objects are much closer.
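The per-geometry overhead can also be looked at directly by comparing the geometry column with a plain coordinate matrix. This is a minimal sketch on a smaller toy dataset. The seed and the `*_chk` names are made up for illustration, and the exact byte counts will vary with the R and sf versions, so none are shown.

```r
library(sf)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
n <- 1000
df_chk  <- data.frame(x = runif(n, 0, 100000), y = runif(n, 0, 100000), id = 1:n)
pts_chk <- st_as_sf(df_chk, coords = c("x", "y"))

geom_chk   <- st_geometry(pts_chk)     # list-column (sfc): one sfg object per row
coords_chk <- st_coordinates(pts_chk)  # plain n x 2 numeric matrix

# every element of the list is its own R object with its own class attribute
class(geom_chk[[1]])

# memory used by the two representations of the same coordinates
size_geom   <- object.size(geom_chk)
size_coords <- object.size(coords_chk)
print(size_geom)
print(size_coords)
```

The list of small classed objects is what costs memory here; the coordinates themselves take the same space in both forms.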

Comparing simple features {sf} and Spatial objects {sp}: speed and memory
