Question

I have two-dimensional spatial data of the form (xBin, yBin, value), e.g.:

library(data.table)
DT = data.table(x=c(rep(1,3),rep(2,3),rep(3,3)),y=rep(c(1,2,3),3),value=100*c(1:9))

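For each bin, I want to compute the sum of the variable "value" over all neighboring bins. A bin is considered a neighbor if both its x and y indices are within one unit of the current bin.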

E.g. for x = 2, y = 2, I want to compute

valueNeighbors(x=2,y=2) = value(x=1,y=1)+value(1,2)+value(1,3)
+value(2,1)+value(2,3)
+value(3,1)+value(3,2)+value(3,3)
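
As a sanity check on this definition, the sum for a single bin can be computed by brute force. This is only a minimal sketch in base R on a plain data frame copy of the example data; the helper name neighbor_sum is hypothetical and it is far too slow for ~1000^2 bins.

```r
# Same example data as above, as a plain data frame
DT <- data.frame(x = rep(1:3, each = 3), y = rep(1:3, 3), value = 100 * (1:9))

# Hypothetical helper: sum "value" over bins within one unit in both x and y,
# leaving out the bin (x0, y0) itself
neighbor_sum <- function(d, x0, y0) {
  near <- abs(d$x - x0) <= 1 & abs(d$y - y0) <= 1
  self <- d$x == x0 & d$y == y0
  sum(d$value[near & !self])
}

neighbor_sum(DT, 2, 2)  # 4000: all nine values (4500) minus the center (500)
```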

My real data has ~1000^2 bins; how can I do this efficiently?

Answer 1

One option is to use the raster package:

library(raster)
X <- matrix(1:20, 4)
r <- raster(X)
r
# Sum over a 3 x 3 moving window, padding the edges with zeros
agg <- as.matrix(focal(r, matrix(1, 3, 3), sum, pad = TRUE, padValue = 0))
agg

     [,1] [,2] [,3] [,4] [,5]
[1,]   14   33   57   81   62
[2,]   24   54   90  126   96
[3,]   30   63   99  135  102
[4,]   22   45   69   93   70
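
Note that a 3 x 3 window of ones includes the bin itself in the sum, whereas the question asks for the neighbors only. Below is a minimal sketch of one way to leave the center out. It assumes that focal multiplies each cell by its weight before applying sum, so that a zero weight drops that cell. The names w and agg_excl are illustrative.

```r
library(raster)

X <- matrix(1:20, 4)
r <- raster(X)

w <- matrix(1, 3, 3)
w[2, 2] <- 0  # zero weight for the center cell, so the bin itself is not counted

agg_excl <- as.matrix(focal(r, w, sum, pad = TRUE, padValue = 0))
agg_excl
```

Subtracting X from the window sum that includes the center should give the same matrix.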

Which approach is faster for large data sets?

X <- matrix(1:1000000, 1000)
S <- matrix(NA, nrow(X), ncol(X))
r <- raster(X)

system.time(
  as.matrix(focal(r, matrix(1, 3, 3), sum, pad = TRUE, padValue = 0))
)
   user  system elapsed 
   0.39    0.08    0.47 

With a 1000x1000 matrix, I could not get a result in a reasonable time using the solution proposed by Winsemius (Win 7 x64, 8 GB RAM).

Answer 2

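Here is a possible solution using some of the spatial packages in R. Note that it is not very polished, but it does the job. I have not checked the results by hand, and I do not know how fast this approach is compared with the matrix solutions offered.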

DT<-data.frame(x=c(rep(1,3),rep(2,3),rep(3,3)),y=rep(c(1,2,3),3),value=100*c(1:9))
require(sp)
require(raster) # Provides raster(), extent(), rasterize() and rasterToPolygons()
coordinates(DT)<-~x+y # Create spatial object (points)
rast<-raster(extent(DT),ncol=3,nrow=3)
grid<-rasterize(DT,rast)
grid<-rasterToPolygons(grid) # Create polygons

require(spdep)
neigh<-poly2nb(grid) # Create neighbour list
weights<-nb2listw(neigh,style="B",zero.policy=TRUE) # Create weights (binary)
grid$spatial.lag<-lag.listw(weights,grid$value,zero.policy=TRUE) # Add to raster

You can convert the spatial object back to a data frame simply with

DT2<-data.frame(grid)

Note that the ID variable corresponds to the row number in the initial data.

Answer 3

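I don't think data.table is the right tool for this. The concept of row indices does not lend itself to this kind of operation (although I may be posting outdated information):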

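X <- matrix(1:20, 4)
S <- matrix(NA, nrow(X), ncol(X))
# Visit each cell once; looping over row(X)/col(X) would repeat every index many times
for (x in seq_len(nrow(X))) {
  for (y in seq_len(ncol(X))) {
    # Sum all cells whose row and column indices are within one of (x, y)
    S[x, y] <- sum(X[abs(row(X) - x) < 2 & abs(col(X) - y) < 2])
  }
}
S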
#---------
     [,1] [,2] [,3] [,4] [,5]
[1,]   14   33   57   81   62
[2,]   24   54   90  126   96
[3,]   30   63   99  135  102
[4,]   22   45   69   93   70

With efficiency in mind, this algorithm should be faster, but it is still much slower than raster::focal:

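rows <- nrow(X); cols <- ncol(X)
for (x in seq_len(rows)) {
  for (y in seq_len(cols)) {
    # Index only the (at most) 3 x 3 block around (x, y), clipped at the edges
    S[x, y] <- sum(X[max(1, x - 1):min(rows, x + 1), max(1, y - 1):min(cols, y + 1)])
  }
}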

Perhaps faster still:

# X is assumed to be a 1000 x 1000 matrix here (note the hard-coded -1000)
system.time(S2 <- X +
  rbind(cbind(X[-1, -1], 0), 0) +           # diagonal shifts of the matrix
  rbind(cbind(0, X[-1, -1000]), 0) +
  rbind(0, cbind(X[-1000, -1], 0)) +
  rbind(0, cbind(0, X[-1000, -1000])) +
  rbind(X[-1, ], 0) +                       # these create the sums on the same rows or columns
  rbind(0, X[-1000, ]) +
  cbind(X[, -1], 0) +
  cbind(0, X[, -1000]))
   user  system elapsed 
  0.563   0.065   0.630 
> identical(S, S2) # compare to the focal-method above
[1] TRUE
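
The shifted-matrix idea above can be written without hard-coded dimensions, and tied back to the (x, y, value) layout from the question. What follows is only a sketch in base R under a few assumptions: the bin indices are positive integers that can be used directly as matrix indices, missing bins count as zero, and the bin itself is left out of the sum, as the question asks. The names neighbor_sums, M and valueNeighbors are illustrative.

```r
# Sum over the 8 surrounding cells (cell itself excluded), zero-padded at the edges
neighbor_sums <- function(X) {
  n <- nrow(X); m <- ncol(X)
  P <- matrix(0, n + 2, m + 2)           # zero border, so edge cells need no special case
  P[2:(n + 1), 2:(m + 1)] <- X
  S <- matrix(0, n, m)
  for (dx in -1:1) for (dy in -1:1) {
    if (dx == 0 && dy == 0) next         # skip the cell itself
    S <- S + P[(2:(n + 1)) + dx, (2:(m + 1)) + dy]
  }
  S
}

# Long format -> matrix -> neighbor sums -> back to long format
DT <- data.frame(x = rep(1:3, each = 3), y = rep(1:3, 3), value = 100 * (1:9))
M <- matrix(0, max(DT$x), max(DT$y))
M[cbind(DT$x, DT$y)] <- DT$value         # M[x, y] holds the value of bin (x, y)
DT$valueNeighbors <- neighbor_sums(M)[cbind(DT$x, DT$y)]
DT
```

Only nine whole-matrix operations are needed regardless of the matrix size, which is why this style scales to 1000 x 1000 where the element-by-element loops do not.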

Spatial data / computing measures over neighbors in R

3 answers: