Using parRapply with an S4 object in R

Time: 2017-11-07 21:29:51

Tags: r parallel-processing r-raster doparallel

I am trying to optimize a function that I will run on several rasters with hundreds of cells each, so I would like to parallelize it.

Initial raster

This is the initial raster:

library(raster)
SPA <- raster(nrows=3, ncols=3, xmn = -10, xmx = -4, ymn = 4, ymx = 10)

values(SPA) <- c(0.1, 0.4, 0.6, 0, 0.2, 0.4, 0, 0.1, 0.2)

plot(SPA)

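The goal of the function is to get a data frame with the distances between all cells in the raster, with a "from" column, a "to" column and a "dist" column.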

Transition layer

To do that, I created a transition layer using the gdistance package:

library(gdistance)
h16  <- transition(SPA, transitionFunction=function(x){1},16,symm=FALSE) 
h16   <- geoCorrection(h16, scl=FALSE)

And the coordinates of the origin of every cell:

B <- xyFromCell(SPA, cell = 1:ncell(SPA))
head(B)

      x y
[1,] -9 9
[2,] -7 9
[3,] -5 9
[4,] -9 7
[5,] -7 7
[6,] -5 7

Distance function

With the help of some Stack Overflow answers, I wrote this function, which is faster than accCost in gdistance:

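accCost2 <- function(x, fromCoords) {

  # Cell numbers of the origin coordinates
  fromCells <- cellFromXY(x, fromCoords)
  # Sparse matrix of conductances between cells
  tr <- transitionMatrix(x)
  # Append one extra row and column: an artificial start node
  tr <- rBind(tr, rep(0, nrow(tr)))
  tr <- cBind(tr, rep(0, nrow(tr)))
  startNode <- nrow(tr)
  # Connect the start node to every origin cell with infinite conductance (zero cost)
  adjP <- cbind(rep(startNode, times = length(fromCells)), fromCells)
  tr[adjP] <- Inf
  adjacencyGraph <- graph.adjacency(tr, mode = "directed", weighted = TRUE)
  # Convert conductance to cost
  E(adjacencyGraph)$weight <- 1/E(adjacencyGraph)$weight
  # Least-cost distance from the start node to every cell, dropping the start node itself
  return(shortest.paths(adjacencyGraph, v = startNode, mode = "out")[-startNode])
}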

What I want to parallelize

Using apply, I get the data.frame I want:

connections <- data.frame(from = rep(1:nrow(B), each = nrow(B)),to = rep(1:nrow(B), nrow(B)), dist =as.vector(apply(B,1, accCost2, x = h16)))

head(connections)

  from to     dist
1    1  1      0.0
2    1  2 219915.7
3    1  3 439831.3
4    1  4 221191.8
5    1  5 312305.7
6    1  6 493316.1

This is my attempt at a parallel version, using parRapply:

library("parallel")
cl = makeCluster(3)
clusterExport(cl, c("B", "h16", "accCost2"))
clusterEvalQ(cl, library(gdistance), library(raster))

connections <- data.frame(from = rep(1:nrow(B), each = nrow(B)),to = rep(1:nrow(B), nrow(B)), dist =as.vector(parRapply(cl, B,1, accCost2, x = h16)))

stopCluster(cl)

But I get the following error:

Error in x[i, , drop = FALSE] : object of type 'S4' is not subsettable

I am fairly new to parallelization, and I am not sure what I am doing wrong.

1 Answer:

Answer 0 (score: 3)

There are several syntax problems in your code.
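One of them can be inspected without starting a cluster. The following is a minimal sketch, assuming only the base parallel package: it asks R how it would match the arguments of the failing call from the question, without evaluating any of them.

```r
library(parallel)

# The failing call from the question, captured unevaluated
failing_call <- quote(parRapply(cl, B, 1, accCost2, x = h16))

# Match the supplied arguments against the formals of parRapply
mc <- match.call(parRapply, failing_call)

print(mc[["x"]])    # h16: the named argument is taken as the matrix to split
print(mc[["FUN"]])  # B: the coordinate matrix ends up in the FUN slot
```

Because x = h16 is passed by name, parRapply tries to split the S4 transition layer by rows, which appears to be where the "not subsettable" error comes from. parRapply also has no margin argument, so the 1 carried over from apply is not needed.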

This code works for me.

library("parallel") 

accCost_wrap <- function(x){accCost2(h16,x)}
#Instead of including h16 in the parRapply function, 
#just get it in the node environment    

cl = makeCluster(3)  

clusterExport(cl, c("h16", "accCost2")) 
#B will be "sent" to the nodes through the parRapply function.

clusterEvalQ(cl, {library(gdistance)}) 
#raster is a dependency of gdistance, so no need to include raster here.    

pp <- parRapply(cl, x=B, FUN=accCost_wrap) 

stopCluster(cl)

connections <- data.frame(from = rep(1:nrow(B), each = nrow(B)),  
to = rep(1:nrow(B), nrow(B)),  
dist = as.vector(pp))
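The wrapper pattern above can be tried without gdistance. This is only a sketch with made-up stand-ins: toy_cost plays the role of accCost2 (fixed object first, one row second), fixed_obj plays the role of h16, and m plays the role of B. None of these names come from the original code.

```r
library(parallel)

# Hypothetical stand-in for accCost2: fixed object first, one matrix row second
toy_cost <- function(x, fromCoords) sum(fromCoords) * x

fixed_obj <- 2                                    # stand-in for h16
toy_wrap <- function(row) toy_cost(fixed_obj, row)

m <- matrix(1:6, nrow = 3)                        # rows: (1,4), (2,5), (3,6)

cl <- makeCluster(2)
# Export what the wrapper needs; envir = environment() also works when this
# code is run inside a function rather than at top level
clusterExport(cl, c("fixed_obj", "toy_cost"), envir = environment())
res <- parRapply(cl, x = m, FUN = toy_wrap)       # one call per row of m
stopCluster(cl)

res  # 10 14 18
```

The results come back concatenated in row order, which is why the from/to columns built with rep() line up with as.vector(pp).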

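Your version of accCost is indeed faster than the one in gdistance. However, your version omits the check that your points lie within the extent of the transition layer, so use it with caution.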
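If you want some of that safety back, one possible guard is sketched below. It assumes the raster package and relies on cellFromXY returning NA for points outside the extent. check_inside is a hypothetical helper, not part of gdistance, and the coordinates are made up.

```r
library(raster)

# Same toy raster as in the question
r <- raster(nrows = 3, ncols = 3, xmn = -10, xmx = -4, ymn = 4, ymx = 10)

# One point inside the extent and one far outside it (made-up coordinates)
pts <- rbind(c(-9, 9), c(100, 100))

# Hypothetical helper: stop early if any point falls outside the layer
check_inside <- function(x, coords) {
  cells <- cellFromXY(x, coords)
  if (anyNA(cells)) {
    stop("some points lie outside the extent of the layer")
  }
  cells
}

# The first point alone passes the check and maps to a cell number
inside_cells <- check_inside(r, pts[1, , drop = FALSE])

# Both points together fail the check; capture the error message
outside_msg <- tryCatch(check_inside(r, pts),
                        error = function(e) conditionMessage(e))
```

Calling something like this once on all of B before the parallel step would keep the per-row function as lean as it is now.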

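(You could make your function even faster by taking cell numbers as input. Also, sending that much data back from each node does not seem very efficient.)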
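As a rough illustration of the first point, and assuming only the raster package: the coordinates in B are built from cell numbers with xyFromCell, and accCost2 immediately converts them back with cellFromXY, so the round trip returns the cell numbers you started with.

```r
library(raster)

# Same toy raster as in the question
r <- raster(nrows = 3, ncols = 3, xmn = -10, xmx = -4, ymn = 4, ymx = 10)

cells <- seq_len(ncell(r))       # cell numbers 1..9
xy <- xyFromCell(r, cells)       # what the question stores in B
round_trip <- cellFromXY(r, xy)  # what accCost2 recomputes for every row

all(round_trip == cells)
```

A variant of accCost2 that accepted fromCells directly could skip this conversion and be applied over the cell numbers instead of the rows of B. That variant is not written out here.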