Converting a RasterStack to csv in R with parallel processing

Date: 2016-01-21 14:32:01

Tags: r csv parallel-processing raster r-raster

I have large rasters (> 20 GB) and would like to convert each of them to a csv file in a special format, as follows:

unique_key_column
x_coordinate
y_coordinate
layer1_values
layer2_values

library(raster)
r <- raster(nrows=10,ncols=10)
r[] <- rnorm(10)
stack <- stack(r,r,r,r,r)

# Function to convert a coordinate to a special format:
# -34.9 becomes 1034900000
# sxxxdddddd, where s = sign (- = 1, + = 2), x = degrees (34 = 034),
# and d = decimals (.9 = 900000)

formatCoordinate <- function(x) {
  # sign digit: 1 for negative, 2 for zero or positive
  first_part <- ifelse(x < 0, "1", "2")
  # round to 2 decimals as before, then print as zero-padded "ddd.dddddd";
  # taking degrees and decimals from the same rounded value keeps them
  # consistent when rounding carries (e.g. 34.999 -> 035.000000)
  fixed <- sprintf("%010.6f", round(abs(x), 2))
  # drop the decimal point to get the nine digits xxxdddddd
  paste0(first_part, sub(".", "", fixed, fixed = TRUE))
}

# the actual processing

stack <- readAll(stack)
names(stack) <- c("l1", "l2", "l3", "l4", "l5")
# convert the RasterStack to a data frame
stackPoints <- as.data.frame(rasterToPoints(stack))
# format the x and y coordinates
colX <- formatCoordinate(stackPoints$x)
colY <- formatCoordinate(stackPoints$y)
# combine the formatted x and y coordinates into a unique key
pK <- paste0(colX, colY)
stackPoints["key"] <- pK
# move the key column to the front
col_idx <- grep("key", names(stackPoints))
stackPoints <- stackPoints[, c(col_idx, (1:ncol(stackPoints))[-col_idx])]
# write the results to a csv file
write.table(stackPoints, "r.csv", row.names = FALSE, sep = ";", dec = ",", append = FALSE)

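The code above works for small rasters, but for large rasters I cannot load the stack into RAM. Is there a way to convert my code to use parallel processing, i.e. to read the raster and write the csv using multiple cores without loading the raster into RAM (Mac OS X 10.11 and Ubuntu 14.04, 8 cores each)?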

1 answer:

Answer 0 (score: 1)

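First, you want to work out how to write the loop on a single thread, because moving from for() to foreach() is then very simple. I am not familiar with RasterStack objects, but it looks like the layers can be counted with nlayers(x) and extracted with x[[i]].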

I would start by writing and debugging something like this:

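for (i in seq_len(nlayers(stack))) {
  # convert one layer of the RasterStack to a data frame
  layer_pts <- as.data.frame(rasterToPoints(stack[[i]]))

  # write layer_pts to a csv file (the file name is only a placeholder)
  write.csv(layer_pts, sprintf("layer_%d.csv", i), row.names = FALSE)
}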

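After that, foreach() is easy. Keep in mind that every worker needs the raster package loaded, which is what the .packages argument is for. To speed up the merge, I suggest using data.table.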

library(foreach)
library(doMC)
library(data.table)
registerDoMC(detectCores() - 2) # for me this is 40 - 2 = 38
layer_list <- 
  foreach(i = 1:nlayers(stack), .packages = c('raster', 'data.table') ) %dopar% {
    #convert layer of rasterStack to data.table
    layer_pts <- as.data.table(rasterToPoints(stack[[i]]))
    setkey(layer_pts, x, y) # data.table can key on x and y, no synthetic key needed
    layer_pts
  }

tbl_out <- Reduce(merge, layer_list) # uses keys from setkey

# if you wanted the "key" column (but not essential)
tbl_out[, key:= paste0( formatCoordinate(x), formatCoordinate(y) ) ]

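# same separator and decimal mark as in the question, without row names
write.table(tbl_out, "r.csv", row.names = FALSE, sep = ";", dec = ",")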

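Note that if you run out of memory, you may need to reduce the number of cores used, for example registerDoMC(4), with the right number found by trial and error.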
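If memory stays the limiting factor, another option is to avoid holding whole layers in RAM and stream the stack to the csv file one block of rows at a time. What follows is only a minimal sketch and not a tested solution for 20 GB files. It assumes a multi-layer Raster* object and uses blockSize() and getValues() from the raster package. The helper name raster_to_csv_blocks and the output file are made up for illustration. Cells where every layer is NA are dropped, which is an assumption about the desired output.

```r
library(raster)

# Hypothetical helper (not from the original post): write a multi-layer
# Raster* object to a semicolon-separated file, one block of rows at a time.
raster_to_csv_blocks <- function(x, out_file) {
  bs <- blockSize(x)  # suggested row chunks: bs$row, bs$nrows, bs$n
  for (i in seq_len(bs$n)) {
    # cells are numbered row-wise from the top-left corner, so the cells
    # of a block of complete rows form one contiguous range
    first <- (bs$row[i] - 1) * ncol(x) + 1
    cells <- first:(first + bs$nrows[i] * ncol(x) - 1)
    # matrix of values for this block: one row per cell, one column per layer
    vals <- getValues(x, row = bs$row[i], nrows = bs$nrows[i])
    chunk <- data.frame(xyFromCell(x, cells), vals)
    # assumption: cells that are NA in every layer are not wanted
    chunk <- chunk[rowSums(!is.na(vals)) > 0, , drop = FALSE]
    # the first block creates the file with a header, later blocks append
    write.table(chunk, out_file, sep = ";", dec = ",", row.names = FALSE,
                col.names = (i == 1), append = (i > 1))
  }
  invisible(out_file)
}

# usage on a small toy stack
r <- raster(nrows = 10, ncols = 10)
r <- setValues(r, rnorm(ncell(r)))
s <- stack(r, r, r)
out_file <- tempfile(fileext = ".csv")
raster_to_csv_blocks(s, out_file)
```

The key column could be added to chunk just before write.table() using formatCoordinate() from the question. Writing block by block is sequential here. If several cores are to be used, one possibility would be to let each foreach() worker handle its own set of blocks and write its own file, then concatenate the files afterwards.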