I have large rasters (> 20 GB) and want to convert each raster to a CSV file in a special format, with the following columns:
unique_key_column
x_coordinate
y_coordinate
layer1_values
layer2_values
etc.
library(raster)
r <- raster(nrows=10,ncols=10)
r[] <- rnorm(ncell(r))  # one random value per cell
stack <- stack(r,r,r,r,r)
#create function to convert coordinate to special format
# -34.9 will be 1034900000
# sxxxdddddd, where s= sign (-=1, +=2), x=degrees (34=034),
# and d = decimal (.9=900000)
formatCoordinate <- function(x){
  first_part <- ifelse(x < 0 , "1","2")
  # take the degrees from the same rounded value as the decimals,
  # so that e.g. 34.999 gives 2035000000 rather than 2034000000
  second_part <- abs(as.integer(round(x, 2)))
  #make sure the third part has 6 decimal places, then convert it to string
  third_part <- substr(gsub(".+\\.","",as.character(format(round(x, 2),
                       nsmall = 6))),1,6)
  result <- sprintf("%s%03d%s",first_part,second_part,third_part)
  result
}
#the actual processing
stack <- readAll(stack)  # loads every cell value into RAM
names(stack) <-c("l1", "l2", "l3", "l4", "l5")
#convert rasterStack to dataframe
stackPoints <- as.data.frame(rasterToPoints(stack))
#format x and y coordinates
colX <- formatCoordinate(stackPoints$x)
colY <- formatCoordinate(stackPoints$y)
#combine formatted x and y coordinates to compose a unique key
pK <- paste0(colX, colY )
stackPoints["key"] <- pK
col_idx <- grep("key", names(stackPoints))
stackPoints <- stackPoints[, c(col_idx, (1:ncol(stackPoints))[-col_idx])]
#write results to a csv file
write.table(stackPoints, "r.csv", row.names = FALSE, sep = ";", dec = ",", append = FALSE)
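The code above works for small rasters, but for large rasters I cannot load the stack into RAM. Is there a way to convert my code to use parallel processing, i.e. read the raster and write the CSV using multiple cores without loading the raster into RAM (Mac OS X 10.11 and Ubuntu 14.04, 8 cores each)? Best,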
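The sxxxdddddd encoding described in the comments of formatCoordinate can also be computed with integer arithmetic instead of formatting and re-parsing strings. The sketch below is only an illustration: encode_coordinate is a hypothetical name, and unlike the question's version it keeps six real decimals rather than rounding to two. It derives the degrees and the decimal digits from a single rounded value, so the two parts cannot disagree near a whole degree.

```r
# Hypothetical helper (not from the question): encode decimal degrees as
# "sxxxdddddd", where s = 1 for negative and 2 otherwise, xxx = whole degrees,
# dddddd = millionths of a degree.
encode_coordinate <- function(x) {
  sign_code <- ifelse(x < 0, "1", "2")
  # Round once, to whole millionths, and derive both parts from that value
  micro <- round(abs(x) * 1e6)
  whole <- micro %/% 1e6   # whole degrees
  frac  <- micro %% 1e6    # remaining millionths
  sprintf("%s%03d%06d", sign_code, as.integer(whole), as.integer(frac))
}

encode_coordinate(c(-34.9, 120.5))
```

If the two-decimal rounding of the original is what you want, round first, e.g. encode_coordinate(round(x, 2)). The %03d field assumes |x| < 1000, which holds for longitude and latitude.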
Answer 0 (score: 1)
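First, you want to work out how to write the loop on a single thread, because moving from for() to foreach() will then be very simple. I'm not familiar with RasterStack objects, but it looks like the number of layers can be counted with nlayers(x), and each layer can be extracted with x[[i]].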
So first, I would write and debug something like this:
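for(i in 1:nlayers(stack)){
  #convert layer of rasterStack to dataframe
  layer_pts <- as.data.frame(rasterToPoints(stack[[i]]))
  #write layer_pts to a csv file (the file name here is only illustrative)
  write.csv(layer_pts, paste0("layer_", i, ".csv"), row.names = FALSE)
}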
After that, foreach() is easy. Remember that each worker needs the raster package loaded. To speed up the merge, I recommend data.table.
library(foreach)
library(parallel)    # for detectCores()
library(doMC)
library(data.table)
registerDoMC(detectCores() - 2) # for me this is 40 - 2 = 38
layer_list <-
  foreach(i = 1:nlayers(stack), .packages = c('raster', 'data.table')) %dopar% {
    #convert layer of rasterStack to data.table
    layer_pts <- as.data.table(rasterToPoints(stack[[i]]))
    setkey(layer_pts, x, y) # data.table can key on x and y, no synthetic key needed
    layer_pts
  }
tbl_out <- Reduce(merge, layer_list) # uses keys from setkey
# if you wanted the "key" column (but not essential); formatCoordinate() is from the question
tbl_out[, key := paste0(formatCoordinate(x), formatCoordinate(y))]
setcolorder(tbl_out, "key")  # move key to the first column
fwrite(tbl_out, "r.csv", sep = ";", dec = ",")  # same separators as the question, no row names
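Reduce(merge, layer_list) joins the per-layer tables two at a time, i.e. merge(merge(t1, t2), t3) and so on, on their shared coordinate columns. The sketch below shows the shape of that result without needing raster or data.table: the tables are small and made up (shaped like single-layer rasterToPoints() output), and base R merge() with an explicit by stands in for data.table's keyed merge, which performs the same join faster.

```r
# Three made-up per-layer tables: columns x, y and one value column per layer
coords <- expand.grid(x = c(0.5, 1.5), y = c(10.5, 11.5))
layer_list <- lapply(1:3, function(i) {
  df <- coords
  df[[paste0("l", i)]] <- i * seq_len(nrow(coords))
  df
})

# Reduce() applies merge() pairwise: merge(merge(l1, l2), l3)
merged <- Reduce(function(a, b) merge(a, b, by = c("x", "y")), layer_list)
print(merged)  # one row per coordinate pair, columns x, y, l1, l2, l3
```

By default merge() is an inner join, so a coordinate missing from any one table is dropped from the result. As far as I know rasterToPoints() leaves out NA cells, so a cell that is NA in only some layers would disappear; merge(..., all = TRUE) keeps such cells with NA values.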
Note that if you run out of memory, you may need to reduce the number of cores used, e.g. registerDoMC(4), chosen by trial and error.
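Both versions above still turn a full layer into an in-memory table. One way to keep memory bounded is to process the raster a block of rows at a time and append each block to the CSV. The raster package provides blockSize() to suggest row chunks and getValues(x, row, nrows) to read only those rows. The sketch below keeps the reading step abstract as a hypothetical read_block(row, nrows) function, and runs it on a small matrix standing in for a raster, so it works without raster installed; write_in_chunks is also a hypothetical name.

```r
# Hypothetical chunked writer: handles `nrows_total` rows in blocks of
# `chunk_rows`, so only one block of values is in memory at a time.
# `read_block(row, n)` must return a data.frame for rows row..row+n-1.
write_in_chunks <- function(read_block, nrows_total, chunk_rows, path) {
  starts <- seq(1, nrows_total, by = chunk_rows)
  for (k in seq_along(starts)) {
    row <- starts[k]
    n <- min(chunk_rows, nrows_total - row + 1)
    block <- read_block(row, n)
    # Write the header with the first block only; append later blocks
    write.table(block, path, sep = ";", dec = ",", row.names = FALSE,
                col.names = (k == 1), append = (k > 1))
  }
  invisible(path)
}

# Stand-in for a 5-row, 4-column raster layer
m <- matrix(1:20, nrow = 5)
read_block <- function(row, n) {
  rows <- row:(row + n - 1)
  data.frame(row = rep(rows, each = ncol(m)),
             col = rep(seq_len(ncol(m)), n),
             value = as.vector(t(m[rows, , drop = FALSE])))  # row-major order
}

path <- tempfile(fileext = ".csv")
write_in_chunks(read_block, nrow(m), chunk_rows = 2, path = path)
out <- read.table(path, sep = ";", dec = ",", header = TRUE)
```

Because each block is independent, the blocks could also be handed to foreach() workers that each write their own file, at the cost of concatenating the files afterwards.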