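I have several large rasters in GeoTIFF format.
Each has dimensions 34565, 116908, 4 040 925 020 (nrow, ncol, ncell), and they overlap completely. The values are of different types: integer, float, ...
How can I use R (or other software) to write these raster values to a table or database that I can later analyze with Spark, Python, or R?
I need to process several rasters, so ideally the output table would look like this: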
row column raster1.value raster2.value raster3.value
1 1 56 76 100
1 2 18 45 89
... ... ... ... ...
34565 116908 23 39 43
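I have access to a computing facility with 32 cores and 128 GB of RAM, so parallel computing is also an option.
I would greatly appreciate your help!
Answer 0 (score: 0)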
#Load libs
library(doParallel)
library(parallel)
library(foreach)
library(data.table)
Suppose you have n rasters located in myras/:
#List of paths to rasters (the dot is escaped so that only a literal .tif extension matches)
raspaths <- list.files('myras', pattern='\\.tif$', full.names=TRUE)
#Register cluster for parallel processing: Cores to use: all except 1
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)
#Raster to data.table in parallel: one raster per worker
dtlist <- foreach (ras_id=raspaths, .packages=c('raster', 'data.table'), .combine='c') %dopar% {
  #Read a single raster
  ras <- raster(ras_id)
  #Get row and column indices
  cells <- seq_len(ncell(ras))
  ridx <- rowFromCell(object=ras, cell=cells)
  cidx <- colFromCell(object=ras, cell=cells)
  #Convert to data.frame then to data.table (slowest part, perhaps someone here knows a better way?)
  dt <- data.table(as.data.frame(ras))
  #Add row and column info before any sorting, so the indices stay aligned with the values
  dt[, c('row', 'column'):=list(ridx, cidx)]
  #Set key on row and column (a bare setkey(dt) would sort by the value column and break the alignment)
  setkey(dt, row, column)
  #Some column ordering
  setcolorder(x=dt, c("row", "column", names(dt)[1]))
  list(dt)
}
stopCluster(cl)
#Join all per-raster data.tables on row and column into one wide table (one value column per raster)
big_dt <- Reduce(function(a, b) merge(a, b, by=c('row', 'column')), dtlist)
#Write to disk as comma separated text file which can then be read into any database, e.g. PostgreSQL
fwrite(x=big_dt, file='mybigtable.csv')
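One caveat: with roughly 4 billion cells per raster, a single in-memory table is unlikely to work, since as far as I know a data.table cannot hold more than 2^31-1 rows. A minimal sketch of a block-wise alternative follows. It assumes the rasters overlap completely, as stated in the question, and appends a few raster rows at a time to the CSV. The helper name rasters_to_csv, the rows_per_block argument, and naming the value columns after the file names are my own choices for illustration, not part of any package.

```r
library(raster)
library(data.table)

# Hypothetical helper: stream fully overlapping rasters to one CSV, block by block
rasters_to_csv <- function(paths, out_file, rows_per_block = 100) {
  ras_list <- lapply(paths, raster)
  # Assumption: each value column is named after its file (without extension)
  col_names <- tools::file_path_sans_ext(basename(paths))
  nr <- nrow(ras_list[[1]])
  nc <- ncol(ras_list[[1]])
  starts <- seq(1, nr, by = rows_per_block)
  for (i in seq_along(starts)) {
    first <- starts[i]
    n <- min(rows_per_block, nr - first + 1)
    # Cells are numbered row by row, so a block is a run of complete rows
    dt <- data.table(
      row    = rep(seq(first, length.out = n), each = nc),
      column = rep(seq_len(nc), times = n)
    )
    # Read the same block from every raster and add it as a new column
    for (k in seq_along(ras_list)) {
      set(dt, j = col_names[k],
          value = getValues(ras_list[[k]], row = first, nrows = n))
    }
    # The first block creates the file with a header, later blocks are appended
    fwrite(dt, out_file, append = i > 1)
  }
  invisible(out_file)
}
```

Usage would be something like rasters_to_csv(raspaths, 'mybigtable.csv'). The loop is sequential on purpose, because every block is appended to the same file; a parallel variant would have to write one file per block and concatenate them afterwards.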
Writing to CSV as the final step is not the only option. With RPostgreSQL you can also export "big_dt" directly into a table in a local Postgres instance.
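A rough sketch of that export is below. It assumes the DBI and RPostgreSQL packages are installed and that a Postgres server is reachable. The helper name export_to_postgres and all connection details are placeholders for illustration, so adjust them to your setup.

```r
# Hypothetical helper: write a data.table to a Postgres table.
# All connection arguments are placeholders.
export_to_postgres <- function(dt, table_name, dbname, host = "localhost",
                               port = 5432, user, password) {
  con <- DBI::dbConnect(RPostgreSQL::PostgreSQL(), dbname = dbname,
                        host = host, port = port,
                        user = user, password = password)
  # Close the connection even if the write fails
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  # Convert to a plain data.frame before handing it to the driver
  DBI::dbWriteTable(con, table_name, as.data.frame(dt), row.names = FALSE)
}

# Example call (not run; needs a live database):
# export_to_postgres(big_dt, "raster_values", dbname = "mydb",
#                    user = "me", password = "secret")
```

For a table of this size, loading the CSV with the COPY command in Postgres is probably faster than inserting from R, but I have not benchmarked it.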