How can I write large rasters to a table or database using R?

Asked: 2016-02-24 16:26:37

Tags: r maps raster geotiff bigdata

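I have several large rasters in GeoTIFF format.

Each has dimensions 34565, 116908, 4 040 925 020 (nrow, ncol, ncell), and the rasters overlap exactly. The values are of different types: integer, float, ...

How can I use R (or other software) to write these raster values to a table or database that I can later analyse with Spark, Python, or R?

I need to process several rasters, so ideally the output table would look like this: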

row     column     raster1.value  raster2.value  raster3.value
1       1          56             76             100
1       2          18             45             89
...     ...        ...            ...            ...
34565   116908     23             39             43

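I have access to a computing facility with 32 cores and 128 GB of RAM, so parallel computing is also possible.

I would greatly appreciate your help!

1 answer:

Answer 0: (score: 0)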

#Load libs
library(doParallel)
library(parallel)
library(foreach)
library(data.table)

Suppose you have n rasters located in myras/

#List of paths to rasters
raspaths <- list.files('myras', pattern='\\.tif$', full.names=TRUE)

#Register cluster for parallel processing: Cores to use: all except 1
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)

#Raster to data.table in parallel: one raster per worker
dtlist <- foreach (ras_id=raspaths, .packages=c('raster', 'data.table'), .combine='c') %dopar% {

          #Read a single raster layer from its file path
          ras <- raster(ras_id)

          #Get row and column indices for every cell
          cells <- seq_len(ncell(ras))
          ridx <- rowFromCell(object=ras, cell=cells)
          cidx <- colFromCell(object=ras, cell=cells)

          #Convert to data.frame then to data.table (slowest part, perhaps someone here knows a better way?)
          dt <- data.table(as.data.frame(ras))

          #Add row and column info while the rows are still in cell order
          dt[, c('row', 'column'):=list(ridx, cidx)]

          #Set key on row and column (done after adding them, since keying sorts the table)
          setkey(dt, row, column)

          #Put row and column first, followed by the value column
          setcolorder(x=dt, c("row", "column", names(dt)[1]))
          list(dt)
}
stopCluster(cl)
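
At 4 040 925 020 cells, a single numeric column takes roughly 32 GB (8 bytes per value), so converting a whole raster with as.data.frame may not fit in memory once the row and column indices are added. Below is a minimal sketch of a block-wise alternative, assuming the raster package's blockSize() and getValues() helpers. The function name raster_to_csv_blocks, its arguments, and the output file are hypothetical, and the code has not been tuned for rasters of this size.

```r
library(raster)
library(data.table)

# Hypothetical helper: stream one raster to a CSV file a block of rows at a time,
# so only one block of values is held in memory at once.
raster_to_csv_blocks <- function(ras, outfile, value_name = "value") {
  bs <- blockSize(ras)  # suggested row blocks: bs$row (start rows), bs$nrows, bs$n
  nc <- ncol(ras)
  for (i in seq_len(bs$n)) {
    # Values for rows bs$row[i] .. bs$row[i] + bs$nrows[i] - 1, in row-major cell order
    vals <- getValues(ras, row = bs$row[i], nrows = bs$nrows[i])
    rows <- seq(bs$row[i], length.out = bs$nrows[i])
    dt <- data.table(
      row    = rep(rows, each = nc),
      column = rep(seq_len(nc), times = bs$nrows[i])
    )
    dt[, (value_name) := vals]
    # Write the header with the first block only, then append
    write.table(dt, outfile, sep = ",", row.names = FALSE,
                col.names = (i == 1), append = (i > 1))
  }
  invisible(outfile)
}
```

Each raster could still be handled by its own worker inside the foreach loop, with one output file per raster.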

#Bind all per-raster datatables into one big table
big_dt <- rbindlist(dtlist, fill=TRUE, use.names=TRUE)
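
Note that rbindlist stacks the per-raster tables on top of each other, so each value column is NA for the rows that came from the other rasters. If the wide layout from the question is wanted (one row per cell, one value column per raster), the tables can be joined on row and column instead. A minimal sketch, assuming every table in the list holds row, column, and one uniquely named value column; to_wide is a hypothetical helper name and the two small tables are toy data:

```r
library(data.table)

# Hypothetical helper: join per-raster tables on (row, column) into one wide table
to_wide <- function(dtlist) {
  Reduce(function(a, b) merge(a, b, by = c("row", "column")), dtlist)
}

# Toy stand-ins for two per-raster tables (2 x 2 cells each)
a <- data.table(row = c(1, 1, 2, 2), column = c(1, 2, 1, 2), raster1 = c(56, 18, 7, 23))
b <- data.table(row = c(1, 1, 2, 2), column = c(1, 2, 1, 2), raster2 = c(76, 45, 3, 39))

wide <- to_wide(list(a, b))
print(wide)
```

Because the rasters overlap exactly, binding the value columns side by side should give the same result more cheaply, provided every table is in the same cell order.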

#Write to disk as comma separated text file which can then be read into any Database e.g. Postgresql
write.csv(x=big_dt, file='mybigtable.csv', row.names=FALSE)

Writing to CSV as the last step is not the only option. With RPostgreSQL you can also export "big_dt" directly into a relation in a local Postgres instance.
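
The direct export could look roughly like the sketch below, which assumes the RPostgreSQL and DBI packages are installed and a Postgres server is reachable. The helper name export_to_postgres and every connection parameter are placeholders for illustration, not values from the original post.

```r
# Hypothetical helper: copy a data.table into a PostgreSQL table.
# All connection parameters are placeholders; adjust them to your own server.
export_to_postgres <- function(dt, table_name, dbname, host = "localhost",
                               port = 5432, user, password) {
  drv <- RPostgreSQL::PostgreSQL()
  con <- DBI::dbConnect(drv, dbname = dbname, host = host, port = port,
                        user = user, password = password)
  # Close the connection even if the write fails
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbWriteTable(con, table_name, as.data.frame(dt), row.names = FALSE)
}

# Example call (needs a running Postgres server and valid credentials):
# export_to_postgres(big_dt, "mybigtable", dbname = "rasters",
#                    user = "me", password = "secret")
```

For a table with billions of rows, loading the CSV with Postgres's own COPY command may be faster than inserting from R.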