Computing a PCA from several large rasters

Asked: 2013-06-06 22:06:06

Tags: r covariance raster pca

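The script below worked for me on the same data when I was computing Pearson's correlations. I recently adapted it to build a covariance matrix to feed into a PCA. I read on a forum that supplying a precomputed covariance matrix might avoid memory problems, but that has not been the case for me. Computing the covariance matrix fails with the following error: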

Error: cannot allocate vector of size 1.1 Gb
In addition: Warning messages:
1: In na.omit.default(cbind(x, y)) :
  Reached total allocation of 6141Mb: see help(memory.size)
2: In na.omit.default(cbind(x, y)) :
  Reached total allocation of 6141Mb: see help(memory.size)
3: In na.omit.default(cbind(x, y)) :
  Reached total allocation of 6141Mb: see help(memory.size)
4: In na.omit.default(cbind(x, y)) :
  Reached total allocation of 6141Mb: see help(memory.size)

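Can anyone suggest a more efficient way to do this so that I don't run into memory problems? If I am completely off base in computing the covariance first, that's fine; the PCA is the only thing I ultimately need. My data are 12 single-band rasters in ArcGIS raster format, each 581.15 MB. Any help would be greatly appreciated.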

library(rgdal)
library(raster)


setwd("K:/Documents/SDSU/Thesis/GIS Data All/GIS Layers/Generated_Layers/GridsForCor")


# List each raster (names are relative to the working directory set above):
raster_files = c('aspectclp',
                 'lakedistclp',
                 'ocdistclp',
                 'popdenclp',
                 'roaddistclp',
                 'scurveclp',
                 'sdemclp',
                 'solarradclp',
                 'sslopeclp',
                 'vegcatclp',
                 'canopcvrclp',
                 'canophtclp')

cov_matrix <- matrix(NA, length(raster_files), length(raster_files),
                     dimnames = list(raster_files, raster_files))
for (outer_n in seq_along(raster_files)) {
  outer_raster <- raster(raster_files[outer_n])
  # Start the inner loop at outer_n rather than 1 so that each covariance is
  # computed only once. The diagonal is included because it holds the
  # variances, which princomp() needs.
  for (inner_n in outer_n:length(raster_files)) {
    inner_raster <- raster(raster_files[inner_n])
    cov_matrix[outer_n, inner_n] <- cov(outer_raster[], inner_raster[], 
                                        use = 'complete.obs', method = "spearman")
  }
}
# princomp() needs the full symmetric matrix, so mirror the upper triangle:
cov_matrix[lower.tri(cov_matrix)] <- t(cov_matrix)[lower.tri(cov_matrix)]

pca_matrix <- princomp(covmat = cov_matrix)

# Write the loadings to a txt file & csv file
write.table(unclass(loadings(pca_matrix)), "PCA.txt", sep = "\t", col.names = NA)
write.csv(unclass(loadings(pca_matrix)), "PCA.csv")

1 Answer:

Answer 0: (score: 1)

I ran into similar difficulties when performing a PCA on ffdf objects. Try inserting gc() into the (inner) loop, like this:

for (inner_n in outer_n:length(raster_files)) {
  inner_raster <- raster(raster_files[inner_n])
  cov_matrix[outer_n, inner_n] <- cov(outer_raster[], inner_raster[], 
                                      use = 'complete.obs', method = "spearman")
  # Force a garbage collection after each pair of rasters
  gc()
}

This forces an immediate garbage collection, which may free enough memory for the for loop; at least, it was enough for me.
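
If garbage collection alone does not help, another option is to avoid holding whole rasters in memory at all. The following is only a minimal sketch, not the method from the answer above: it accumulates sums and cross-products block by block and builds a Pearson covariance matrix from them. Spearman covariance needs ranks over the full data, so it cannot be accumulated this way. The function name chunked_cov, the read_chunk callback, and the simulated data are all hypothetical; with the raster package, the callback could presumably wrap getValues(x, row, nrows) on a stack() of the files.

```r
# Hypothetical helper: covariance matrix from data delivered in blocks.
# read_chunk(i) must return a numeric matrix (cells x layers) for block i.
chunked_cov <- function(read_chunk, n_chunks) {
  n <- 0
  s <- NULL   # running column sums
  ss <- NULL  # running cross-products
  for (i in seq_len(n_chunks)) {
    block <- read_chunk(i)
    # Keep only cells that are non-missing in every layer
    block <- block[stats::complete.cases(block), , drop = FALSE]
    if (nrow(block) == 0) next
    if (is.null(s)) {
      s  <- numeric(ncol(block))
      ss <- matrix(0, ncol(block), ncol(block))
    }
    n  <- n + nrow(block)
    s  <- s + colSums(block)
    ss <- ss + crossprod(block)
  }
  # Sample covariance: (sum(xy) - sum(x) * sum(y) / n) / (n - 1).
  # This formula is numerically naive when layer means are large;
  # centering the data first would be safer.
  (ss - tcrossprod(s) / n) / (n - 1)
}

# Simulated stand-in for the rasters: 10,000 cells, 3 layers, some NAs
set.seed(1)
cells <- matrix(rnorm(30000), ncol = 3,
                dimnames = list(NULL, c("a", "b", "c")))
cells[sample(length(cells), 50)] <- NA

# Deliver the data in 10 blocks of 1,000 rows each
read_chunk <- function(i) cells[((i - 1) * 1000 + 1):(i * 1000), , drop = FALSE]
cov_chunked <- chunked_cov(read_chunk, 10)

# Compare with the in-memory result computed on complete rows
print(all.equal(cov_chunked, cov(cells, use = "complete.obs")))

# The accumulated matrix is full and symmetric, so princomp() accepts it
pca <- princomp(covmat = cov_chunked)
print(pca$sdev)
```

Only one block plus a small layers-by-layers matrix is held in memory at a time, so peak usage depends on the block size rather than on the raster size.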