Question

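I currently have a "big data" problem. I have 10,000 matrices, each with roughly 10,000 x 10,000 elements and named dimensions. They are stored in groups as 100 R objects; each object is about 16 GB and is a list of 100 matrices. I want to do the following: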

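Sort each matrix according to the dimension names of the first matrix.
Compute elementwise summary statistics (mean, median, some quantiles) across the list of matrices. Each summary statistic would return a 10,000 x 10,000 matrix.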

If the matrices were smaller, I could do it like this:

load('distmat1.r') # loads list called ericsondist with 100 matrices.
roworder <- dimnames(ericsondist[[1]])[[1]] # get sorting order.
ericsondist <- lapply(ericsondist, function(x) x[roworder, roworder]) # sort the list of matrices.
distlist <- ericsondist

# Load the rest of the matrices, sort them, and concatenate.
for (i in 2:100) {
  load(paste0('distmat',i,'.r'))
  ericsondist <- lapply(ericsondist, function(x) x[roworder, roworder])
  distlist <- c(distlist, ericsondist)
}

ericsondist <- abind::abind(distlist, along = 3) # bind the full concatenated list (not just the last file) to an array.

# Mean and upper and lower quantile of matrices, elementwise.
ericsondistmean <- apply(ericsondist, 1:2, mean)
ericsondistlower <- apply(ericsondist, 1:2, quantile, probs = 0.025)
ericsondistupper <- apply(ericsondist, 1:2, quantile, probs = 0.975)

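However, although I have access to a lot of RAM, the matrices together take up more than 1 TB when stored on disk, so this approach would obviously run out of memory. Does anyone have suggestions for how to do these calculations while minimizing RAM usage?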
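
For illustration, here is a minimal sketch of one possible low-memory direction, not a tested solution for the real data. It assumes every file holds a list named `ericsondist`, as in the code above, and it uses tiny toy files in a temporary directory instead of the real `distmat*.r` files. The helper `load_list`, the variable `block_size`, and all toy sizes are hypothetical. The mean is accumulated as a running sum, so only one file is in memory at a time. Quantiles need every value for a given cell, so they are computed one block of rows at a time. This second pass reloads every file once per block, which trades I/O for RAM; re-saving the data in row blocks first might be preferable at full scale.

```r
# Toy stand-in for the real data: 3 files, each holding a list `ericsondist`
# of 4 named 6 x 6 matrices with shuffled dimnames (sizes are illustrative).
set.seed(1)
n <- 6; per_file <- 4; n_files <- 3
ids <- paste0("s", seq_len(n))
toydir <- tempfile("distmat"); dir.create(toydir)
files <- file.path(toydir, paste0("distmat", seq_len(n_files), ".r"))
for (f in files) {
  ericsondist <- lapply(seq_len(per_file), function(k) {
    p <- sample(ids)
    matrix(runif(n * n), n, n, dimnames = list(p, p))
  })
  save(ericsondist, file = f)
}

# Hypothetical helper: load one file into a private environment and
# return its list of matrices, so nothing lingers in the workspace.
load_list <- function(f) {
  e <- new.env()
  load(f, envir = e)
  e$ericsondist
}

# Sorting order taken from the first matrix of the first file.
roworder <- dimnames(load_list(files[1])[[1]])[[1]]

# Pass 1: running sum for the elementwise mean, one file in memory at a time.
total <- matrix(0, n, n, dimnames = list(roworder, roworder))
count <- 0
for (f in files) {
  for (m in load_list(f)) {
    total <- total + m[roworder, roworder]
    count <- count + 1
  }
}
distmean <- total / count

# Pass 2: quantiles, computed on one block of rows at a time.
# block_size is hypothetical; pick it so that
# block_size * n * count doubles fit comfortably in RAM.
block_size <- 2
blocks <- split(roworder, ceiling(seq_along(roworder) / block_size))
distlower <- distupper <- matrix(NA_real_, n, n,
                                 dimnames = list(roworder, roworder))
for (rows in blocks) {
  slab <- array(NA_real_, c(length(rows), n, count))
  k <- 0
  for (f in files) {
    for (m in load_list(f)) {
      k <- k + 1
      slab[, , k] <- m[rows, roworder, drop = FALSE]
    }
  }
  # apply() returns a 2 x length(rows) x n array: one layer per probability.
  q <- apply(slab, 1:2, quantile, probs = c(0.025, 0.975))
  distlower[rows, ] <- q[1, , ]
  distupper[rows, ] <- q[2, , ]
}
```

With 10,000 matrices of 10,000 columns each, one row of the block array holds 10,000 x 10,000 doubles, about 800 MB, so the block size directly sets peak memory.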

R

0 answers: