I have a "big data" problem. I have 10000 matrices, each roughly 10000x10000 elements, with named dimensions. They are currently stored in 100 R objects, each about 16 GB, where each object is a list of 100 matrices. I would like to reorder every matrix's rows and columns to a common order and then compute the elementwise mean and upper/lower quantiles across all the matrices.
If the matrices were smaller, I could do it like this:
load('distmat1.r') # loads list called ericsondist with 100 matrices.
roworder <- dimnames(ericsondist[[1]])[[1]] # get sorting order.
ericsondist <- lapply(ericsondist, function(x) x[roworder, roworder]) # sort the list of matrices.
distlist <- ericsondist
# Load the rest of the matrices, sort them, and concatenate.
for (i in 2:100) {
  load(paste0('distmat', i, '.r'))
  ericsondist <- lapply(ericsondist, function(x) x[roworder, roworder])
  distlist <- c(distlist, ericsondist)
}
ericsondist <- abind::abind(distlist, along = 3) # bind the full list to a 3-D array.
# Mean and upper and lower quantile of matrices, elementwise.
ericsondistmean <- apply(ericsondist, 1:2, mean)
ericsondistlower <- apply(ericsondist, 1:2, quantile, probs = 0.025)
ericsondistupper <- apply(ericsondist, 1:2, quantile, probs = 0.975)
However, even though I have access to a fairly large amount of RAM, the matrices take up more than 1 TB on disk in total, so loading them all this way would obviously blow up memory. Does anyone have suggestions for how to do these computations while keeping RAM usage to a minimum?
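For reference, the elementwise mean part can at least be done in a streaming fashion, keeping only one 16 GB object plus a single running-sum matrix in memory at a time. Below is a minimal sketch, assuming the same file names and list name as above; the quantiles are the real difficulty, since each cell needs all 10000 values (one direction would be re-chunking the sorted matrices on disk by row blocks and taking quantiles block by block, but that is not shown here).

# Streaming elementwise mean: one pass over the 100 files, one file in memory at a time.
roworder <- NULL
distsum  <- NULL
n        <- 0
for (i in 1:100) {
  load(paste0('distmat', i, '.r'))            # loads the list `ericsondist` (100 matrices)
  if (is.null(roworder)) roworder <- dimnames(ericsondist[[1]])[[1]]
  for (m in ericsondist) {
    m <- m[roworder, roworder]                # sort rows/columns to the common order
    distsum <- if (is.null(distsum)) m else distsum + m
    n <- n + 1
  }
  rm(ericsondist); gc()                       # free the 16 GB list before the next load
}
ericsondistmean <- distsum / n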