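I'm trying to run an R script that builds randomForests, and the script dies with the error "cannot allocate vector of size 549.4 Mb". I'm running 64-bit R on a Google Compute Engine Linux instance with 8 cores and 7.2 GB of memory. I've seen that other people have had problems with memory limits in R, but I don't understand why I'm being limited so far below the physical allocation on my instance. Tracking memory usage at the system level shows the machine running out of memory. ulimit is set to unlimited for everything that looks important (output below). Question: how can I increase the amount of memory R can allocate to a vector?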
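One way to compare what R is requesting against what the operating system actually has free is to read /proc/meminfo from inside R. The helper below, `meminfo_mb`, is hypothetical (it is not part of any package) and assumes a Linux kernel that reports the `MemAvailable` field (3.14 and later, which includes the 3.16 kernel shipped with Debian 8). On Linux, `memory.limit()` does not apply (it is Windows-only), and unless R was started with `--max-vsize` its vector heap has no cap of its own, so a "cannot allocate vector" error there usually means the allocation request failed at the operating-system level.

```r
# Hypothetical helper: extract one field (reported in kB) from /proc/meminfo
# text and convert it to MB. Taking the lines as an argument keeps it testable.
meminfo_mb <- function(lines, field = "MemAvailable") {
  hit <- grep(paste0("^", field, ":"), lines, value = TRUE)
  if (length(hit) != 1) stop("field not found: ", field)
  # keep only the number between "Field:" and the trailing "kB"
  kb <- as.numeric(sub("^[^:]+:[[:space:]]*([0-9]+).*$", "\\1", hit))
  kb / 1024
}

# Usage on the instance (Linux only):
# meminfo_mb(readLines("/proc/meminfo"), "MemAvailable")
# meminfo_mb(readLines("/proc/meminfo"), "MemTotal")
```

Calling it just before and during the failing `foreach` step shows whether the 549.4 Mb request really exceeds what is left on the machine.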
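The code is meant to test the scalability/time benefit of running randomForest on parallel cores. It works until the model has to be fit to 6000 training examples, so I know that at least the code in the innermost two loops works. I've also tried adding explicit GC calls, and the gcinfo output says I have about 50% of memory left until I need to build the larger model (with 6000 input points).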
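As a rough sanity check on where a 549.4 Mb request could come from: randomForest allocates its per-tree node arrays as nrnodes × ntree matrices, where nrnodes is an upper bound on the nodes in one tree. Assuming the package defaults (`nodesize = 5` for regression or `nodesize = 1` for classification, and no `maxnodes`), that bound works out to 2n + 1 for n training rows. The hypothetical helper below computes the size of one such matrix. Under those assumptions, n = 6000 rows with ntree = 6000 trees (the middle value of `treeSeq`) gives almost exactly the size in the error message, which is one plausible reading of the failure rather than a confirmed diagnosis.

```r
# Hypothetical helper: MB needed for one nrnodes x ntree matrix, assuming
# randomForest's default node bound nrnodes = 2 * n + 1 (no maxnodes set).
forest_matrix_mb <- function(n, ntree, bytes_per_cell = 8) {
  nrnodes <- 2 * n + 1
  nrnodes * ntree * bytes_per_cell / 2^20
}

forest_matrix_mb(6000, 6000)                       # about 549.4 (doubles)
forest_matrix_mb(6000, 6000, bytes_per_cell = 4)   # about 274.7 (integers)
forest_matrix_mb(11000, 11000)                     # about 1846.4 (doubles)
```

Each worker holds several arrays of this shape at once (some integer, some double), and doMC forks `ncores` workers that each grow `numTrees` trees, so the peak is roughly `ncores` times the per-worker total, plus the combined forest in the master process.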
Code:
install.packages(c("randomForest", "doMC", "foreach", "dismo", "raster", "gbm", "SDMTools", "RMySQL", "rgdal", "gam", "earth"), repos='http://cran.mtu.edu/')
library(foreach)
library(raster)
library(dismo)
library(SDMTools)
library(parallel)      # detectCores()
library(randomForest)  # randomForest(), combine()
library(RMySQL)
library(doMC)          # registerDoMC(): fork-based backend for %dopar%

# occPath (the data directory) is defined elsewhere in the script
picea_points <- read.csv(paste(occPath, "picea_ready.csv", sep=""))
treeSeq <- seq(from=1000, to=11000, by=5000)   # 1000, 6000, 11000 trees
TexSeq <- seq(from=11000, to=11000, by=5000)   # 11000 training examples
totalCores <- detectCores()

for (ncores in 3:totalCores) {
  registerDoMC(cores = ncores)
  for (numTex in TexSeq) {          ## change the number of training examples
    for (numTrees in treeSeq) {     ## number of randomForest trees
      for (rep in 1:5) {            ## replicate benchmarks
        ## Take a testing set
        t1 <- Sys.time()
        q <- nrow(picea_points)
        q_test <- 0.75*q
        testing_set <- picea_points[sample(q, q_test), ]  ## select q_test random rows from points
        ## now take a random sample of numTex rows
        ## (was `points`, which is the graphics function, not the data)
        training_set <- picea_points[sample(q, numTex), ]  ## this is what we will build the model upon
        training_set <- na.omit(training_set)
        x <- as.matrix(training_set[c('bio2', 'bio7', 'bio8', 'bio15', 'bio18', 'bio19')])
        y <- training_set[['presence']]
        ## each of the ncores workers grows numTrees trees, so the
        ## combined forest holds ncores * numTrees trees
        model <- foreach(ntree=rep(numTrees, ncores), .combine=randomForest::combine,
                         .multicombine=TRUE, .packages='randomForest') %dopar% {
          randomForest(x, y, ntree=ntree)
        }
        t2 <- Sys.time()
        # save to database
        # ...
      }
    }
  }
}
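If the goal of the benchmark is a fixed total forest size regardless of core count, one option is to split `numTrees` across the workers instead of giving every worker `numTrees` trees; this also shrinks each worker's node arrays in proportion. The helper `split_trees` below is hypothetical (it is not part of foreach or randomForest) and spreads a total as evenly as possible so the chunks sum exactly to the total.

```r
# Hypothetical helper: split `total` trees into `k` near-equal integer chunks
# whose sum is exactly `total` (the first `total %% k` chunks get one extra).
split_trees <- function(total, k) {
  stopifnot(total >= k)            # every worker must grow at least one tree
  base <- total %/% k
  extra <- total %% k
  base + c(rep(1L, extra), rep(0L, k - extra))
}

split_trees(1000, 3)   # 334 333 333
```

Its result could be passed as `ntree=split_trees(numTrees, ncores)` in the `foreach` call. Separately, running `rm(model); gc()` at the end of each replicate releases the previous forest before the next one is built, so two large forests are not held at the same time.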
ulimit -a:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 28716
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 28716
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
R info:
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
I also have the error log and can post it if that helps.