randomForest in R runs out of memory far below the physical limit

Asked: 2016-09-08 23:33:33

Tags: r linux memory-management random-forest

I am trying to run an R script that builds randomForests, and the script dies with the error "cannot allocate vector of size 549.4 Mb". I am running 64-bit R on a Google Cloud Engine Linux instance with 8 cores and 7.2 GB of memory. I have seen others run into memory limits in R, but I don't understand why I am hitting a limit so far below the physical allocation on the instance. Tracking memory usage at the system level shows the machine is nowhere near out of memory. ulimit is set to unlimited for everything that looks relevant (output below). Question: how do I increase the amount of memory R can allocate to a vector?
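
For scale, the failing allocation corresponds to roughly 72 million doubles (72e6 × 8 bytes ≈ 549 Mb; that figure is my back-of-envelope estimate, not from the error log). A minimal sketch, outside the actual script, of checking whether a standalone allocation of that size succeeds and what R thinks it is using:

gc(verbose = TRUE)                   # report R's current memory usage
x <- numeric(72e6)                   # ~549 Mb of doubles; mirrors the failing allocation
print(object.size(x), units = "Mb")  # confirm the vector's size
rm(x)
invisible(gc())                      # release the memory again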

The code is meant to test the scalability/time gains of running randomForest on parallel cores. It works right up until the models need to fit 6000 training examples, so I know the code in at least the two innermost loops functions. I have also tried adding explicit gc() calls, and the gcinfo output says I have about 50% headroom left until I need to build the larger models (the ones with 6000 input points).
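
The explicit GC calls mentioned above were along these lines (a minimal sketch; the exact placement inside the loops is omitted here):

gcinfo(TRUE)     # print a usage summary at every garbage collection
invisible(gc())  # force a collection so the summary prints immediately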

Code:

install.packages(c("randomForest", "doMC", "foreach", "dismo", "raster", "gbm", "SDMTools", "RMySQL", "rgdal", "gam", "earth"), repos='http://cran.mtu.edu/')
library(foreach)
library(raster)
library(dismo)
library(SDMTools)
library(parallel)
library(randomForest)
library(RMySQL)
library(doMC)

## occPath (set elsewhere, ends in "/") holds the occurrence data
picea_points <- read.csv(paste(occPath, "picea_ready.csv", sep=""))

treeSeq <- seq(from=1000, to=11000, by=5000) ## forest sizes: 1000, 6000, 11000 trees
TexSeq <- seq(from=11000, to=11000, by=5000) ## training-set sizes (currently just 11000)
totalCores <- detectCores()
for (ncores in 3:totalCores){
  registerDoMC(cores = ncores)
  for (numTex in TexSeq){ ## change the number of training examples
    for (numTrees in treeSeq){ ## number of randomForest trees
      for (rep in 1:5){ ## replicate benchmarks
      ## Take a testing set
        t1 <- Sys.time()
        q <- nrow(picea_points)
        q_test <- floor(0.75*q) ## floor keeps the sample size an integer
        testing_set <- picea_points[sample(q, q_test), ] ## hold out q_test random rows

        ## now take a random sample of numTex rows
        training_set <- picea_points[sample(q, numTex), ] ## this is what we will build the model upon
        training_set <- na.omit(training_set)
        x <- as.matrix(training_set[c('bio2', 'bio7', 'bio8', 'bio15', 'bio18', 'bio19')])
        y <- training_set[['presence']]
        ## each of the ncores workers grows numTrees trees;
        ## randomForest::combine merges the pieces into one forest
        model <- foreach(ntree=rep(numTrees, ncores), .combine=combine, .multicombine=TRUE,
                         .packages='randomForest') %dopar% {
                           randomForest(x, y, ntree=ntree)
                         }

        t2 <- Sys.time()
        # save to database
        # ...
      }
    }
  }
}

ulimit -a

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 28716
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 28716
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

R Info

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

I have the error logs and can post them as well if that would help.

MemoryUsage: [screenshot of the system memory-usage graph omitted]

0 Answers:

No answers yet.