Heap usage error with H2O RandomForest

Date: 2017-10-20 15:11:13

Tags: r h2o

I get this error when using h2o.randomForest. The function call and the associated error are below.

base_line_rf <- h2o.randomForest(x=2:ncol(train),
                                y=1,
                                ntrees = 10000,
                                mtries = ncol(train)-1,
                                training_frame = train,
                                model_id = model_id,
                                stopping_rounds = 5,
                                stopping_tolerance = 0,
                                stopping_metric = "AUC",
                                binomial_double_trees = TRUE
)

Error:

java.lang.AssertionError: I am really confused about the heap usage; MEM_MAX=7624720384 heapUsedGC=7626295912
    at water.MemoryManager.set_goals(MemoryManager.java:97)
    at water.MemoryManager.malloc(MemoryManager.java:265)
    at water.MemoryManager.malloc(MemoryManager.java:222)
    at water.MemoryManager.malloc8d(MemoryManager.java:281)
    at hex.tree.DHistogram.init(DHistogram.java:281)
    at hex.tree.DHistogram.init(DHistogram.java:240)
    at hex.tree.ScoreBuildHistogram2$ComputeHistoThread.computeChunk(ScoreBuildHistogram2.java:326)
    at hex.tree.ScoreBuildHistogram2$ComputeHistoThread.map(ScoreBuildHistogram2.java:306)
    at water.LocalMR.compute2(LocalMR.java:84)
    at water.LocalMR.compute2(LocalMR.java:76)
    at water.LocalMR.compute2(LocalMR.java:76)
    at water.LocalMR.compute2(LocalMR.java:76)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1255)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.popAndExecAll(ForkJoinPool.java:904)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:977)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

What is causing this error?

Thanks

1 Answer:

Answer 0 (score: 1)

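Based on your question, you need to give the H2O cluster more memory to fit your 10,000-tree random forest. It looks like the H2O cluster (a Java process) was started with roughly 8 GB of heap (MEM_MAX = 7624720384 bytes), but with your 10,000-tree setting the model needs more memory than it was given. The error is reported because the heap in use after garbage collection (heapUsedGC) is larger than the configured maximum (MEM_MAX):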

max_mem_size (MEM_MAX)   7624.720384 MB (configured)
heapUsedGC               7626.295912 MB (used)
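To make that comparison concrete, here is a small R sketch that converts the two byte counts quoted in the AssertionError into megabytes and shows how far the used heap overshoots the limit. It only does arithmetic on the numbers from the error message; the variable names are illustrative and nothing here calls H2O.

```r
# Byte counts copied from the AssertionError message
mem_max      <- 7624720384  # MEM_MAX: maximum heap of the H2O JVM, in bytes
heap_used_gc <- 7626295912  # heapUsedGC: heap in use after garbage collection, in bytes

# Decimal megabytes (1 MB = 1e6 bytes), the unit used in the answer
mem_max_mb   <- mem_max / 1e6
heap_used_mb <- heap_used_gc / 1e6

# Overshoot computed in bytes first to avoid floating-point noise
overshoot_mb <- (heap_used_gc - mem_max) / 1e6

# The same limit in binary gigabytes (1 GiB = 2^30 bytes)
mem_max_gib <- mem_max / 2^30

cat(sprintf("MEM_MAX:    %.6f MB (%.2f GiB)\n", mem_max_mb, mem_max_gib))
cat(sprintf("heapUsedGC: %.6f MB\n", heap_used_mb))
cat(sprintf("overshoot:  %.6f MB\n", overshoot_mb))
```

The overshoot is only about 1.6 MB, but any amount above MEM_MAX means the JVM has run out of room, so the fix is to raise the heap limit or make the model smaller.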

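Since you are using H2O from R, you can pass max_mem_size = "12G" to h2o.init() so that the H2O cluster starts with 12 GB of memory, which should be enough for your random forest. This setting only takes effect when h2o.init() starts a new cluster; if one is already running, shut it down first (for example with h2o.shutdown()), otherwise h2o.init() just connects to the existing cluster: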

h2o.init(max_mem_size="12G")

You can also check the H2O cluster details with the following command:

> h2o.clusterInfo()
R is connected to the H2O cluster: 
    H2O cluster uptime:         19 seconds 80 milliseconds 
    H2O cluster version:        3.14.0.3 
    H2O cluster version age:    27 days  
    H2O cluster name:           H2O_started_from_R_avkashchauhan_hwc594 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   10.65 GB <=== This is the max memory size
    H2O cluster total cores:    8 
    H2O cluster allowed cores:  8 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
    R Version:                  R version 3.4.1 (2017-06-30)