When I try to fit a random forest model on my dataset (5888 bytes) with the rsparkling random forest function, I run out of memory:
h2o.randomForest(x = x,
                 y = y,
                 training_frame = trainDatasetTopTen_tbl,
                 nfolds = 5)
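In case it is relevant, trainDatasetTopTen_tbl is the H2OFrame I hand to h2o.randomForest(); a minimal sketch of how I get to that point (the Spark table name and the x/y columns below are only illustrative, not my real ones):

library(sparklyr)
library(rsparkling)
library(h2o)

# Convert the Spark DataFrame into an H2OFrame for h2o.randomForest()
trainDatasetTopTen_tbl <- as_h2o_frame(sc, trainDatasetTopTen_spark)

# Response and predictor columns (illustrative names)
y <- "target"
x <- setdiff(colnames(trainDatasetTopTen_tbl), y)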
My configuration settings:
config <- spark_config()
config$spark.driver.cores <- 3
config$spark.driver.memory <- "3.4G"
config$spark.driver.extraJavaOptions <- "-XX:MaxPermSize=3.8G"
sc <- spark_connect(master = 'local', config = config,
                    version = '2.1.0')
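For completeness, this is the variant I have been considering instead, based on my (possibly mistaken) understanding that in local mode the driver memory has to be handed to spark-submit through sparklyr's shell options rather than through spark.driver.memory; 3G is simply the value I would like to end up with:

library(sparklyr)

config <- spark_config()
# Passed as --driver-memory to spark-submit; in local mode this driver JVM
# is also the one that hosts the H2O node started by Sparkling Water
config[["sparklyr.shell.driver-memory"]] <- "3G"

sc <- spark_connect(master = "local", config = config, version = "2.1.0")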
My machine has 4 GB of memory available.
The H2O cluster info is:
R is connected to the H2O cluster:
H2O cluster uptime: 30 minutes 376 milliseconds
H2O cluster version: 3.10.5.2
H2O cluster version age: 24 days
H2O cluster name: sparkling-water-mubarak_local-1499963226139
H2O cluster total nodes: 1
H2O cluster total memory: 0.7 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 4
H2O cluster healthy: TRUE
H2O Connection ip: 127.0.0.1
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
R Version: R version 3.4.0 (2017-04-21)
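(For reference, I can reproduce the block above with h2o.clusterInfo() from the h2o package once the H2O context has been attached via rsparkling; a minimal sketch of those calls:)

library(rsparkling)
library(h2o)

# Attach Sparkling Water to the existing Spark connection, then print the cluster summary
h2o_context(sc)
h2o.clusterInfo()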
The log from the Java process that H2O started (under http://localhost:4040/sparkling-water/):
INFO: Java heap totalMemory: 461.0 MB
INFO: Java heap maxMemory: 910.5 MB
INFO: Java version: Java 1.8.0_65 (from Oracle Corporation)
INFO: JVM launch parameters: [-Xmx1g]
So my question is: how can I increase that JVM parameter from 1 GB to 3 GB?
My devtools session info is:
Session info --------------------------------------
setting value
version R version 3.4.0 (2017-04-21)
system x86_64, darwin15.6.0
ui RStudio (1.0.143)
language (EN)
collate en_GB.UTF-8
tz Europe/London
date 2017-07-13
package * version date
base * 3.4.0 2017-04-21
caret * 6.0-76 2017-04-18
datasets * 3.4.0 2017-04-21
dplyr * 0.7.1 2017-06-22
ggplot2 * 2.2.1 2016-12-30
graphics * 3.4.0 2017-04-21
grDevices * 3.4.0 2017-04-21
h2o * 3.10.5.2 2017-07-01
lattice * 0.20-35 2017-03-25
methods * 3.4.0 2017-04-21
rsparkling * 0.2.1 2017-06-30
sparklyr * 0.5.6-9011 2017-07-05
stats * 3.4.0 2017-04-21
utils * 3.4.0 2017-04-21
Thank you, MJ