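H2O error: tree model will not fit in the driver node's memory

Time: 2017-04-19 03:40:59

Tags: model h2o gbm

I am running a GBM model in H2O from R code and getting the error below. The same code had been running without problems for a couple of weeks. Is this an error on the H2O side, or a configuration issue on the user's system?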

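water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for GBM model: gbm-2017-04-18-15-29-53. Details: ERRR on field: _ntrees: The tree model will not fit in the driver node's memory (23.2 MB per tree x 1000 > 3.32 GB) - try decreasing ntrees and/or max_depth or increasing min_rows!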
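
For reference, the comparison in the message can be reproduced by hand. This is only a sketch of the arithmetic the message reports, not H2O's internal check; the function name `tree_model_fits` is made up for illustration, and it assumes 1 GB = 1024 MB.

```r
# Reproduces the arithmetic from the error message only; this is not
# H2O's internal check. Assumes 1 GB = 1024 MB.
tree_model_fits <- function(mb_per_tree, ntrees, heap_gb) {
  needed_gb <- mb_per_tree * ntrees / 1024
  list(
    needed_gb = needed_gb,
    fits = needed_gb <= heap_gb,
    # Largest tree count that stays within the heap at this per-tree size.
    max_trees = floor(heap_gb * 1024 / mb_per_tree)
  )
}

# Numbers taken from the error message above.
check <- tree_model_fits(mb_per_tree = 23.2, ntrees = 1000, heap_gb = 3.32)
print(check$needed_gb)
print(check$fits)
print(check$max_trees)
```

With these numbers the 1000 trees need far more than the 3.32 GB available. The per-tree size itself depends on max_depth and min_rows, which is why the message lists those as alternatives to lowering ntrees.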

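2 Answers:

Answer 0: (score: 3)

The fix that worked for me was to set both the minimum and the maximum memory size when initializing H2O. For example:

It fails when neither the minimum nor the maximum memory size is specified: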

localH2O <- h2o.init(ip='localhost', nthreads=-1)

INFO: Java heap totalMemory: 1.92 GB
INFO: Java heap maxMemory: 26.67 GB
INFO: Java version: Java 1.8.0_121 (from Oracle Corporation)
INFO: JVM launch parameters: [-ea]
INFO: OS version: Linux 3.10.0-327.el7.x86_64 (amd64)
INFO: Machine physical memory: 1.476 TB

It fails when only the maximum memory size is specified:

localH2O <- h2o.init(ip='localhost', nthreads=-1,
                     max_mem_size='200G')

INFO: Java availableProcessors: 64
INFO: Java heap totalMemory: 1.92 GB
INFO: Java heap maxMemory: 177.78 GB
INFO: Java version: Java 1.8.0_121 (from Oracle Corporation)
INFO: JVM launch parameters: [-Xmx200G, -ea]
INFO: OS version: Linux 3.10.0-327.el7.x86_64 (amd64)
INFO: Machine physical memory: 1.476 TB

It succeeds when both the minimum and the maximum memory size are specified:

localH2O <- h2o.init(ip='localhost', nthreads=-1,
                     min_mem_size='100G', max_mem_size='200G')

INFO: Java availableProcessors: 64
INFO: Java heap totalMemory: 95.83 GB
INFO: Java heap maxMemory: 177.78 GB
INFO: Java version: Java 1.8.0_121 (from Oracle Corporation)
INFO: JVM launch parameters: [-Xms100G, -Xmx200G, -ea]
INFO: OS version: Linux 3.10.0-327.el7.x86_64 (amd64)
INFO: Machine physical memory: 1.476 TB
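
To tell the three cases apart quickly, the "JVM launch parameters" line can be checked for both flags. The helper below is a hypothetical sketch (`jvm_heap_flags` is not part of the h2o package); it assumes the log line format shown above.

```r
# Hypothetical helper: extract the -Xms / -Xmx values from an H2O
# "JVM launch parameters" log line. Assumes the format shown above.
jvm_heap_flags <- function(log_line) {
  flags <- regmatches(log_line,
                      gregexpr("-Xm[sx][0-9]+[kKmMgG]?", log_line))[[1]]
  values <- substring(flags, 5)          # size part, e.g. "100G"
  names(values) <- substring(flags, 1, 4) # flag part, e.g. "-Xms"
  values
}

flags <- jvm_heap_flags("INFO: JVM launch parameters: [-Xms100G, -Xmx200G, -ea]")
print(flags)

# TRUE only when both a minimum and a maximum heap size were passed.
both_set <- all(c("-Xms", "-Xmx") %in% names(flags))
print(both_set)
```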

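Answer 1: (score: 2)

The 3.32 GB number in your post is computed from activity in the H2O job, so it is hard to validate directly without knowing what was going on in the job. 40 GB per node is very different from 3.32 GB, so do the following to sanity-check the job...

After the H2O Hadoop job has finished, you can look at the YARN logs to confirm that the containers really did get the amount of memory you expect.

Use the following command (h2odriver prints it for you when the run finishes):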

yarn logs -applicationId application_nnn_nnn

For me, the (slightly trimmed) output for one of the H2O node containers looks like this:

Container: container_e20_1487032509333_2085_01_000004 on mr-0xd4.0xdata.loc_45454
===================================================================================
LogType:stderr
Log Upload Time:Sat Apr 22 07:58:13 -0700 2017
...

LogType:stdout
Log Upload Time:Sat Apr 22 07:58:13 -0700 2017
LogLength:7517
Log Contents:
POST 0: Entered run
POST 11: After setEmbeddedH2OConfig
04-22 07:57:56.979 172.16.2.184:54323    11976  main      INFO: ----- H2O started  -----
04-22 07:57:57.011 172.16.2.184:54323    11976  main      INFO: Build git branch: rel-turing
04-22 07:57:57.011 172.16.2.184:54323    11976  main      INFO: Build git hash: 34b83da423d26dfbcc0b35c72714b31e80101d49
04-22 07:57:57.011 172.16.2.184:54323    11976  main      INFO: Build git describe: jenkins-rel-turing-8
04-22 07:57:57.011 172.16.2.184:54323    11976  main      INFO: Build project version: 3.10.0.8 (latest version: 3.10.4.5)
04-22 07:57:57.011 172.16.2.184:54323    11976  main      INFO: Build age: 6 months and 11 days
04-22 07:57:57.012 172.16.2.184:54323    11976  main      INFO: Built by: 'jenkins'
04-22 07:57:57.012 172.16.2.184:54323    11976  main      INFO: Built on: '2016-10-10 13:45:37'
04-22 07:57:57.012 172.16.2.184:54323    11976  main      INFO: Java availableProcessors: 32
04-22 07:57:57.012 172.16.2.184:54323    11976  main      INFO: Java heap totalMemory: 9.86 GB
04-22 07:57:57.012 172.16.2.184:54323    11976  main      INFO: Java heap maxMemory: 9.86 GB
04-22 07:57:57.012 172.16.2.184:54323    11976  main      INFO: Java version: Java 1.7.0_67 (from Oracle Corporation)

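Note that the application master container's log output looks different, so just find the output of any one of the H2O node containers.

Look for the "Java heap maxMemory" line. In my case, I asked for '-mapperXmx 10g' on the command line, so this looks good. 9.86 GB is close to '10g', allowing for a little JVM overhead.

If it does not match what you expect, then you have a Hadoop configuration problem: some Hadoop setting is overriding the amount of memory you requested on the command line.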
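
The check described above can be sketched in R, assuming the YARN log output has been read into a character vector (for example with readLines). The helper names and the 15% tolerance are assumptions for illustration, and the parser only handles values reported in GB.

```r
# Hypothetical helper: pull the heap size (in GB) out of H2O startup
# log lines. Only handles values reported in GB.
parse_heap_gb <- function(log_lines, field = "Java heap maxMemory") {
  hits <- grep(field, log_lines, fixed = TRUE, value = TRUE)
  if (length(hits) == 0) return(NA_real_)
  # Keep the number that follows ": " and precedes " GB".
  as.numeric(sub(".*: ([0-9.]+) GB.*", "\\1", hits[1]))
}

# The tolerance is an assumed allowance for JVM overhead, not an H2O rule.
heap_matches_request <- function(log_lines, requested_gb, tolerance = 0.15) {
  heap_gb <- parse_heap_gb(log_lines)
  !is.na(heap_gb) && heap_gb >= requested_gb * (1 - tolerance)
}

# Two lines copied from the container output above.
log_lines <- c(
  "04-22 07:57:57.012 172.16.2.184:54323    11976  main      INFO: Java heap totalMemory: 9.86 GB",
  "04-22 07:57:57.012 172.16.2.184:54323    11976  main      INFO: Java heap maxMemory: 9.86 GB"
)

print(parse_heap_gb(log_lines))
print(heap_matches_request(log_lines, requested_gb = 10))
print(heap_matches_request(log_lines, requested_gb = 40))
```

A FALSE result for the size you requested is the sign that a Hadoop setting is overriding it.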