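I am running a GBM model in H2O from R code and getting the error below. The same code had been running fine for several weeks. Is this an error on the H2O side or a configuration problem on the user's system?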
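water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for GBM model: gbm-2017-04-18-15-29-53. Details: ERRR on field: _ntrees: The tree model will not fit in the driver node's memory (23.2 MB per tree x 1000 > 3.32 GB) - try decreasing ntrees and/or max_depth or increasing min_rows!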
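Answer 0 (score: 3)
The fix that worked for me was to set both the minimum and the maximum memory size when initializing H2O. For example:
Fails when neither the minimum nor the maximum memory size is specified: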
localH2O <- h2o.init(ip='localhost', nthreads=-1)
INFO: Java heap totalMemory: 1.92 GB
INFO: Java heap maxMemory: 26.67 GB
INFO: Java version: Java 1.8.0_121 (from Oracle Corporation)
INFO: JVM launch parameters: [-ea]
INFO: OS version: Linux 3.10.0-327.el7.x86_64 (amd64)
INFO: Machine physical memory: 1.476 TB
Fails when only the maximum memory size is specified:
localH2O <- h2o.init(ip='localhost', nthreads=-1,
max_mem_size='200G')
INFO: Java availableProcessors: 64
INFO: Java heap totalMemory: 1.92 GB
INFO: Java heap maxMemory: 177.78 GB
INFO: Java version: Java 1.8.0_121 (from Oracle Corporation)
INFO: JVM launch parameters: [-Xmx200G, -ea]
INFO: OS version: Linux 3.10.0-327.el7.x86_64 (amd64)
INFO: Machine physical memory: 1.476 TB
Succeeds when both the minimum and the maximum memory size are specified:
localH2O <- h2o.init(ip='localhost', nthreads=-1,
min_mem_size='100G', max_mem_size='200G')
INFO: Java availableProcessors: 64
INFO: Java heap totalMemory: 95.83 GB
INFO: Java heap maxMemory: 177.78 GB
INFO: Java version: Java 1.8.0_121 (from Oracle Corporation)
INFO: JVM launch parameters: [-Xms100G, -Xmx200G, -ea]
INFO: OS version: Linux 3.10.0-327.el7.x86_64 (amd64)
INFO: Machine physical memory: 1.476 TB
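The numbers in the error message and in the three startup logs can be compared directly. The sketch below is plain arithmetic in base R, not a description of H2O's internal check. It assumes sizes are reported in binary units (1 GB = 1024 MB). It also assumes, as one reading consistent with the three runs above but not confirmed in this thread, that what matters is the heap already allocated at startup ("Java heap totalMemory") rather than the -Xmx ceiling.

```r
# Figures copied from the error message and the startup logs above.
mb_per_tree <- 23.2   # "23.2 MB per tree"
ntrees      <- 1000   # requested number of trees
limit_gb    <- 3.32   # memory limit quoted in the error message

# Assumption: sizes are in binary units (1 GB = 1024 MB).
required_gb <- mb_per_tree * ntrees / 1024

# Largest number of trees of this size that would stay under the quoted limit.
max_trees <- floor(limit_gb * 1024 / mb_per_tree)

# "Java heap totalMemory" reported at startup for each of the three launches.
heap_total_gb <- c(no_limits = 1.92, max_only = 1.92, min_and_max = 95.83)

# Which launches start with enough heap already allocated to hold all trees?
fits <- heap_total_gb >= required_gb

print(round(required_gb, 2))
print(max_trees)
print(fits)
```

Under these assumptions, only the launch with min_mem_size set starts with more heap than the full model needs, which matches the pass/fail pattern reported above.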
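Answer 1 (score: 2)
The 3.32 GB number in your post is computed from activity inside the H2O job, so it is hard to validate directly without knowing what is going on in the job. Still, 40 GB per node is very different from 3.32 GB, so do the following to check the job...
After the H2O Hadoop job has finished, you can look at the YARN logs to confirm that the containers really got the amount of memory you expected.
Use the following command (h2odriver prints it for you when the run finishes):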
yarn logs -applicationId application_nnn_nnn
For me, the (lightly trimmed) output for one of the H2O node containers looks like this:
Container: container_e20_1487032509333_2085_01_000004 on mr-0xd4.0xdata.loc_45454
===================================================================================
LogType:stderr
Log Upload Time:Sat Apr 22 07:58:13 -0700 2017
...
LogType:stdout
Log Upload Time:Sat Apr 22 07:58:13 -0700 2017
LogLength:7517
Log Contents:
POST 0: Entered run
POST 11: After setEmbeddedH2OConfig
04-22 07:57:56.979 172.16.2.184:54323 11976 main INFO: ----- H2O started -----
04-22 07:57:57.011 172.16.2.184:54323 11976 main INFO: Build git branch: rel-turing
04-22 07:57:57.011 172.16.2.184:54323 11976 main INFO: Build git hash: 34b83da423d26dfbcc0b35c72714b31e80101d49
04-22 07:57:57.011 172.16.2.184:54323 11976 main INFO: Build git describe: jenkins-rel-turing-8
04-22 07:57:57.011 172.16.2.184:54323 11976 main INFO: Build project version: 3.10.0.8 (latest version: 3.10.4.5)
04-22 07:57:57.011 172.16.2.184:54323 11976 main INFO: Build age: 6 months and 11 days
04-22 07:57:57.012 172.16.2.184:54323 11976 main INFO: Built by: 'jenkins'
04-22 07:57:57.012 172.16.2.184:54323 11976 main INFO: Built on: '2016-10-10 13:45:37'
04-22 07:57:57.012 172.16.2.184:54323 11976 main INFO: Java availableProcessors: 32
04-22 07:57:57.012 172.16.2.184:54323 11976 main INFO: Java heap totalMemory: 9.86 GB
04-22 07:57:57.012 172.16.2.184:54323 11976 main INFO: Java heap maxMemory: 9.86 GB
04-22 07:57:57.012 172.16.2.184:54323 11976 main INFO: Java version: Java 1.7.0_67 (from Oracle Corporation)
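Note that the application master container's log output looks different, so just find the output for any one of the H2O node containers.
Look for the "Java heap maxMemory" line. In my case, I asked for '-mapperXmx 10g' on the command line, so this looks right: 9.86 GB is close to '10g', allowing for a little JVM overhead.
If it does not match what you expect, you have a Hadoop configuration problem: some Hadoop setting is overriding the amount of memory you requested on the command line.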
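The same check can be scripted once the `yarn logs` output has been saved. This is a minimal base R sketch: `heap_max_gb` is a hypothetical helper, the 5% tolerance for JVM overhead is an assumed value, and the pattern only handles values reported in GB. The sample lines are copied from the container log above.

```r
# Sample lines copied from the container log above; in practice, read the
# saved `yarn logs` output with readLines().
log_lines <- c(
  "04-22 07:57:57.012 172.16.2.184:54323 11976 main INFO: Java heap totalMemory: 9.86 GB",
  "04-22 07:57:57.012 172.16.2.184:54323 11976 main INFO: Java heap maxMemory: 9.86 GB"
)

# Hypothetical helper: extract the "Java heap maxMemory" figure (in GB).
heap_max_gb <- function(lines) {
  hits <- grep("Java heap maxMemory:", lines, fixed = TRUE, value = TRUE)
  as.numeric(sub(".*Java heap maxMemory: ([0-9.]+) GB.*", "\\1", hits))
}

requested_gb <- 10     # from '-mapperXmx 10g'
tolerance    <- 0.05   # assumed allowance for JVM overhead; adjust as needed

observed_gb <- heap_max_gb(log_lines)

# TRUE when the container's heap is within the tolerance of what was requested.
ok <- observed_gb >= requested_gb * (1 - tolerance)
print(data.frame(observed_gb, requested_gb, ok))
```

A FALSE in the `ok` column for an H2O node container would point to the Hadoop configuration issue described above.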