EMR Hadoop (MRv2) cluster caps out at 80% capacity. How do I get the remaining 20%?

Date: 2015-01-07 22:30:25

Tags: hadoop yarn emr

I'm using Elastic MapReduce on AWS (Hadoop 2.0 with YARN).

The configuration is as follows:

10 x g2.2xlarge core instances with 15GB of RAM and 8 CPU cores
yarn.nodemanager.vmem-check-enabled=false 
yarn.scheduler.minimum-allocation-mb=2048
yarn.nodemanager.resource.memory-mb=12288
mapreduce.map.memory.mb=3072

When a job runs, the scheduler shows that only 81.7% of the cluster is allocated:

Used Capacity:  81.7%
Absolute Used Capacity: 81.7%
Absolute Capacity:  100.0%
Absolute Max Capacity:  100.0%
Used Resources: 
Num Schedulable Applications:   1
Num Non-Schedulable Applications:   0
Num Containers:  25
Max Applications:   10000
Max Applications Per User:  10000
Max Schedulable Applications:   6
Max Schedulable Applications Per User:  6
Configured Capacity:    100.0%
Configured Max Capacity:    100.0%
Configured Minimum User Limit Percent:  100%
Configured User Limit Factor:   1.0
Active users:   hadoop 

The scheduler allocates at most 3 containers per node, capping the total at 25 containers.

Why does it only allocate 25 containers?

From the memory settings, I would expect to see:

yarn.nodemanager.resource.memory-mb(12288) / mapreduce.map.memory.mb(3072) = 4 containers per node
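One likely explanation, consistent with the numbers above (my own sketch, not something stated in the question): YARN's DefaultResourceCalculator rounds every container request up to a multiple of yarn.scheduler.minimum-allocation-mb, so a 3072 MB map request becomes 4096 MB, and only 3 such containers fit in 12288 MB. Assuming the default 1536 MB ApplicationMaster request, this also reproduces the 81.7% figure:

```python
import math

def normalize(request_mb, min_alloc_mb):
    """Round a container request up to a multiple of the minimum allocation,
    as YARN's DefaultResourceCalculator does."""
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb

min_alloc = 2048                              # yarn.scheduler.minimum-allocation-mb
node_mem = 12288                              # yarn.nodemanager.resource.memory-mb
map_container = normalize(3072, min_alloc)    # 3072 -> 4096
am_container = normalize(1536, min_alloc)     # assumed default AM size, 1536 -> 2048

containers_per_node = node_mem // map_container   # 12288 // 4096 = 3
# 10 nodes: 1 AM container + 24 map containers = the 25 containers observed
used = 24 * map_container + am_container          # 100352 MB
total = 10 * node_mem                             # 122880 MB
print(containers_per_node)                        # 3
print(round(100 * used / total, 1))               # 81.7
```

If this is the cause, the fix is to make the map request an exact multiple of the minimum allocation (or raise the minimum), which is what the accepted answer below ends up doing.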

P.S. This looks like a similar question, but it went unanswered: How concurrent # mappers and # reducers are calculated in Hadoop 2 + YARN?

1 Answer:

Answer 0 (score: 2)

After going through this tutorial, I got it working.

I changed two things:

  1. mapreduce.map.memory.mb had a typo
  2. mapreduce.map.java.opts was set too low by default

The final settings that worked for me were:

    yarn.nodemanager.vmem-pmem-ratio=50
    yarn.nodemanager.resource.memory-mb=12288
    yarn.scheduler.minimum-allocation-mb=3057
    yarn.app.mapreduce.am.resource.mb=6114
    mapreduce.map.java.opts=-Xmx2751m
    mapreduce.map.memory.mb=3057
    

    It now allocates a full 4 containers per node.
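The arithmetic behind those settings can be checked directly (my own sketch, not from the answer): 3057 MB is an exact multiple of the new minimum allocation, so YARN rounds nothing up, four containers fit per node, and -Xmx2751m is roughly 90% of the 3057 MB container, a common rule of thumb that leaves headroom for JVM non-heap memory.

```python
node_mem = 12288      # yarn.nodemanager.resource.memory-mb
min_alloc = 3057      # yarn.scheduler.minimum-allocation-mb
map_mem = 3057        # mapreduce.map.memory.mb (a multiple of min_alloc,
                      # so YARN does not round the request up)

containers_per_node = node_mem // map_mem
print(containers_per_node)    # 4

# Heap ~= 90% of the container size: 0.9 * 3057 ~= 2751, matching -Xmx2751m.
heap = int(map_mem * 0.9)
print(heap)                   # 2751
```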