I'm using Elastic MapReduce on AWS (Hadoop 2.0 with YARN).
The configuration is as follows:
10 x g2.2xlarge core instances with 15GB of RAM and 8 CPU cores
yarn.nodemanager.vmem-check-enabled=false
yarn.scheduler.minimum-allocation-mb=2048
yarn.nodemanager.resource.memory-mb=12288
mapreduce.map.memory.mb=3072
When I run a job, the scheduler shows that only 81.7% of the cluster is allocated:
Used Capacity: 81.7%
Absolute Used Capacity: 81.7%
Absolute Capacity: 100.0%
Absolute Max Capacity: 100.0%
Used Resources:
Num Schedulable Applications: 1
Num Non-Schedulable Applications: 0
Num Containers: 25
Max Applications: 10000
Max Applications Per User: 10000
Max Schedulable Applications: 6
Max Schedulable Applications Per User: 6
Configured Capacity: 100.0%
Configured Max Capacity: 100.0%
Configured Minimum User Limit Percent: 100%
Configured User Limit Factor: 1.0
Active users: hadoop
The scheduler allocates at most 3 containers per node, capping the total at 25 containers.
Why does it only allocate 25 containers?
From the memory settings, I expected:
yarn.nodemanager.resource.memory-mb(12288) / mapreduce.map.memory.mb(3072) = 4 containers per node
That would be 40 containers across the 10 nodes, not 25.
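A likely explanation for the 3-per-node cap (my own reasoning, consistent with YARN's documented CapacityScheduler behavior) is that each container request is rounded up to the next multiple of yarn.scheduler.minimum-allocation-mb before it is granted. The helper function below is purely illustrative:

```python
import math

def containers_per_node(node_mb, request_mb, min_alloc_mb):
    # YARN rounds each container request up to the next multiple of
    # yarn.scheduler.minimum-allocation-mb before granting it.
    granted_mb = math.ceil(request_mb / min_alloc_mb) * min_alloc_mb
    return node_mb // granted_mb

# With the original settings, a 3072 MB request rounds up to 4096 MB,
# so each 12288 MB node fits only 3 containers instead of the expected 4.
print(containers_per_node(12288, 3072, 2048))  # → 3
```

Under this assumption, 12288 / 4096 = 3 containers per node, which matches what the scheduler reports.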
P.S. This looks like a similar question, but it went unanswered: How concurrent # mappers and # reducers are calculated in Hadoop 2 + YARN?
Answer 0 (score: 2)
After working through this tutorial, I got it working.
I changed two things.
The final settings that worked for me are:
yarn.nodemanager.vmem-pmem-ratio=50
yarn.nodemanager.resource.memory-mb=12288
yarn.scheduler.minimum-allocation-mb=3057
yarn.app.mapreduce.am.resource.mb=6114
mapreduce.map.java.opts=-Xmx2751m
mapreduce.map.memory.mb=3057
Now it allocates a full 4 containers per node.
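The new settings avoid the rounding loss because the request equals the scheduler minimum, so nothing is rounded up. A quick sanity check (the helper below is my own sketch, not part of YARN):

```python
import math

def granted(request_mb, min_alloc_mb):
    # Containers are granted in multiples of yarn.scheduler.minimum-allocation-mb.
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb

# mapreduce.map.memory.mb = 3057 with minimum-allocation-mb = 3057:
# the request is granted as exactly 3057 MB, and 12288 // 3057 = 4.
print(12288 // granted(3057, 3057))  # → 4
```

Note that -Xmx2751m leaves roughly 10% of the 3057 MB container for JVM overhead, which is the usual rule of thumb.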