Restricting YARN containers programmatically

Date: 2017-07-05 11:32:33

Tags: java hadoop mapreduce yarn

My Hadoop cluster has 10 nodes with 32 GB of RAM each, plus one node with 64 GB.

On those 10 nodes, the NodeManager limit yarn.nodemanager.resource.memory-mb is set to 26 GB; on the 64 GB node it is set to 52 GB (some jobs need 50 GB for a single reducer, and those run on that node).

The problem is that when I run a basic job whose mappers need 8 GB, the 32 GB nodes spawn 3 mappers in parallel (26 / 8 = 3) while the 64 GB node spawns 6. Because of the CPU load, that node usually finishes last.
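The mapper counts above follow from integer division of the NodeManager memory limit by the per-container request, since YARN only packs whole containers. A minimal sketch of that arithmetic (the 26 GB and 52 GB limits are the values quoted above, expressed in MB as YARN expects):

```java
public class ContainerMath {
    // How many containers of a given size fit under a NodeManager memory limit.
    // YARN schedules whole containers only, so the remainder is wasted headroom.
    static int containersPerNode(int nodeLimitMb, int containerMb) {
        return nodeLimitMb / containerMb;
    }

    public static void main(String[] args) {
        int containerMb = 8 * 1024; // 8 GB mapper request
        // 26 GB limit on the 32 GB nodes, 52 GB on the 64 GB node
        System.out.println(containersPerNode(26 * 1024, containerMb)); // 3
        System.out.println(containersPerNode(52 * 1024, containerMb)); // 6
    }
}
```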

I would like to limit a job's container resources programmatically, e.g. impose a 26 GB cap for most jobs. How can this be done?

2 answers:

Answer 0: (score: 2)

First, yarn.nodemanager.resource.memory-mb (memory) and yarn.nodemanager.resource.cpu-vcores (vcores) are NodeManager daemon/service configuration properties and cannot be overridden from a YARN client application. If you change these properties, you need to restart the NodeManager service.

Since CPU is your bottleneck, my suggestion is to switch the cluster-level YARN scheduler to the Fair Scheduler with the DRF (Dominant Resource Fairness) scheduling policy, which gives you the flexibility to specify application container size in terms of both memory and CPU cores. The number of concurrently running application containers (mapper/reducer/AM tasks) will then be bounded by the available vcores you define.

The scheduling policy can be set at the Fair Scheduler queue/pool level.

schedulingPolicy: sets the scheduling policy of any queue. Allowed values are "fifo" / "fair" / "drf".

For more details, refer to this apache doc -
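For illustration, a queue with the DRF policy is declared in the Fair Scheduler allocation file; a minimal sketch might look like the following (the queue names and the maxResources figure are assumptions for this example, not values from the question):

```xml
<!-- fair-scheduler.xml (allocation file); reloaded by the RM without restart -->
<allocations>
  <queue name="default">
    <!-- DRF considers both memory and vcores when packing containers -->
    <schedulingPolicy>drf</schedulingPolicy>
    <!-- hypothetical cap on the queue's total resources -->
    <maxResources>26624 mb, 8 vcores</maxResources>
  </queue>
  <queue name="large">
    <schedulingPolicy>drf</schedulingPolicy>
  </queue>
</allocations>
```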

Once you have created a new Fair Scheduler queue/pool with the DRF scheduling policy, both memory and CPU cores can be set programmatically, as shown below.


How to define the container size in a MapReduce application:

Configuration conf = new Configuration();

conf.set("mapreduce.map.memory.mb", "4096");
conf.set("mapreduce.reduce.memory.mb", "4096");

conf.set("mapreduce.map.cpu.vcores", "1");
conf.set("mapreduce.reduce.cpu.vcores", "1");

Reference - https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

The default cpu.vcores allocation for a mapper/reducer is 1; you can increase this value for CPU-intensive applications. Keep in mind that if you increase it, the number of mapper/reducer tasks running in parallel will decrease accordingly.
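Under DRF the scheduler honours both dimensions, so a node's concurrency is the smaller of its memory-based and vcore-based capacities. A small sketch of that trade-off (the 16-vcore node size is an assumed figure for illustration, not from the question):

```java
public class DrfMath {
    // Under DRF a node runs only as many containers as fit within
    // BOTH its memory limit and its vcore limit.
    static int concurrentContainers(int nodeMemMb, int nodeVcores,
                                    int containerMemMb, int containerVcores) {
        return Math.min(nodeMemMb / containerMemMb, nodeVcores / containerVcores);
    }

    public static void main(String[] args) {
        // Hypothetical 52 GB / 16 vcore node with 8 GB containers.
        // With 1 vcore per container, memory is the bottleneck: 6 containers.
        System.out.println(concurrentContainers(52 * 1024, 16, 8 * 1024, 1)); // 6
        // Raising the request to 4 vcores makes CPU the bottleneck: 4 containers.
        System.out.println(concurrentContainers(52 * 1024, 16, 8 * 1024, 4)); // 4
    }
}
```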

Answer 1: (score: 0)

You have to set the configuration like this. Try this:

// create a configuration
Configuration conf = new Configuration();
// create a new job based on the configuration
Job job = Job.getInstance(conf);
// here you have to put your mapper class
job.setMapperClass(Mapper.class);
// here you have to put your reducer class
job.setReducerClass(Reducer.class);
// here you have to set the jar containing your
// map/reduce classes, so the mapper class can be used
job.setJarByClass(Mapper.class);
// key/value of your reducer output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// this is setting the format of your input, can be TextInputFormat
job.setInputFormatClass(SequenceFileInputFormat.class);
// same with output
job.setOutputFormatClass(TextOutputFormat.class);
// here you can set the path of your input
SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
// this deletes possible output paths to prevent job failures
FileSystem fs = FileSystem.get(conf);
Path out = new Path("files/out/processed/");
fs.delete(out, true);
// finally set the empty out path
TextOutputFormat.setOutputPath(job, out);

// this waits until the job completes and prints debug out to STDOUT or whatever
// has been configured in your log4j properties.
job.waitForCompletion(true); 

For YARN, the following configuration needs to be set.

// this should be like defined in your yarn-site.xml
conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001"); 

// set the limit to 26 GB (26624 MB)
conf.set("yarn.nodemanager.resource.memory-mb", "26624"); 


// framework is now "yarn", should be defined like this in mapred-site.xml
conf.set("mapreduce.framework.name", "yarn");

// like defined in hdfs-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");