Restricting YARN containers programmatically

Date: 2017-07-05 11:32:33

Tags: java hadoop mapreduce yarn

My Hadoop cluster has 10 nodes with 32 GB of RAM each, plus one node with 64 GB.

On those 10 nodes, the NodeManager limit yarn.nodemanager.resource.memory-mb is set to 26 GB; on the 64 GB node it is set to 52 GB (some jobs need 50 GB for a single reducer, and those run on that node).

The problem is that when I run a basic job whose mappers need 8 GB, the 32 GB nodes spawn 3 mappers in parallel (26 / 8 = 3) while the 64 GB node spawns 6. Because of the CPU load, that node usually finishes last.
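The mapper counts above follow from integer division of the NodeManager memory limit by the per-container request, since YARN only packs whole containers. A minimal sketch of that arithmetic (the 26 GB and 52 GB limits are the values quoted above, expressed in MB as YARN expects):

```java
public class ContainerMath {
    // How many containers of a given size fit under a NodeManager memory limit.
    // YARN schedules whole containers only, so the remainder is wasted headroom.
    static int containersPerNode(int nodeLimitMb, int containerMb) {
        return nodeLimitMb / containerMb;
    }

    public static void main(String[] args) {
        int containerMb = 8 * 1024; // 8 GB mapper request
        // 26 GB limit on the 32 GB nodes, 52 GB on the 64 GB node
        System.out.println(containersPerNode(26 * 1024, containerMb)); // 3
        System.out.println(containersPerNode(52 * 1024, containerMb)); // 6
    }
}
```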

I would like to limit a job's container resources programmatically, e.g. impose a 26 GB cap for most jobs. How can this be done?

2 answers:

Answer 0: (score: 2)

First, yarn.nodemanager.resource.memory-mb (memory) and yarn.nodemanager.resource.cpu-vcores (vcores) are NodeManager daemon/service configuration properties and cannot be overridden from a YARN client application. If you change these properties, you need to restart the NodeManager service.

Since CPU is your bottleneck, my suggestion is to switch the cluster-level YARN scheduler to the Fair Scheduler with the DRF (Dominant Resource Fairness) scheduling policy, which gives you the flexibility to specify application container size in terms of both memory and CPU cores. The number of concurrently running application containers (mapper/reducer/AM tasks) will then be bounded by the available vcores you define.

The scheduling policy can be set at the Fair Scheduler queue/pool level.

schedulingPolicy: sets the scheduling policy of any queue. Allowed values are "fifo" / "fair" / "drf".

For more details, refer to this apache doc -
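For illustration, a queue with the DRF policy is declared in the Fair Scheduler allocation file; a minimal sketch might look like the following (the queue names and the maxResources figure are assumptions for this example, not values from the question):

```xml
<!-- fair-scheduler.xml (allocation file); reloaded by the RM without restart -->
<allocations>
  <queue name="default">
    <!-- DRF considers both memory and vcores when packing containers -->
    <schedulingPolicy>drf</schedulingPolicy>
    <!-- hypothetical cap on the queue's total resources -->
    <maxResources>26624 mb, 8 vcores</maxResources>
  </queue>
  <queue name="large">
    <schedulingPolicy>drf</schedulingPolicy>
  </queue>
</allocations>
```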

Once you have created a new Fair Scheduler queue/pool with the DRF scheduling policy, both memory and CPU cores can be set programmatically, as shown below.


How to define the container size in a MapReduce application:

Configuration conf = new Configuration();

conf.set("mapreduce.map.memory.mb", "4096");
conf.set("mapreduce.reduce.memory.mb", "4096");

conf.set("mapreduce.map.cpu.vcores", "1");
conf.set("mapreduce.reduce.cpu.vcores", "1");

Reference - https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

The default cpu.vcores allocation for a mapper/reducer is 1; you can increase this value for CPU-intensive applications. Keep in mind that if you increase it, the number of mapper/reducer tasks running in parallel will decrease accordingly.
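Under DRF the scheduler honours both dimensions, so a node's concurrency is the smaller of its memory-based and vcore-based capacities. A small sketch of that trade-off (the 16-vcore node size is an assumed figure for illustration, not from the question):

```java
public class DrfMath {
    // Under DRF a node runs only as many containers as fit within
    // BOTH its memory limit and its vcore limit.
    static int concurrentContainers(int nodeMemMb, int nodeVcores,
                                    int containerMemMb, int containerVcores) {
        return Math.min(nodeMemMb / containerMemMb, nodeVcores / containerVcores);
    }

    public static void main(String[] args) {
        // Hypothetical 52 GB / 16 vcore node with 8 GB containers.
        // With 1 vcore per container, memory is the bottleneck: 6 containers.
        System.out.println(concurrentContainers(52 * 1024, 16, 8 * 1024, 1)); // 6
        // Raising the request to 4 vcores makes CPU the bottleneck: 4 containers.
        System.out.println(concurrentContainers(52 * 1024, 16, 8 * 1024, 4)); // 4
    }
}
```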

Answer 1: (score: 0)

You have to set the configuration like this. Try this:

// create a configuration
Configuration conf = new Configuration();
// create a new job based on the configuration
Job job = Job.getInstance(conf);
// here you have to put your mapper class
job.setMapperClass(Mapper.class);
// here you have to put your reducer class
job.setReducerClass(Reducer.class);
// here you have to set the jar containing your
// map/reduce classes, so the mapper class can be used
job.setJarByClass(Mapper.class);
// key/value of your reducer output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// this is setting the format of your input, can be TextInputFormat
job.setInputFormatClass(SequenceFileInputFormat.class);
// same with output
job.setOutputFormatClass(TextOutputFormat.class);
// here you can set the path of your input
SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
// this deletes possible output paths to prevent job failures
FileSystem fs = FileSystem.get(conf);
Path out = new Path("files/out/processed/");
fs.delete(out, true);
// finally set the empty out path
TextOutputFormat.setOutputPath(job, out);

// this waits until the job completes and prints debug out to STDOUT or whatever
// has been configured in your log4j properties.
job.waitForCompletion(true); 

For YARN, the following configuration needs to be set.

// this should be like defined in your yarn-site.xml
conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001"); 

// set the limit to 26 GB (26624 MB)
conf.set("yarn.nodemanager.resource.memory-mb", "26624"); 


// framework is now "yarn", should be defined like this in mapred-site.xml
conf.set("mapreduce.framework.name", "yarn");

// like defined in hdfs-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");