Consider the following scenario: a set of dependent jobs is submitted to Hadoop. Hadoop executes the first, then the second, which depends on the first, and so on. The jobs are submitted in one go using JobControl (see the code below).
Using Hadoop 2.x (in Java), is it possible to change a job's number of reducers at runtime? More specifically, how do I change the number of reducers in job 2 after job 1 has executed?
Also, is there a way to let Hadoop infer the number of reducers automatically by estimating the map output? It always takes 1, and I can't find a way to change that default (other than explicitly setting the number myself).
// 1. create the JobControl
JobControl jc = new JobControl(name);
// 2. add all the controlled jobs to the job control
//    note that this is done in one go by using a collection
jc.addJobCollection(jobs);
// 3. execute the jobcontrol in a Thread
Thread workflowThread = new Thread(jc, "Thread_" + name);
workflowThread.setDaemon(true); // will not prevent the JVM from shutting down
workflowThread.start();
// 4. wait for it to complete
LOG.info("Waiting for thread to complete: " + workflowThread.getName());
while (!jc.allFinished()) {
    Thread.sleep(REFRESH_WAIT);
}
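For reference, the jobs collection above is built so that JobControl knows about the dependency. A minimal sketch, assuming job1 and job2 are already-configured Job instances and the enclosing method declares throws IOException (variable names are illustrative):

import java.util.ArrayList;
import java.util.Collection;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;

// wrap each Job in a ControlledJob so JobControl can track dependencies
ControlledJob cj1 = new ControlledJob(job1, null);
ControlledJob cj2 = new ControlledJob(job2, null);
cj2.addDependingJob(cj1); // job 2 only starts once job 1 succeeds

Collection<ControlledJob> jobs = new ArrayList<ControlledJob>();
jobs.add(cj1);
jobs.add(cj2);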
Answer 0 (score: 1)
To your first question: yes, you can set the number of reducers for job 2 after job 1 has run, in your driver:
Job job1 = Job.getInstance(conf, "job 1");
// your job 1 setup here
// ...
job1.waitForCompletion(true); // submits job 1 and blocks until it finishes

int job2Reducers = ... // compute based on job 1 results here

Job job2 = Job.getInstance(conf, "job 2");
job2.setNumReduceTasks(job2Reducers);
// your job 2 setup here
// ...
job2.waitForCompletion(true);
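One way to fill in that computation, as a hedged sketch: read job 1's built-in counters and size job 2's reducers by the observed data volume. MAP_OUTPUT_BYTES is used here only as an illustrative proxy, and the 256 MB-per-reducer target is an arbitrary choice, not a Hadoop rule:

import org.apache.hadoop.mapreduce.TaskCounter;

// sketch: derive job 2's reducer count from job 1's actual map output volume
// (assumes the enclosing driver method declares "throws Exception")
long mapOutputBytes = job1.getCounters()
        .findCounter(TaskCounter.MAP_OUTPUT_BYTES).getValue();
long bytesPerReducer = 256L * 1024 * 1024; // illustrative target: 256 MB per reducer
int job2Reducers = (int) Math.max(1, mapOutputBytes / bytesPerReducer);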
To your second question: as far as I know, no, you cannot make Hadoop pick the number of reducers automatically based on your mapper load.
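The default of 1 that the question mentions comes from the mapreduce.job.reduces property. A minimal sketch of the usual ways to override it (the value 10 is arbitrary):

// raise the default on the Configuration before creating the Job...
conf.setInt("mapreduce.job.reduces", 10);
// ...or set it per job, which writes the same property into the job's configuration:
job.setNumReduceTasks(10);

If the driver runs through ToolRunner, the same property can also be supplied on the command line as -D mapreduce.job.reduces=10 without recompiling.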
Answer 1 (score: 0)
The number of maps is usually driven by the number of DFS blocks in the input files, which is why people sometimes tune their DFS block size to adjust the number of maps.
We can set the number of reducer tasks using the same logic. To make the reducer count dynamic, I wrote logic that sets the number of reduce tasks at runtime, scaling with the number of map tasks.
In Java code:
long defaultBlockSize = 0;
int numOfReduce = 10; // fallback value; any sensible default works
long inputFileLength = 0;
try {
    FileSystem fileSystem = FileSystem.get(this.getConf()); // HDFS file system
    // total length of the input files stored in the HDFS input location
    inputFileLength = fileSystem.getContentSummary(
            new Path(PROP_HDFS_INPUT_LOCATION)).getLength();
    // default block size of the file system
    defaultBlockSize = fileSystem.getDefaultBlockSize(
            new Path(PROP_HDFS_INPUT_LOCATION));
    if (inputFileLength > 0 && defaultBlockSize > 0) {
        // roughly two reduce tasks per input block
        numOfReduce = (int) (((inputFileLength / defaultBlockSize) + 1) * 2);
    }
    System.out.println("NumOfReduce : " + numOfReduce);
} catch (Exception e) {
    LOGGER.error(" Exception{} ", e);
}
job.setNumReduceTasks(numOfReduce);
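Note that this sizes the reducers from the job's input in HDFS rather than from the actual map output, so it will over-provision when the mappers filter or aggregate heavily; the * 2 multiplier (about two reducers per input block) is a tunable heuristic, not a Hadoop rule. For the dependent-job scenario in the question, the same pattern works if you point getContentSummary at job 1's output directory before configuring job 2.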