Hadoop: changing the number of reducers at runtime

Date: 2015-01-16 14:40:44

Tags: java hadoop

Assume the following scenario: a set of dependent jobs is sent to Hadoop. Hadoop executes the first one, then the second one, which depends on the first, and so on. The jobs are submitted in one go using JobControl (see the code below).

With Hadoop 2.x (in Java), is it possible to change a job's number of reducers at runtime? More specifically, how can the number of reducers of job 2 be changed after job 1 has executed?

Also, is there a way to have Hadoop infer the number of reducers automatically by estimating the map output? It always uses 1, and I cannot find a way to change that default (other than explicitly setting the number myself).

// 1. create JobControl
JobControl jc = new JobControl(name);

// 2. add all the controlled jobs to the job control
// note that this is done in one go by using a collection
jc.addJobCollection(jobs);

// 3. execute the jobcontrol in a Thread
Thread workflowThread = new Thread(jc, "Thread_" + name);
workflowThread.setDaemon(true); // will not prevent the JVM from shutting down
workflowThread.start();

// 4. we wait for it to complete
LOG.info("Waiting for thread to complete: " + workflowThread.getName());
while (!jc.allFinished()) {
    Thread.sleep(REFRESH_WAIT);
}
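
For reference, a minimal sketch of how the jobs collection passed to addJobCollection might be assembled with ControlledJob dependencies; the Configuration objects job1Conf/job2Conf are placeholders and exception handling is omitted (ControlledJob lives in org.apache.hadoop.mapreduce.lib.jobcontrol, the same package as JobControl):

// hypothetical construction of the dependent-job collection
Job job1 = Job.getInstance(job1Conf, "job 1");
Job job2 = Job.getInstance(job2Conf, "job 2");

ControlledJob cJob1 = new ControlledJob(job1, null); // job 1 has no dependencies
ControlledJob cJob2 = new ControlledJob(job2, null);
cJob2.addDependingJob(cJob1); // job 2 only starts after job 1 succeeds

List<ControlledJob> jobs = new ArrayList<ControlledJob>();
jobs.add(cJob1);
jobs.add(cJob2);
// this is the collection handed to jc.addJobCollection(jobs) above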

2 Answers:

Answer 0 (score: 1)

Regarding your first question: yes, you can set job 2's number of reducers in your driver after job 1 has executed:

Job job1 = Job.getInstance(conf, "job 1");
//your job1 setup here
//...
job1.waitForCompletion(true); // submits job 1 and blocks until it finishes

int job2Reducers = ... //compute based on job1 results here

Job job2 = Job.getInstance(conf, "job 2");
job2.setNumReduceTasks(job2Reducers);
//your job2 setup here
//...
job2.waitForCompletion(true);
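
One way to fill in the job2Reducers computation above (a sketch, not something prescribed by this answer) is to read job 1's map output size from its counters and divide by a target amount of data per reducer; the roughly 1 GB target and the use of TaskCounter.MAP_OUTPUT_BYTES from org.apache.hadoop.mapreduce are assumptions here:

// hypothetical sizing of job 2's reducers from job 1's map output
long mapOutputBytes = job1.getCounters()
        .findCounter(TaskCounter.MAP_OUTPUT_BYTES).getValue();
long bytesPerReducer = 1024L * 1024L * 1024L; // aim for roughly 1 GB per reducer
int job2Reducers = (int) Math.max(1, mapOutputBytes / bytesPerReducer);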

Regarding your second question: as far as I know, no, you cannot have Hadoop automatically choose the number of reducers based on your mapper load.
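
As a side note on the single-reducer default: it comes from the mapreduce.job.reduces property, which is 1 in mapred-default.xml for Hadoop 2.x, so if all you want is a different fixed default you can override it per job (or pass -D mapreduce.job.reduces=N on the command line when the driver goes through ToolRunner). A minimal example, assuming conf is the Configuration the job will be built from:

// sets the same property that job.setNumReduceTasks(4) would set;
// must be done before the Job is created from conf
conf.setInt("mapreduce.job.reduces", 4);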

Answer 1 (score: 0)

The number of maps is usually driven by the number of DFS blocks in the input files, although this leads people to tune their DFS block size to adjust the number of maps.

We can therefore set the number of reducer tasks using the same logic as for maps: to make the reducer count dynamic, I wrote logic that sets the number of reduce tasks at runtime so that it adjusts with the number of map tasks.

In Java code:

    long defaultBlockSize = 0;
    int numOfReduce = 10; // default value; you can use any number
    long inputFileLength = 0;
    try {
        FileSystem fileSystem = FileSystem.get(this.getConf()); // HDFS file system
        // total size of the input files stored at the HDFS input location
        inputFileLength = fileSystem.getContentSummary(
                new Path(PROP_HDFS_INPUT_LOCATION)).getLength();

        // default block size of the file system for that path
        defaultBlockSize = fileSystem.getDefaultBlockSize(new Path(
                PROP_HDFS_INPUT_LOCATION));

        if (inputFileLength > 0 && defaultBlockSize > 0) {
            // roughly two reduce tasks per input block
            numOfReduce = (int) (((inputFileLength / defaultBlockSize) + 1) * 2);
        }
        System.out.println("numOfReduce : " + numOfReduce);
    } catch (Exception e) {
        LOGGER.error("Exception while computing the number of reducers", e);
    }

    job.setNumReduceTasks(numOfReduce);
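
Note that getContentSummary(...).getLength() returns the combined size of all files under the given path, so this calculation works whether the input location is a single file or a directory, and setNumReduceTasks only takes effect if it is called before the job is submitted. Unlike the counter-based sketch under the first answer, this sizes the reducers from the job's input rather than from its actual map output.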