Question

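I am trying to chain two consecutive M/R jobs in Hadoop using the code below. Basically, after the first job completes, I create another job that uses the first job's output as its input. But the code produces no output for the second job, and it does not throw any exception. Could you help me see what might be going wrong? I'd appreciate it.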

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args)
            .getRemainingArgs();
    if (otherArgs.length != 3) {
        System.err.println("Usage: jobStats <in> <out> <job>");
        System.exit(2);
    }


    conf.set("job", otherArgs[2]);
    Job job = new Job(conf, "job count");
    job.setJarByClass(jobStats.class);
    job.setMapperClass(jobMapper.class);
    job.setCombinerClass(jobReducer.class);
    job.setReducerClass(jobReducer.class);

    job.setMapOutputKeyClass(Text.class);        
    job.setMapOutputValueClass(IntWritable.class);           
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

    boolean completionStatus1 = job.waitForCompletion(true);
    if (completionStatus1 == true)
    {
        Job job2 = new Job(conf, "job year ranking");
        job2.setJarByClass(jobStats.class);
        job2.setPartitionerClass(ChainedPartitioner.class);
        job2.setGroupingComparatorClass(CompKeyGroupingComparator.class);
        job2.setSortComparatorClass(CompKeyComparator.class);

        job2.setMapperClass(ChainedMapper.class);
        job2.setReducerClass(ChainedReducer.class);
        job2.setPartitionerClass(ChainedPartitioner.class);
        job2.setMapOutputKeyClass(CompositeKey.class);
        job2.setMapOutputValueClass(IntWritable.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(IntWritable.class);

        Path outPath = new Path(otherArgs[1] + "part-r-00000"); // this is the hard-coded output of first job
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(outPath))
        {
            FileInputFormat.addInputPath(job2, outPath);
            FileOutputFormat.setOutputPath(job2, new Path("/user/tony/output/today"));

            boolean completionStatus2 = job2.waitForCompletion(true);
            if (completionStatus2 == true)
            {
                fs.delete(outPath, true);
                System.exit(0);
            }
            else System.exit(1);
        }
        else System.exit(1);
    }
}

Answer 1

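The ChainedMapper and ChainedReducer classes (Hadoop's own versions are named ChainMapper and ChainReducer) are used to string multiple mappers together within a single MapReduce job, something like M1-M2-M3-R-M4-M5.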
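To make the M1-M2-R-M3 shape concrete, here is a minimal plain-Java model of it. It does not use the Hadoop API at all: the class name InJobChainModel and the three toy stages are made up for illustration, and the only point is that several map steps are composed around one reduce step inside a single pass.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical, Hadoop-free model of an in-job chain: M1 -> M2 -> R -> M3.
class InJobChainModel {
    static int run(List<String> records) {
        Function<String, String> m1 = String::trim;          // M1: clean each record
        Function<String, Integer> m2 = String::length;       // M2: derive a value from it
        Function<Integer, Integer> m3 = total -> total * 2;  // M3: post-process the reduced value

        int reduced = records.stream()
                .map(m1.andThen(m2))       // M1 and M2 run back to back in the map phase
                .reduce(0, Integer::sum);  // R: the single reduce step of the job
        return m3.apply(reduced);          // M3 runs after the reducer
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(" ab ", "cde")));
    }
}
```

Everything here belongs to one job with one shuffle; nothing is written out and read back in between, which is what separates this from running two jobs one after the other.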

In your case, you want to run two complete MapReduce jobs in succession. Just specify a real mapper for the second job.
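The control flow for running jobs in succession can be sketched without Hadoop as follows. This is a sketch under assumptions, not a drop-in driver: Stage, runChain and the stage-N directory naming are hypothetical, and in a real driver each stage would wrap a configured Job and return the result of job.waitForCompletion(true). Two details are worth noting. Each stage's output directory is built with an explicit "/" separator, whereas the question builds its path as otherArgs[1] + "part-r-00000", which would name a non-existent file if otherArgs[1] has no trailing slash. And a failed stage is reported on stderr, whereas the question's else branches call System.exit(1) without printing anything, which would look like a run with no output and no exception.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical, Hadoop-free model of a driver that runs jobs one after another.
class SequentialJobs {
    // Stand-in for one MapReduce job: (inputDir, outputDir) -> true on success.
    interface Stage extends BiPredicate<String, String> {}

    // Runs the stages in order; each stage reads what the previous one wrote.
    // Returns the index of the first failed stage, or -1 if every stage succeeded.
    static int runChain(String input, String workDir, List<Stage> stages) {
        String in = input;
        for (int i = 0; i < stages.size(); i++) {
            String out = workDir + "/stage-" + i;  // explicit separator, paths never fuse
            if (!stages.get(i).test(in, out)) {
                System.err.println("stage " + i + " failed: " + in + " -> " + out);
                return i;  // stop here; later stages have no input to read
            }
            in = out;  // the next stage consumes this stage's output directory
        }
        return -1;
    }

    public static void main(String[] args) {
        Stage count = (src, dst) -> true;                      // placeholder for the first job
        Stage rank = (src, dst) -> src.endsWith("/stage-0");   // placeholder for the second job
        int failed = runChain("/data/in", "/tmp/work", List.of(count, rank));
        System.exit(failed < 0 ? 0 : 1);
    }
}
```

Passing the whole output directory of one stage to the next, rather than a single hard-coded part file, also keeps the chain working if the first job ever runs with more than one reducer.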

Hadoop M/R job chaining with no exceptions