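I am trying to chain two consecutive M/R jobs in Hadoop using the code below. Basically, once the first job completes, I create a second job that takes the first job's output as its input. However, the code produces no output for the second job, and it does not throw any exception. Could you help me see what might be going wrong? Thanks in advance.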
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args)
            .getRemainingArgs();
    if (otherArgs.length != 3) {
        System.err.println("Usage: jobStats <in> <out> <job>");
        System.exit(2);
    }
    conf.set("job", otherArgs[2]);
    Job job = new Job(conf, "job count");
    job.setJarByClass(jobStats.class);
    job.setMapperClass(jobMapper.class);
    job.setCombinerClass(jobReducer.class);
    job.setReducerClass(jobReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    boolean completionStatus1 = job.waitForCompletion(true);
    if (completionStatus1 == true)
    {
        Job job2 = new Job(conf, "job year ranking");
        job2.setJarByClass(jobStats.class);
        job2.setPartitionerClass(ChainedPartitioner.class);
        job2.setGroupingComparatorClass(CompKeyGroupingComparator.class);
        job2.setSortComparatorClass(CompKeyComparator.class);
        job2.setMapperClass(ChainedMapper.class);
        job2.setReducerClass(ChainedReducer.class);
        job2.setPartitionerClass(ChainedPartitioner.class);
        job2.setMapOutputKeyClass(CompositeKey.class);
        job2.setMapOutputValueClass(IntWritable.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(IntWritable.class);
        Path outPath = new Path(otherArgs[1] + "part-r-00000"); // this is the hard-coded output of first job
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(outPath))
        {
            FileInputFormat.addInputPath(job2, outPath);
            FileOutputFormat.setOutputPath(job2, new Path("/user/tony/output/today"));
            boolean completionStatus2 = job2.waitForCompletion(true);
            if (completionStatus2 == true)
            {
                fs.delete(outPath, true);
                System.exit(0);
            }
            else System.exit(1);
        }
        else System.exit(1);
    }
}
Answer 0 (score: 0)
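Classes like ChainedMapper and ChainedReducer (Hadoop's own are named ChainMapper and ChainReducer) are meant for stringing multiple mappers together inside a single MapReduce job, in a pattern such as M1-M2-M3-R-M4-M5.
In your case, you want to run two complete MapReduce jobs back to back. Just specify a real mapper for the second job.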
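Separately from the choice of mapper, one thing that may be worth checking in the driver above is how the second job's input path is built. The expression otherArgs[1] + "part-r-00000" joins the two strings with no separator, so unless the <out> argument ends with a slash, the resulting path would not point inside the output directory. In that case fs.exists(outPath) would return false and the program would reach System.exit(1) without printing anything, which would look like "no output and no exception". The following is only a sketch of that string behavior: plain strings stand in for Hadoop Path objects, and PathJoinDemo, naiveJoin, safeJoin, and the example directory are hypothetical names that do not appear in the question.

```java
// Sketch only: plain strings stand in for Hadoop Path objects.
public class PathJoinDemo {

    // Mirrors the expression used in the question: otherArgs[1] + "part-r-00000".
    static String naiveJoin(String outputDir, String fileName) {
        return outputDir + fileName;
    }

    // Hypothetical helper: puts exactly one "/" between directory and file name.
    static String safeJoin(String outputDir, String fileName) {
        if (outputDir.endsWith("/")) {
            return outputDir + fileName;
        }
        return outputDir + "/" + fileName;
    }

    public static void main(String[] args) {
        // Example value for the <out> argument; an assumption, not taken from the question.
        String out = "/user/tony/output/first";
        System.out.println(naiveJoin(out, "part-r-00000")); // no separator between dir and file
        System.out.println(safeJoin(out, "part-r-00000"));  // file inside the output directory
    }
}
```

In the real driver, the two-argument constructor new Path(otherArgs[1], "part-r-00000") should take care of the separator. Another option is to pass the first job's output directory itself to FileInputFormat.addInputPath, which by default skips files whose names start with "_" or "." and would also cover the case of more than one reducer. Printing a message before each System.exit(1) would make it easier to tell which branch was taken.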