Question

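Hello, I am trying to implement a Java Hadoop application. I want a mapper of type <Object, Text, NaicsAreaPair, LongWritable> (so the mapper's output has NaicsAreaPair as the key and LongWritable as the value). Then I need a combiner of type <NaicsAreaPair, LongWritable, Text, AreaWagePair>, so its input matches the mapper output, but its output differs from the mapper output.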

I have this configuration in my main class:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "NY statistics");
    job.setJarByClass(NYStatisticsOwnWritableComparable.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(Combiner.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(NaicsAreaPair.class);
    job.setOutputValueClass(LongWritable.class);
    //job.setPartitionerClass(Rozdelovac.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    //job.setNumReduceTasks(3);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

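Here I have to specify which output key and output value classes will be used. Is it possible to configure the job so that the mapper uses these output key and value classes, while the combiner uses different ones?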

Thank you very much for your answers.

Answer 1

No. The combiner's output key/value types must be the same as the mapper's output key/value types.
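
To illustrate why the types have to match, here is a minimal plain-Java sketch. It uses no Hadoop classes: the class CombinerSketch, its methods, and the key "11-NY" are hypothetical stand-ins, not the asker's code. It relies on the documented Hadoop behavior that the framework may run the combiner zero, one, or several times, so the reducer has to accept both raw mapper output and combiner output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for a Hadoop job; all names here are hypothetical.
class CombinerSketch {

    // "Combiner": folds a group of values into one value of the SAME type,
    // so the result can be combined again or handed to the reducer.
    static long combine(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // "Reducer": the only stage that changes the output type (here, to a String).
    static String reduce(String key, List<Long> values) {
        return key + "\t" + combine(values);
    }

    // Applies the combiner to each spill separately, as a map task might do.
    static List<Long> combinePerSpill(List<List<Long>> spills) {
        List<Long> partials = new ArrayList<>();
        for (List<Long> spill : spills) {
            partials.add(combine(spill));
        }
        return partials;
    }

    public static void main(String[] args) {
        List<Long> raw = Arrays.asList(10L, 20L, 30L, 40L);
        List<List<Long>> spills = Arrays.asList(
                Arrays.asList(10L, 20L), Arrays.asList(30L, 40L));

        // Combiner skipped: the reducer sees the raw mapper output.
        System.out.println(reduce("11-NY", raw));
        // Combiner run once per spill: the reducer sees partial sums.
        System.out.println(reduce("11-NY", combinePerSpill(spills)));
    }
}
```

Both calls produce the same line because the combiner keeps the value type and only the reducer converts it. In a real driver, the map-side types can be declared separately from the final output types with job.setMapOutputKeyClass(...) and job.setMapOutputValueClass(...), but the combiner still has to use the map-side types for both its input and its output.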

Answer 2

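Why do you want to use a combiner? The purpose of a combiner is performance: it reduces the amount of data sent over the network. It comes with some restrictions: its input and output types must match the mapper output (key/value) types, which are also the reducer input (key/value) types, and the function it performs should be associative and commutative. See the example here: http://www.philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/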
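
As a small illustration of the associativity requirement, the plain-Java sketch below (no Hadoop classes; MeanCombinerPitfall and its sample numbers are hypothetical) shows what can go wrong when a non-associative function such as a mean is used as a combiner: averaging partial averages does not give the overall average.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo: a mean is not safe to use as a combiner function.
class MeanCombinerPitfall {

    // Arithmetic mean of a non-empty list.
    static double mean(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        // Reducer applied directly to all mapper values.
        double direct = mean(Arrays.asList(10.0, 20.0, 30.0));

        // "Combiner" averages two spills first, then the reducer averages the partials.
        double viaCombiner = mean(Arrays.asList(
                mean(Arrays.asList(10.0, 20.0)),
                mean(Arrays.asList(30.0))));

        System.out.println(direct);      // prints 20.0
        System.out.println(viaCombiner); // prints 22.5
    }
}
```

A sum or a count does not have this problem, which is why a summing reducer is commonly reused as the combiner.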

Take the logic you wanted to put in the combiner and make it the reducer instead.

Different context types for mapper and combiner in Hadoop
