Question

I have set up a Hadoop job:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    Job job = Job.getInstance(conf, "Legion");
    job.setJarByClass(Legion.class);

    job.setMapperClass(CallQualityMap.class);
    job.setReducerClass(CallQualityReduce.class);

    // Explicitly configure map and reduce outputs, since they're different classes
    job.setMapOutputKeyClass(CallSampleKey.class);
    job.setMapOutputValueClass(CallSample.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);

    job.setInputFormatClass(CombineRepublicInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    CombineRepublicInputFormat.setMaxInputSplitSize(job, 128000000);
    CombineRepublicInputFormat.setInputDirRecursive(job, true);
    CombineRepublicInputFormat.addInputPath(job, new Path(args[0]));

    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.waitForCompletion(true);
}

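The job completes, but something strange happens: I get one output line per input line. Each output line contains the output of CallSampleKey.toString(), then a tab, then something like CallSample@17ab34d.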
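The CallSample@17ab34d part is what java.lang.Object.toString() produces when a class does not override it: the class name, an "@", and the hash code in hex. TextOutputFormat writes each value by calling its toString(), so a value class without its own toString() shows up this way. A minimal sketch, assuming a hypothetical CallSample with no toString() override (the jitterMs field is invented for illustration):

```java
// Hypothetical stand-in for the question's CallSample; it deliberately does NOT override toString().
class CallSample {
    private final double jitterMs; // hypothetical field, for illustration only

    CallSample(double jitterMs) {
        this.jitterMs = jitterMs;
    }
}

public class DefaultToStringDemo {
    public static void main(String[] args) {
        CallSample sample = new CallSample(12.5);
        // Object.toString() is getClass().getName() + "@" + Integer.toHexString(hashCode()).
        String text = sample.toString();
        System.out.println(text); // something like CallSample@17ab34d; the hex part varies per run
        String expected = sample.getClass().getName() + "@" + Integer.toHexString(sample.hashCode());
        System.out.println(text.equals(expected)); // prints true
    }
}
```

Seeing this format in the output is a strong hint that raw map output values reached the output format without passing through the intended reducer.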

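This means the reduce phase never runs, and the CallSampleKey and CallSample pairs are passed straight to TextOutputFormat. But I don't understand why this happens. I've very clearly specified job.setReducerClass(CallQualityReduce.class);, so I have no idea why it's skipping the reducer!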

EDIT: Here's the code for the reducer:

public static class CallQualityReduce extends Reducer<CallSampleKey, CallSample, NullWritable, Text> {

    public void reduce(CallSampleKey inKey, Iterator<CallSample> inValues, Context context) throws IOException, InterruptedException {
        Call call = new Call(inKey.getId().toString(), inKey.getUuid().toString());

        while (inValues.hasNext()) {
            call.addSample(inValues.next());
        }

        context.write(NullWritable.get(), new Text(call.getStats()));
    }
}

Answer 1

What if you try changing

public void reduce(CallSampleKey inKey, Iterator<CallSample> inValues, Context context) throws IOException, InterruptedException {

to use Iterable instead of Iterator?

public void reduce(CallSampleKey inKey, Iterable<CallSample> inValues, Context context) throws IOException, InterruptedException {

You would then call inValues.iterator() to get the actual iterator (or simply loop over inValues with a for-each loop).
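Both ways of walking an Iterable are shown below in plain Java, outside Hadoop. This is a sketch under the assumption that the reducer body only needs to visit each value once; IterableDemo, sumForEach and sumWithIterator are hypothetical names, and Integer stands in for CallSample:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IterableDemo {
    // for-each works directly on any Iterable, so no explicit iterator is needed.
    static int sumForEach(Iterable<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Equivalent explicit form: obtain the Iterator from the Iterable, as the answer describes.
    static int sumWithIterator(Iterable<Integer> values) {
        int total = 0;
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            total += it.next();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> samples = Arrays.asList(3, 4, 5);
        System.out.println(sumForEach(samples));      // prints 12
        System.out.println(sumWithIterator(samples)); // prints 12
    }
}
```

With the Iterable signature, the existing while loop in CallQualityReduce can stay almost unchanged if it first assigns inValues.iterator() to a local variable.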

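If the method signature doesn't match, your method is an overload rather than an override, so the framework never calls it and simply falls through to the default identity reducer implementation. Perhaps unfortunately, the underlying default implementation doesn't make this kind of mistake easy to detect, but the next best thing is to always put @Override on every method you intend to override, so that the compiler can help.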
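The overload-versus-override trap can be reproduced without Hadoop. The sketch below uses a simplified, hypothetical SimpleReducer in place of org.apache.hadoop.mapreduce.Reducer (SimpleReducer, run and the String return value are invented for illustration, and the Context parameter is dropped); like Hadoop's default reduce(), its base reduce() passes every value through unchanged:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for Hadoop's Reducer; not the real API.
class SimpleReducer<K, V> {
    // "Framework" entry point: always calls the Iterable-based reduce().
    final String run(K key, List<V> values) {
        return reduce(key, values);
    }

    // Default behavior: identity, one output line per input value.
    protected String reduce(K key, Iterable<V> values) {
        StringBuilder out = new StringBuilder();
        for (V v : values) {
            out.append(key).append('\t').append(v).append('\n');
        }
        return out.toString();
    }
}

// Mirrors the question's mistake: Iterator instead of Iterable creates an overload, not an override.
class WrongSignatureReducer extends SimpleReducer<String, Integer> {
    // Putting @Override here would fail to compile, which is the early warning the answer recommends.
    protected String reduce(String key, Iterator<Integer> values) {
        return "reduced\n";
    }
}

// Correct signature: @Override compiles, and run() dispatches to this method.
class RightSignatureReducer extends SimpleReducer<String, Integer> {
    @Override
    protected String reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return key + "\t" + sum + "\n";
    }
}

public class OverrideDemo {
    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2);
        System.out.print(new WrongSignatureReducer().run("k", values)); // identity: "k\t1" then "k\t2"
        System.out.print(new RightSignatureReducer().run("k", values)); // "k\t3"
    }
}
```

The wrong-signature class produces one line per value, which matches the symptom in the question: the custom method exists but is never reached.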

Hadoop completely skipping the reduce phase
