I have a Hadoop program in which I want to chain two jobs, i.e. input -> mapper1 -> reducer1 -> mapper2 -> reducer2 -> output. The first half works fine and I get the correct intermediate output. The problem is with the second job: specifically, I believe that in the second job the mapper is not feeding the correct reducer because of some kind of type mismatch. Here is the main code where I set up the jobs:
//JOB 1
Path input1 = new Path(otherArgs.get(0));
Path output1 =new Path("/tempBinaryPath");
Job job1 = Job.getInstance(conf);
job1.setJarByClass(BinaryPathRefined.class);
job1.setJobName("BinaryPathR1");
FileInputFormat.addInputPath(job1, input1);
FileOutputFormat.setOutputPath(job1, output1);
job1.setMapperClass(MyMapper.class);
//job.setCombinerClass(MyReducer.class);
job1.setReducerClass(MyReducer.class);
job1.setInputFormatClass(TextInputFormat.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(Text.class);
job1.waitForCompletion(true);
// JOB 2
Path input2 = new Path("/tempBinaryPath/part-r-00000");
Path output2 =new Path(otherArgs.get(1));
Job job2 = Job.getInstance(conf2);
job2.setJarByClass(BinaryPathRefined.class);
job2.setJobName("BinaryPathR2");
FileInputFormat.addInputPath(job2, input2);
FileOutputFormat.setOutputPath(job2, output2);
job2.setMapperClass(MyMapper2.class);
//job.setCombinerClass(MyReducer.class);
job2.setReducerClass(MyReducer2.class);
job2.setInputFormatClass(TextInputFormat.class);
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(Text.class);
job2.waitForCompletion(true);
The mappers and reducers have the following form:
public static class MyMapper extends Mapper<LongWritable, Text, Text, Text>{
...
}
public static class MyReducer extends Reducer<Text, Text, Text, Text>{
...
}
public static class MyMapper2 extends Mapper<LongWritable, Text, Text, IntWritable>{
...
}
public static class MyReducer2 extends Reducer<Text, IntWritable, Text, Text>{
...
}
The first job runs fine, while in the second job I get the error:
Type mismatch in value from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable
Any ideas?
Answer 0 (score: 3)
When you only call setOutputKeyClass and setOutputValueClass, Hadoop assumes that the Mapper and the Reducer produce the same output types. In your case the second job's Mapper emits IntWritable values while the job is configured for Text, so you should explicitly set the output types produced by the Mapper:
job2.setMapOutputKeyClass(Text.class);
job2.setMapOutputValueClass(IntWritable.class);
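For reference, here is a minimal sketch of how the second job's driver could look with the map output types declared separately from the final output types. It reuses the classes, paths, and conf2 object from the code in the question (all assumed to be defined and imported as there):
// JOB 2 (sketch): map output types differ from the final (reduce) output types
Job job2 = Job.getInstance(conf2);
job2.setJarByClass(BinaryPathRefined.class);
job2.setJobName("BinaryPathR2");
FileInputFormat.addInputPath(job2, new Path("/tempBinaryPath/part-r-00000"));
FileOutputFormat.setOutputPath(job2, new Path(otherArgs.get(1)));
job2.setMapperClass(MyMapper2.class);
job2.setReducerClass(MyReducer2.class);
job2.setInputFormatClass(TextInputFormat.class);
// Intermediate (map) output: Text keys, IntWritable values
job2.setMapOutputKeyClass(Text.class);
job2.setMapOutputValueClass(IntWritable.class);
// Final (reduce) output: Text keys, Text values
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(Text.class);
job2.waitForCompletion(true);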