Reducer not being called in Hadoop MapReduce job

Asked: 2016-03-28 14:27:58

Tags: java hadoop mapreduce

I have two mapper classes that just create key-value pairs; my main logic is supposed to live in the reducer part. I am trying to compare data from two different text files.
My mapper class is:

public static class Map extends Mapper<LongWritable, Text, Text, Text> {

    private String ky, vl = "a";

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] tokens = line.split("\t");
        vl = tokens[1].trim();
        ky = tokens[2].trim();
        // sending key-value pairs to the reducer
        context.write(new Text(ky), new Text(vl));
    }
}
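As a side note, the map body above assumes every input line has at least three tab-separated fields; the emitted pair is (tokens[2], tokens[1]). A standalone sketch of just that parsing step, using a hypothetical sample record:

```java
public class SplitDemo {
    public static void main(String[] args) {
        // hypothetical sample line: three tab-separated fields
        String line = "id\tvalueField\tkeyField";
        String[] tokens = line.split("\t");
        // the mapper emits (tokens[2], tokens[1]); a line with fewer than
        // three fields would throw ArrayIndexOutOfBoundsException here
        String vl = tokens[1].trim();
        String ky = tokens[2].trim();
        System.out.println(ky + " -> " + vl);
    }
}
```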

My second mapper is:

public static class Map2 extends Mapper<LongWritable, Text, Text, Text> {

    private String ky2, vl2 = "a";

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] tokens = line.split("\t");
        vl2 = tokens[1].trim();
        ky2 = tokens[2].trim();
        // sending key-value pairs to the reducer
        context.write(new Text(ky2), new Text(vl2));
    }
}

The reducer class is:

public static class Reduce extends Reducer<Text, Text, Text, Text> {

    private String rslt = "l";

    public void reduce(Text key, Iterator<Text> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        while (values.hasNext()) {
            count++;
        }
        rslt = Integer.toString(count);
        if (count > 1) {
            context.write(key, new Text(rslt));
        }
    }
}
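A likely cause, judging only from the posted code: in the new `org.apache.hadoop.mapreduce` API, `Reducer.reduce` takes an `Iterable<VALUEIN>`, not an `Iterator`. The method above therefore *overloads* rather than *overrides* it, so the framework keeps calling the inherited default `reduce`, which just passes each record through unchanged; that would match the counters in the output, where Reduce input records equals Reduce output records (10783). A self-contained, non-Hadoop sketch of the pitfall, with hypothetical stand-in class names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Iterator;

// stands in for org.apache.hadoop.mapreduce.Reducer (not the real class)
class BaseReducer {
    // the framework-visible signature takes an Iterable
    public String reduce(String key, Iterable<String> values) {
        return "identity:" + key;  // default pass-through behaviour
    }
}

// Iterator instead of Iterable: a new overload the framework never calls
class BrokenReduce extends BaseReducer {
    public String reduce(String key, Iterator<String> values) {
        return "custom:" + key;
    }
}

// matching Iterable signature: this one actually overrides
class FixedReduce extends BaseReducer {
    @Override  // @Override makes the compiler reject a wrong signature
    public String reduce(String key, Iterable<String> values) {
        int count = 0;
        for (String v : values) {
            count++;  // the for-each loop advances the iteration
        }
        return "custom:" + key + ":" + count;
    }
}

public class OverrideDemo {
    public static void main(String[] args) {
        List<String> vals = Arrays.asList("a", "b");
        BaseReducer broken = new BrokenReduce();
        BaseReducer fixed = new FixedReduce();
        // the caller only knows the Iterable signature, as Hadoop does
        System.out.println(broken.reduce("k", vals)); // identity:k
        System.out.println(fixed.reduce("k", vals));  // custom:k:2
    }
}
```

Adding `@Override` to the real reducer would turn this silent fallback into a compile-time error. Note also that, even with the signature fixed, the original `while (values.hasNext())` loop never calls `values.next()`, so it would spin forever; iterating with a for-each loop (or calling `next()` inside the loop) avoids that.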

My main method is:

Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(CompareTwoFiles.class);
job.setJobName("Compare Two Files and Identify the Difference");
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
MultipleInputs.addInputPath(job, new Path(args[0]),
        TextInputFormat.class, Map.class);
MultipleInputs.addInputPath(job, new Path(args[1]),
        TextInputFormat.class, Map2.class);
job.waitForCompletion(true);

Output:

    File System Counters
    FILE: Number of bytes read=361621
    FILE: Number of bytes written=1501806
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=552085
    HDFS: Number of bytes written=150962
    HDFS: Number of read operations=28
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=5
Map-Reduce Framework
    Map input records=10783
    Map output records=10783
    Map output bytes=150962
    Map output materialized bytes=172540
    Input split bytes=507
    Combine input records=0
    Combine output records=0
    Reduce input groups=7985
    Reduce shuffle bytes=172540
    Reduce input records=10783
    Reduce output records=10783
    Spilled Records=21566
    Shuffled Maps =2
    Failed Shuffles=0
    Merged Map outputs=2
    GC time elapsed (ms)=12
    Total committed heap usage (bytes)=928514048
Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
File Input Format Counters 
    Bytes Read=0
File Output Format Counters 
    Bytes Written=150962

0 Answers:

No answers yet