Hadoop custom Writable not producing the expected output

Date: 2018-09-27 01:16:17

Tags: java hadoop mapreduce

I have the following reducer input, produced by the mappers:

(1939, [121, 79, 83, 28])
(1980, [0, 211, -113])

I want to get output like the following:

1939 max:121 min:28 avg: 77.75

I can get it if I don't use a custom Writable and use the following reducer class instead:

public static class MaxTemperatureReducer
      extends Reducer<Text, IntWritable, Text, Text> {
          Text yearlyValue = new Text();
      @Override
      public void reduce(Text key, Iterable<IntWritable> values,
          Context context)
          throws IOException, InterruptedException {
            int sum = 0;
            int CounterForAvg = 0;
            int minValue = Integer.MAX_VALUE;
            int maxValue = Integer.MIN_VALUE;
            float avg;
            for (IntWritable val : values) {
                int currentValue = val.get();
                sum += currentValue;
                CounterForAvg++;
                minValue = Math.min(minValue, currentValue);
                maxValue = Math.max(maxValue, currentValue);
            }
            avg = (float) sum / CounterForAvg; // cast first so the division is not truncated to an int
            String requiredValue = "max temp:"+maxValue + "\t" +"avg temp: "+ avg + "\t"+ "min temp: " +minValue;
            yearlyValue.set(requiredValue);
            context.write(key, yearlyValue);
      }
    }

But using the custom Writable class produces the following result:

1939 121
1939 79
1939 83
1939 28
1980 0
1980 211
1980 -113

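This is how I implemented the custom class and the reducer. I pass the iterable to the custom class and do the calculation there. I can't figure out what I'm doing wrong. I have zero experience in Java.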

public  class CompositeWritable implements Writable {

         String data = "";

        public CompositeWritable() {

        }

        public CompositeWritable(String data) {
            this.data = data;
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            data = WritableUtils.readString(in);
        }

        @Override
        public void write(DataOutput out) throws IOException {
             WritableUtils.writeString(out, data);
        }

        public void merge(Iterable<IntWritable> values) {
             int sum = 0;
             int CounterForAvg = 0;
             int minValue = Integer.MAX_VALUE;
             int maxValue = Integer.MIN_VALUE;
             float avg;
             for (IntWritable val : values) {
                    int currentValue = val.get();
                    sum += currentValue;
                    CounterForAvg++;
                    minValue = Math.min(minValue, currentValue);
                    maxValue = Math.max(maxValue, currentValue);
                }
             avg = (float) sum / CounterForAvg; // cast first so the division is not truncated to an int
             data = "max temp:"+maxValue + "\t" +"avg temp: "+ avg + "\t"+ "min temp: " +minValue;
        }


        @Override
        public String toString() {
            return data;
        }

    }

public static class MaxTemperatureReducer
      extends Reducer<Text, CompositeWritable,Text, Text> {
            CompositeWritable out;
            Text textYearlyValue = new Text();

      public void reduce(Text key, Iterable<IntWritable> values,
          Context context)
          throws IOException, InterruptedException {
             out.merge(values);
            String requiredOutput = out.toString();
            textYearlyValue.set(requiredOutput);
            context.write(key,textYearlyValue );
      }
    }

My job configuration is as follows:

Job job = Job.getInstance(getConf(), "MaxAvgMinTemp");
            job.setJarByClass(this.getClass());

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(MaxTemperatureMapper.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            return job.waitForCompletion(true) ? 0 : 1;

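1 Answer:

Answer 0 (score: 1)

"Shouldn't the call to merge help me determine the values?"

It can, but you are not using it correctly: out is never initialized.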

  CompositeWritable out; // null here
  Text textYearlyValue = new Text();

  public void reduce(Text key, Iterable<IntWritable> values,
      Context context)
      throws IOException, InterruptedException {
         out.merge(values); // still null, should throw an exception
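
The one-line-per-value output in the question is what Hadoop's default Reducer.reduce produces, which suggests the custom reduce was never called at all. A likely cause, judging only from the posted code, is that the class declares CompositeWritable as its input value type while reduce takes Iterable<IntWritable>, so the method is an overload rather than an override; adding @Override would surface this at compile time. The following is a minimal plain-Java sketch of that effect with no Hadoop dependency. BaseReducer, MismatchedReducer, MatchingReducer, and run are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverrideSketch {

    // Hypothetical stand-in for Hadoop's Reducer: the default reduce is an
    // identity pass that emits every value under its key.
    static class BaseReducer<K, V> {
        void reduce(K key, Iterable<V> values, List<String> out) {
            for (V v : values) {
                out.add(key + " " + v);
            }
        }
    }

    // The declared value type (StringBuilder) does not match the parameter
    // type (Integer), so this method overloads reduce instead of overriding it.
    // Adding @Override here would be rejected by the compiler.
    static class MismatchedReducer extends BaseReducer<String, StringBuilder> {
        void reduce(String key, Iterable<Integer> values, List<String> out) {
            out.add(key + " summarized");
        }
    }

    // Type parameters and method parameters agree, so this is a real override.
    static class MatchingReducer extends BaseReducer<String, Integer> {
        @Override
        void reduce(String key, Iterable<Integer> values, List<String> out) {
            int max = Integer.MIN_VALUE;
            for (int v : values) {
                max = Math.max(max, v);
            }
            out.add(key + " max:" + max);
        }
    }

    // Mimics a framework that only knows the base-class signature.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> run(BaseReducer reducer, String key, List<Integer> values) {
        List<String> out = new ArrayList<>();
        reducer.reduce(key, values, out);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> temps = Arrays.asList(121, 79, 83, 28);
        // Base implementation runs: [1939 121, 1939 79, 1939 83, 1939 28]
        System.out.println(run(new MismatchedReducer(), "1939", temps));
        // Override runs: [1939 max:121]
        System.out.println(run(new MatchingReducer(), "1939", temps));
    }
}
```

If this is what happened in the job, declaring the reducer with IntWritable as its input value type and annotating reduce with @Override should make the custom method run, at which point the uninitialized out field would fail as described above.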

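If you only want to output a single line of text, you can just use a Text object. Your merge(Iterable<IntWritable> values) method could live anywhere; it does not need to sit in a completely separate class just to return a Writable.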
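
As a rough illustration of that point, the calculation can be an ordinary helper that returns a String, which the reducer would then set on a Text. This is only a sketch: SummarySketch and summarize are hypothetical names, and Integer stands in for IntWritable so the example runs without Hadoop (in a real reducer you would call get() on each IntWritable).

```java
import java.util.Arrays;

public class SummarySketch {

    // Hypothetical helper: one pass over the values, returning the line that
    // the reducer would place in a Text object.
    static String summarize(Iterable<Integer> values) {
        int sum = 0;
        int count = 0;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            sum += v;
            count++;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        // Divide as double so the average keeps its fractional part.
        double avg = count == 0 ? 0.0 : (double) sum / count;
        return "max temp:" + max + "\t" + "avg temp: " + avg + "\t" + "min temp: " + min;
    }

    public static void main(String[] args) {
        System.out.println("1939\t" + summarize(Arrays.asList(121, 79, 83, 28)));
    }
}
```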


But anyway, if the exercise is to learn how to implement a custom Writable, then here you go.

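Things to note:

  1. If you want a "composite" of multiple fields, you should declare them as fields
  2. readFields and write must handle the fields in the same order
  3. toString determines what you see in the reducer output when using TextOutputFormat (the default)
  4. equals and hashCode are added for completeness (ideally you would implement WritableComparable, but that really only matters for keys, not so much for values)
  5. To be consistent with other Writables, I renamed your merge method to set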
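
Notes 1 and 2 can be checked without a cluster by writing the fields to a byte buffer and reading them back. The sketch below uses only java.io and mirrors the two Writable methods without implementing the Hadoop interface; FieldOrderSketch, MinMaxAvg, and roundTrip are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class FieldOrderSketch {

    // Same shape as a Writable: each field is declared, and write/readFields
    // touch the fields in the same order (min, max, avg).
    static class MinMaxAvg {
        int min;
        int max;
        double avg;

        void write(DataOutput out) throws IOException {
            out.writeInt(min);
            out.writeInt(max);
            out.writeDouble(avg);
        }

        void readFields(DataInput in) throws IOException {
            min = in.readInt();
            max = in.readInt();
            avg = in.readDouble();
        }
    }

    // Serializes the given values and deserializes them into a fresh object.
    static MinMaxAvg roundTrip(int min, int max, double avg) {
        MinMaxAvg original = new MinMaxAvg();
        original.min = min;
        original.max = max;
        original.avg = avg;
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            MinMaxAvg copy = new MinMaxAvg();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MinMaxAvg copy = roundTrip(28, 121, 77.75);
        System.out.println(copy.min + " " + copy.max + " " + copy.avg);
    }
}
```

If readFields read max before min, the two int values would silently swap, which is why the order has to match.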

You can expect the output of the code below to look like:

1939    MinMaxAvgWritable{min=28, max=121, avg=77.75}
1980    MinMaxAvgWritable{min=-113, max=211, avg=32.67}

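import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.text.DecimalFormat;
import java.util.Objects;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

public class MinMaxAvgWritable implements Writable {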

    private int min, max;
    private double avg;

    private DecimalFormat df = new DecimalFormat("#.00");

    @Override
    public String toString() {
        return "MinMaxAvgWritable{" +
                "min=" + min +
                ", max=" + max +
                ", avg=" + df.format(avg) +
                '}';
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        MinMaxAvgWritable that = (MinMaxAvgWritable) o;
        return min == that.min &&
                max == that.max &&
                avg == that.avg;
    }

    @Override
    public int hashCode() {
        return Objects.hash(min, max, avg);
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeInt(min);
        dataOutput.writeInt(max);
        dataOutput.writeDouble(avg);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.min = dataInput.readInt();
        this.max = dataInput.readInt();
        this.avg = dataInput.readDouble();
    }

    public void set(int min, int max, double avg) {
        this.min = min;
        this.max = max;
        this.avg = avg;
    }

    public void set(Iterable<IntWritable> values) {
        this.min = Integer.MAX_VALUE;
        this.max = Integer.MIN_VALUE;

        int sum = 0;
        int count = 0;
        for (IntWritable iw : values) {
            int i = iw.get();
            if (i < this.min) this.min = i;
            if (i > max) this.max = i;
            sum += i;
            count++;
        }

        this.avg = count < 1 ? sum : (sum / (1.0*count));
    }
}
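
The 1.0*count in set matters: dividing two int values truncates the result before it is ever stored in a double. Here is a small sketch of the difference, together with the "#.00" formatting used in toString. AverageSketch and its methods are hypothetical names, and pinning the symbols to Locale.ROOT is my addition so the decimal separator is always a dot; the class above uses the default locale.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class AverageSketch {

    // int / int discards the fractional part.
    static int truncatedAverage(int sum, int count) {
        return sum / count;
    }

    // Promoting one operand to double keeps the fraction; the guard mirrors
    // the count < 1 check in set.
    static double exactAverage(int sum, int count) {
        return count < 1 ? sum : sum / (1.0 * count);
    }

    // Two decimal places, with a fixed locale (an assumption, see above).
    static String format(double avg) {
        DecimalFormat df = new DecimalFormat("#.00", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return df.format(avg);
    }

    public static void main(String[] args) {
        // 1939: 121 + 79 + 83 + 28 = 311 over 4 values
        System.out.println(truncatedAverage(311, 4));     // 77
        System.out.println(format(exactAverage(311, 4))); // 77.75
        // 1980: 0 + 211 - 113 = 98 over 3 values
        System.out.println(truncatedAverage(98, 3));      // 32
        System.out.println(format(exactAverage(98, 3)));  // 32.67
    }
}
```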

With that, the reducer is quite simple:

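import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CompositeReducer extends Reducer<Text, IntWritable, Text, MinMaxAvgWritable> {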

    private final MinMaxAvgWritable output = new MinMaxAvgWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // This 'set/merge' method could just as easily be defined here, and return a String to be set on a Text object
        output.set(values);  
        context.write(key, output);
    }
}

The job is set up like this:

    // outputs for mapper and reducer
    job.setOutputKeyClass(Text.class);

    // setup mapper
    job.setMapperClass(TokenizerMapper.class);  // Replace with your mapper
    job.setMapOutputValueClass(IntWritable.class);

    // setup reducer
    job.setReducerClass(CompositeReducer.class);
    job.setOutputValueClass(MinMaxAvgWritable.class); // notice custom writable

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    return job.waitForCompletion(true) ? 0 : 1;