MapReduce: reduce hangs indefinitely when writing to the context

Date: 2015-11-01 11:03:21

Tags: java hadoop mapreduce

Below is a MapReduce program in which filtering is done in the map function and averaging in the reduce step.

The map part runs fine. But when the reduce part runs, it gets stuck at the context.write(key, value) line.

This only happens when the reduce function tries to write output of a different type than what the map function writes.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Filter3 {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, Contestant> {

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

            String[] cols = value.toString().split(",");

            try {
                Contestant val = new Contestant(cols[0],cols[1],cols[2]);

                System.out.println();
                System.out.println();
                System.out.print(key+" ::: ");
                System.out.println(val);
                System.out.println();
                System.out.println();

                val.name = val.name.toUpperCase();

                if(val.rating>=9) {
                    context.write(new Text(val.name), val); //write null if it is not required
                }
            } catch(Exception ex) {
                ex.printStackTrace();
            }

        }
    }

    public static class AvgRatingReducer extends Reducer<Text,Contestant,Text,DoubleWritable> {

        private DoubleWritable result = new DoubleWritable(0.0);

        public void reduce(Text key, Iterable<Contestant> values, Context context ) throws IOException, InterruptedException {        

            double sum = 0.0;
            int count = 0;

            for (Contestant val : values) {
                sum += val.rating;
                count++;
            }

            if(count>0) {
                result.set(sum/(double)count);
            }

            System.out.println(result);

            context.write(key, result);

        }
    }

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "AvgMRJob"); //configuration and job name

        job.setJarByClass(Filter3.class);

        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(AvgRatingReducer.class);
        job.setReducerClass(AvgRatingReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);

        Path inPath = new Path(args[0]);
        Path outPath = new Path(args[1]);
        outPath.getFileSystem(conf).delete(outPath,true);

        FileInputFormat.addInputPath(job, inPath);
        FileOutputFormat.setOutputPath(job, outPath);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The Writable used is:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableUtils;

public class Contestant implements Writable {

    long id;
    String name;
    double rating;

    public Contestant() {}

    public Contestant(long id, String name, double rating) {
        this.id = id;
        this.name = name;
        this.rating = rating;
    }

    public Contestant(String id, String name, String rating) {
        try {
            this.id = Long.parseLong(id.trim());
        } catch(Exception ex) {

        }
        this.name = name;
        try {
            this.rating = Double.parseDouble(rating.trim());
        } catch(Exception ex) {

        }

    }

    @Override
    public void readFields(DataInput inp) throws IOException {

        id = inp.readLong();
        name = WritableUtils.readString(inp);
        rating = inp.readDouble();
    }

    @Override
    public void write(DataOutput out) throws IOException {

        out.writeLong(id);
        WritableUtils.writeString(out, name);
        out.writeDouble(rating);
    }

    @Override
    public String toString() {

        return this.id + "," + this.name + "," + this.rating;
    }
}
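The question doesn't show where exactly the reduce side blocks, but a plausible mechanism (an assumption on my part, not confirmed by the question) is a byte-layout mismatch: the combiner emits DoubleWritable bytes, and the reduce side then tries to decode them against a different field layout, such as Contestant.readFields. This dependency-free sketch uses plain java.io, with writeUTF/readUTF as stand-ins for WritableUtils.writeString/readString, to show what reading the wrong layout looks like:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class LayoutMismatchDemo {
    public static void main(String[] args) throws IOException {
        // A DoubleWritable serializes as a single 8-byte double.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeDouble(9.5);

        // Contestant.readFields expects long + string + double, so it
        // reinterprets the double's 8 bytes as an id, then looks for a
        // string length that was never written.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        long bogusId = in.readLong();   // the double's bits read as a long
        System.out.println(bogusId);
        try {
            in.readUTF();               // stand-in for WritableUtils.readString
        } catch (EOFException e) {
            System.out.println("stream exhausted while reading the name");
        }
    }
}
```

With plain java.io this fails fast with an EOFException; inside a running job, a framework-level read against the wrong layout could just as easily stall waiting for bytes that never arrive, which would match the silent hang described here.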

Execution gets stuck in the reduce function when writing the output to the context. There are no errors or exceptions; it just hangs indefinitely. I can't figure out what the problem is. I followed the usual MapReduce procedure.


Note: the same program works if I write the same type of data in both map and reduce, i.e. if I write (key = Text, val = Contestant) in both the Map and Reduce functions, instead of using DoubleWritable in reduce!!

2 Answers:

Answer 0 (score: 1)

Remove the combiner:

// job.setCombinerClass(AvgRatingReducer.class);

If you use a combiner, you need to make sure the reducer works on the output of the combiner class, not on the mapper's output.
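Beyond the type mismatch, there is a second reason an averaging reducer usually can't double as a combiner even when the types line up: averaging is not associative, so averaging the combiner's partial averages gives the wrong answer. This dependency-free sketch (plain Java, no Hadoop; all names are made up for illustration) contrasts that with carrying (sum, count) through the combine step:

```java
import java.util.Arrays;
import java.util.List;

public class AvgCombineDemo {
    // Naive: average the per-split averages -- effectively what reusing
    // an averaging reducer as a combiner would compute.
    static double avgOfAvgs(List<double[]> splits) {
        return splits.stream()
                .mapToDouble(s -> Arrays.stream(s).average().orElse(0))
                .average().orElse(0);
    }

    // Correct: each "combiner" emits a partial (sum, count); the
    // "reducer" divides only once at the end.
    static double avgViaSumCount(List<double[]> splits) {
        double sum = 0;
        long count = 0;
        for (double[] s : splits) {
            sum += Arrays.stream(s).sum();
            count += s.length;
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // Two map splits of different sizes for the same key.
        List<double[]> splits = Arrays.asList(new double[]{9, 10}, new double[]{9});
        System.out.println(avgOfAvgs(splits));      // 9.25 (wrong: ignores split sizes)
        System.out.println(avgViaSumCount(splits)); // ~9.33, the true average of 9, 10, 9
    }
}
```

So a correct combiner for this job would have to emit partial (sum, count) pairs in the mapper's output value type, with the final division happening only in the reducer.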

Answer 1 (score: 0)

A MapReduce combiner's input <key, value> pair types and output <key, value> pair types must be the same. That is the rule for combiners; no such rule exists for reducers.

In this case, the reducer reads the same <key, value> pair as the mapper output, <Text, Contestant>, and writes <Text, DoubleWritable> as its output <key, value> pair.

So this works without a combiner. When adding a combiner, we must make sure that the input <key, val> pair and the output <key, val> pair are of the same types for the combiner step:

<key1, value1> -> <key1, value1>

Here, it is wrong to use the same reducer class as the combiner, because the rule above is not satisfied: the reducer's input <key, val> pair types differ from its output <key, val> pair types.