MapReduce: reduce hangs indefinitely when writing to the context

Date: 2015-11-01 11:03:21

Tags: java hadoop mapreduce

Below is a MapReduce program in which filtering is done in the map function and averaging in the reduce step.

The map part runs fine. But when the reduce part runs, it gets stuck at the context.write(key, value) line.

This only happens when the reduce function tries to write output of a different type than what the map function writes.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Filter3 {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, Contestant> {

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

            String[] cols = value.toString().split(",");

            try {
                Contestant val = new Contestant(cols[0],cols[1],cols[2]);

                System.out.println();
                System.out.println();
                System.out.print(key+" ::: ");
                System.out.println(val);
                System.out.println();
                System.out.println();

                val.name = val.name.toUpperCase();

                if(val.rating>=9) {
                    context.write(new Text(val.name), val); //write null if it is not required
                }
            } catch(Exception ex) {
                ex.printStackTrace();
            }

        }
    }

    public static class AvgRatingReducer extends Reducer<Text,Contestant,Text,DoubleWritable> {

        private DoubleWritable result = new DoubleWritable(0.0);

        public void reduce(Text key, Iterable<Contestant> values, Context context ) throws IOException, InterruptedException {        

            double sum = 0.0;
            int count = 0;

            for (Contestant val : values) {
                sum += val.rating;
                count++;
            }

            if(count>0) {
                result.set(sum/(double)count);
            }

            System.out.println(result);

            context.write(key, result);

        }
    }

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "AvgMRJob"); //configuration and job name

        job.setJarByClass(Filter3.class);

        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(AvgRatingReducer.class);
        job.setReducerClass(AvgRatingReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);

        Path inPath = new Path(args[0]);
        Path outPath = new Path(args[1]);
        outPath.getFileSystem(conf).delete(outPath,true);

        FileInputFormat.addInputPath(job, inPath);
        FileOutputFormat.setOutputPath(job, outPath);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The Writable used is:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableUtils;

public class Contestant implements Writable {

    long id;
    String name;
    double rating;

    public Contestant() {}

    public Contestant(long id, String name, double rating) {
        this.id = id;
        this.name = name;
        this.rating = rating;
    }

    public Contestant(String id, String name, String rating) {
        try {
            this.id = Long.parseLong(id.trim());
        } catch(Exception ex) {

        }
        this.name = name;
        try {
            this.rating = Double.parseDouble(rating.trim());
        } catch(Exception ex) {

        }

    }

    @Override
    public void readFields(DataInput inp) throws IOException {

        id = inp.readLong();
        name = WritableUtils.readString(inp);
        rating = inp.readDouble();
    }

    @Override
    public void write(DataOutput out) throws IOException {

        out.writeLong(id);
        WritableUtils.writeString(out, name);
        out.writeDouble(rating);
    }

    @Override
    public String toString() {

        return this.id + "," + this.name + "," + this.rating;
    }
}
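The question doesn't show where exactly the reduce side blocks, but a plausible mechanism (an assumption on my part, not confirmed by the question) is a byte-layout mismatch: the combiner emits DoubleWritable bytes, and the reduce side then tries to decode them against a different field layout, such as Contestant.readFields. This dependency-free sketch uses plain java.io, with writeUTF/readUTF as stand-ins for WritableUtils.writeString/readString, to show what reading the wrong layout looks like:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class LayoutMismatchDemo {
    public static void main(String[] args) throws IOException {
        // A DoubleWritable serializes as a single 8-byte double.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeDouble(9.5);

        // Contestant.readFields expects long + string + double, so it
        // reinterprets the double's 8 bytes as an id, then looks for a
        // string length that was never written.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        long bogusId = in.readLong();   // the double's bits read as a long
        System.out.println(bogusId);
        try {
            in.readUTF();               // stand-in for WritableUtils.readString
        } catch (EOFException e) {
            System.out.println("stream exhausted while reading the name");
        }
    }
}
```

With plain java.io this fails fast with an EOFException; inside a running job, a framework-level read against the wrong layout could just as easily stall waiting for bytes that never arrive, which would match the silent hang described here.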

Execution gets stuck in the reduce function when writing the output to the context. There are no errors or exceptions; it just hangs indefinitely. I can't figure out what the problem is. I followed the usual MapReduce procedure.


Note: the same program works if I write the same type of data in both map and reduce, i.e. if I write (key = Text, val = Contestant) in both the Map and Reduce functions, instead of using DoubleWritable in reduce!!

2 Answers:

Answer 0 (score: 1)

Remove the combiner:

// job.setCombinerClass(AvgRatingReducer.class);

If you use a combiner, you need to make sure the reducer works on the output of the combiner class, not on the mapper's output.
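Beyond the type mismatch, there is a second reason an averaging reducer usually can't double as a combiner even when the types line up: averaging is not associative, so averaging the combiner's partial averages gives the wrong answer. This dependency-free sketch (plain Java, no Hadoop; all names are made up for illustration) contrasts that with carrying (sum, count) through the combine step:

```java
import java.util.Arrays;
import java.util.List;

public class AvgCombineDemo {
    // Naive: average the per-split averages -- effectively what reusing
    // an averaging reducer as a combiner would compute.
    static double avgOfAvgs(List<double[]> splits) {
        return splits.stream()
                .mapToDouble(s -> Arrays.stream(s).average().orElse(0))
                .average().orElse(0);
    }

    // Correct: each "combiner" emits a partial (sum, count); the
    // "reducer" divides only once at the end.
    static double avgViaSumCount(List<double[]> splits) {
        double sum = 0;
        long count = 0;
        for (double[] s : splits) {
            sum += Arrays.stream(s).sum();
            count += s.length;
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // Two map splits of different sizes for the same key.
        List<double[]> splits = Arrays.asList(new double[]{9, 10}, new double[]{9});
        System.out.println(avgOfAvgs(splits));      // 9.25 (wrong: ignores split sizes)
        System.out.println(avgViaSumCount(splits)); // ~9.33, the true average of 9, 10, 9
    }
}
```

So a correct combiner for this job would have to emit partial (sum, count) pairs in the mapper's output value type, with the final division happening only in the reducer.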

Answer 1 (score: 0)

A MapReduce combiner's input <key, value> pair types and output <key, value> pair types must be the same. That is the rule for combiners; no such rule exists for reducers.

In this case, the reducer reads the same <key, value> pair as the mapper output, <Text, Contestant>, and writes <Text, DoubleWritable> as its output <key, value> pair.

So this works without a combiner. When adding a combiner, we must make sure that the input <key, val> pair and the output <key, val> pair are of the same types for the combiner step:

<key1, value1> -> <key1, value1>

Here, it is wrong to use the same reducer class as the combiner, because the rule above is not satisfied: the reducer's input <key, val> pair types differ from its output <key, val> pair types.