Is a custom Partitioner useful if I have already implemented hashCode() for the key in a MapReduce job?

Time: 2014-07-16 19:13:11

Tags: hadoop mapreduce hashcode hadoop-partitioning

I wrote a custom key class that initially had no hashCode() implementation.

I ran the MapReduce job, and during job configuration I set the partitioner class, like this:

        Job job = Job.getInstance(config);
        job.setJarByClass(ReduceSideJoinDriver.class);

        FileInputFormat.addInputPaths(job, filePaths.toString());
        FileOutputFormat.setOutputPath(job, new Path(args[args.length-1]));

        job.setMapperClass(JoiningMapper.class);
        job.setReducerClass(JoiningReducer.class);
        job.setPartitionerClass(TaggedJoiningPartitioner.class); // the partitioner is set here
        job.setGroupingComparatorClass(TaggedJoiningGroupingComparator.class);
        job.setOutputKeyClass(TaggedKey.class);
        job.setOutputValueClass(Text.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);

Here is the partitioner implementation:

public class TaggedJoiningPartitioner extends Partitioner<TaggedKey,Text> {

    @Override
    public int getPartition(TaggedKey taggedKey, Text text, int numPartitions) {
        return Math.abs(taggedKey.getJoinKey().hashCode()) % numPartitions;
    }
}

I ran the MapReduce job and saved the output.

Next, I commented out job.setPartitionerClass(TaggedJoiningPartitioner.class); in the job setup above.

I then implemented hashCode() in my custom key class, as follows:

public class TaggedKey implements Writable, WritableComparable<TaggedKey> {

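    private Text joinKey = new Text();
    private IntWritable tag = new IntWritable();

    // Getters used by compareTo() and by TaggedJoiningPartitioner.
    public Text getJoinKey() {
        return joinKey;
    }

    public IntWritable getTag() {
        return tag;
    }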

    @Override
    public int compareTo(TaggedKey taggedKey) {
        int compareValue = this.joinKey.compareTo(taggedKey.getJoinKey());
        if(compareValue == 0 ){
            compareValue = this.tag.compareTo(taggedKey.getTag());
        }
        return compareValue;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        joinKey.write(out);
        tag.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        joinKey.readFields(in);
        tag.readFields(in);
    }

    @Override
    public int hashCode(){
        return joinKey.hashCode();
    }

    @Override
    public boolean equals(Object o){
        if (this==o)
            return true;
        if (!(o instanceof TaggedKey)){
            return false;
        }
        TaggedKey that=(TaggedKey)o;
        return this.joinKey.equals(that.joinKey);
    }
}

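I then ran the job again (note: this time I did not set any partitioner). After the job finished, I compared its output with the output of the previous run. The two were exactly the same.

So my questions are: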

    1) Is this behavior universal, that is, always reproducible for any
    custom implementation?

    2) Is implementing hashCode() on my key class the same as calling
    job.setPartitionerClass?

    3) If they both serve the same purpose, why is setPartitionerClass
    needed?

    4) If the hashCode() implementation and the Partitioner class
    implementation conflict, which one takes precedence?

1 Answer:

Answer 0 (score: 0)

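You got the same result because your custom partitioner does essentially what the default partitioner (HashPartitioner) does: it picks the reducer from the hash code of the key. You just moved that code into another class and ran it there. Put different logic in it, such as key.toString().length() % numPartitions, or anything other than hashCode() % numPartitions, and you will see keys assigned to the reducers differently.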
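To make the comparison concrete, here is a minimal sketch in plain Java, without any Hadoop dependency. `defaultPartition` reproduces the formula used by Hadoop's `HashPartitioner`, and `customPartition` reproduces the formula from the question's `TaggedJoiningPartitioner`. The class name `HashPartitionSketch`, the sample keys, and the use of `String.hashCode()` as a stand-in for `Text.hashCode()` are assumptions made for illustration only.

```java
public class HashPartitionSketch {

    // Formula used by Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the remainder.
    static int defaultPartition(int hashCode, int numPartitions) {
        return (hashCode & Integer.MAX_VALUE) % numPartitions;
    }

    // Formula from the question's TaggedJoiningPartitioner.
    static int customPartition(int hashCode, int numPartitions) {
        return Math.abs(hashCode) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // Hypothetical join keys; String.hashCode() stands in for Text.hashCode().
        String[] joinKeys = {"alice", "bob", "carol"};
        for (String joinKey : joinKeys) {
            int h = joinKey.hashCode();
            System.out.println(joinKey
                    + " default=" + defaultPartition(h, numPartitions)
                    + " custom=" + customPartition(h, numPartitions));
        }
        // Math.abs(Integer.MIN_VALUE) is still negative, so the custom
        // formula can return a negative index in this one edge case.
        System.out.println("MIN_VALUE default=" + defaultPartition(Integer.MIN_VALUE, 3)
                + " custom=" + customPartition(Integer.MIN_VALUE, 3));
    }
}
```

Both formulas depend only on the hash code, so records with equal join keys always land on the same reducer under either one, which is why the joined output matches. For negative hash codes the two formulas can pick different partition indices, so the records may end up in different output files even though the overall result is the same.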

For example, you cannot get the following partitioning just by editing hashCode():

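public static class MyPartitioner extends Partitioner<Text, Text> {
    // Needs org.apache.hadoop.io.Text and org.apache.hadoop.mapreduce.Partitioner.

    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {

        // Partition on the key's length rather than on its hash code.
        int len = key.toString().length();

        // Guard against division by zero below.
        if (numReduceTasks == 0)
            return 0;

        // Short keys go to partition 0.
        if (len <= numReduceTasks / 3) {
            return 0;
        }
        // Medium-length keys go to partition 1 (or 0 with a single reducer).
        if (len > numReduceTasks / 3 && len <= numReduceTasks / 2) {
            return 1 % numReduceTasks;
        }
        // All remaining keys are spread by length.
        else
            return len % numReduceTasks;
    }
}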
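The length-based rule above can be tried out without a cluster. The following is a rough sketch in plain Java that applies the same branches to ordinary strings; the class name `LengthPartitionSketch`, the sample keys, and the choice of 6 reducers are hypothetical and only serve to show how keys would be routed.

```java
public class LengthPartitionSketch {

    // Same branches as MyPartitioner.getPartition, with the key passed as a String.
    static int lengthPartition(String key, int numReduceTasks) {
        int len = key.length();
        if (numReduceTasks == 0) {
            return 0;
        }
        if (len <= numReduceTasks / 3) {
            return 0;
        }
        if (len <= numReduceTasks / 2) {
            return 1 % numReduceTasks;
        }
        return len % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 6; // hypothetical reducer count
        String[] keys = {"a", "bb", "ccc", "dddd", "eeeeeee"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + lengthPartition(key, numReduceTasks));
        }
    }
}
```

In a real job this routing only takes effect when the class is registered with job.setPartitionerClass(MyPartitioner.class). Without that call the framework falls back to the default hash-based partitioner, which is the only place the key's hashCode() influences partitioning.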