I am writing a custom key class without a hashCode implementation.
I run a map-reduce job, and in the job configuration I set the partitioner class, like this:
Job job = Job.getInstance(config);
job.setJarByClass(ReduceSideJoinDriver.class);
FileInputFormat.addInputPaths(job, filePaths.toString());
FileOutputFormat.setOutputPath(job, new Path(args[args.length-1]));
job.setMapperClass(JoiningMapper.class);
job.setReducerClass(JoiningReducer.class);
job.setPartitionerClass(TaggedJoiningPartitioner.class); // the partitioner is set here
job.setGroupingComparatorClass(TaggedJoiningGroupingComparator.class);
job.setOutputKeyClass(TaggedKey.class);
job.setOutputValueClass(Text.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
Here is the partitioner implementation:
public class TaggedJoiningPartitioner extends Partitioner<TaggedKey,Text> {
    @Override
    public int getPartition(TaggedKey taggedKey, Text text, int numPartitions) {
        return Math.abs(taggedKey.getJoinKey().hashCode()) % numPartitions;
    }
}
I run the map-reduce job and save the output.
Next, I comment out job.setPartitionerClass(TaggedJoiningPartitioner.class); in the job setup above.
I then implement hashCode() in my custom key class, as follows:
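import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

// WritableComparable already extends Writable, so it is the only interface needed.
public class TaggedKey implements WritableComparable<TaggedKey> {
    private Text joinKey = new Text();
    private IntWritable tag = new IntWritable();

    // Getters used by compareTo() and by the partitioner.
    public Text getJoinKey() {
        return joinKey;
    }

    public IntWritable getTag() {
        return tag;
    }

    // Sort by join key first, then by tag.
    @Override
    public int compareTo(TaggedKey taggedKey) {
        int compareValue = this.joinKey.compareTo(taggedKey.getJoinKey());
        if (compareValue == 0) {
            compareValue = this.tag.compareTo(taggedKey.getTag());
        }
        return compareValue;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        joinKey.write(out);
        tag.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        joinKey.readFields(in);
        tag.readFields(in);
    }

    // Hash only on the join key, so the tag does not affect the hash.
    @Override
    public int hashCode() {
        return joinKey.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof TaggedKey)) {
            return false;
        }
        TaggedKey that = (TaggedKey) o;
        return this.joinKey.equals(that.joinKey);
    }
}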
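Now I run the job again (note: I did not set any partitioner). After the map-reduce job finishes, I compare its output with the previous run. The two outputs are exactly the same.
So my questions are: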
1) Is this behavior universal, that is, always reproducible with any
custom implementation?
2) Is implementing hashCode() on my key class the same as calling
job.setPartitionerClass?
3) If they both serve the same purpose, what is the need for
setPartitionerClass?
4) If the hashCode() implementation and the Partitioner class
implementation conflict, which one takes precedence?
Answer 0 (score: 0)
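You get the same result because your custom partitioner does what the default partitioner does. You only moved the code to another class and ran it there. Put different logic in it, such as key.toString().length() % numPartitions, or anything other than hashCode() % numPartitions, and you will see the keys distributed to the reducers differently.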
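To make the comparison concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes the default partitioner is Hadoop's HashPartitioner, whose arithmetic is `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The class name `PartitionSketch` and its method names are hypothetical, and `String` stands in for `Text`, so the actual partition numbers in a real job will differ.

```java
public class PartitionSketch {
    // Same arithmetic as HashPartitioner: clear the sign bit, then take the modulus.
    static int hashPartition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    // Arithmetic used by the partitioner in the question.
    // It matches hashPartition for every non-negative hash code.
    static int absPartition(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;
    }

    // A rule that does not depend on hashCode() at all (illustrative only).
    static int lengthPartition(String key, int numPartitions) {
        return key.length() % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        for (String key : new String[] {"a", "bb", "ccc", "dddd"}) {
            int h = key.hashCode();
            System.out.println(key
                    + " hash=" + hashPartition(h, numPartitions)
                    + " abs=" + absPartition(h, numPartitions)
                    + " length=" + lengthPartition(key, numPartitions));
        }
    }
}
```

For these keys the hash-based and abs-based columns agree, while the length-based column differs, which is why swapping the hashing partitioner for hashCode() changed nothing. The two hash variants can differ for negative hash codes, so the match is not guaranteed for every possible key.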
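For example, you cannot get the following partitioning just by editing hashCode():
public static class MyPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        // Length of the key in characters.
        int len = key.toString().length();
        if (numReduceTasks == 0) {
            return 0;
        }
        // Short keys go to partition 0.
        if (len <= numReduceTasks / 3) {
            return 0;
        }
        // Medium-length keys go to partition 1 (or 0 when there is a single reducer).
        if (len > numReduceTasks / 3 && len <= numReduceTasks / 2) {
            return 1 % numReduceTasks;
        }
        // Everything else is spread by length.
        return len % numReduceTasks;
    }
}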
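The length-based rule can be tried without a cluster. The following is a rough sketch that copies the same branching into a plain static method; `LengthPartitionSketch` and `partitionFor` are hypothetical names, `String` stands in for `Text`, and six reducers is an arbitrary choice for illustration.

```java
public class LengthPartitionSketch {
    // Same branching as the partitioner above, applied to a plain String key.
    static int partitionFor(String key, int numReduceTasks) {
        if (numReduceTasks == 0) {
            return 0;
        }
        int len = key.length();
        if (len <= numReduceTasks / 3) {
            return 0;
        }
        if (len <= numReduceTasks / 2) {
            return 1 % numReduceTasks;
        }
        return len % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 6; // assumed reducer count, for illustration only
        for (String key : new String[] {"ab", "abc", "abcde", "abcdefg"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

With six reducers, keys of length 2 or less land in partition 0, keys of length 3 in partition 1, and longer keys are spread by length modulo 6. Once a partitioner like this is set with job.setPartitionerClass, the key's hashCode() plays no part in choosing the reducer.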