How does sorting happen in MapReduce before the output is passed from the mapper to the reducer? If my mapper's output key is of type IntWritable, does it use the comparator defined in the IntWritable class, or the class's compareTo() method? If so, how is that call made? If not, how is the sorting performed, and how is the call made?
Answer 0 (score: 1)
The map task's output is first collected and then sent to the Partitioner, which is responsible for determining which reducer the data will be sent to (it has not yet been grouped for the reduce() calls). The default partitioner uses the key's hashCode() method, modulo the number of reducers.
After that, a Comparator is invoked to sort the map output. The flow looks like this:
Collector -> Partitioner -> Spill -> Comparator -> Local Disk (HDFS) <- MapOutputServlet
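As a Hadoop-free illustration of that default routing, the sketch below mirrors the arithmetic of Hadoop's HashPartitioner (mask off the sign bit, then take the hash modulo the reducer count); the class and method names here are made up for the example:

```java
public class HashPartitionSketch {
    // Mirrors the default partitioning arithmetic: clear the sign bit
    // so the result is non-negative, then take hash modulo numReducers
    static int partitionFor(int keyHashCode, int numReducers) {
        return (keyHashCode & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Equal hash codes always route to the same reducer
        System.out.println(partitionFor(42, 4));  // 42 % 4 = 2
        // Negative hash codes are handled by clearing the sign bit first
        System.out.println(partitionFor(-7, 4));
    }
}
```

This is why a key's hashCode() must be stable across JVMs: two mappers on different nodes must route equal keys to the same reducer.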
Each reducer will then copy the data from the mappers the partitioner assigned to it and pass it to a grouper, which determines how records are grouped for a single reducer function call:
MapOutputServlet -> Copy to Local Disk (HDFS) -> Group -> Reduce
Before the function call, the records also go through a sorting phase that determines the order in which they arrive at the reducer. The sorter (a WritableComparator) calls the key's compareTo() method (from the WritableComparable interface).
To give you a better idea, here is how you can implement a basic compareTo(), a grouping comparator, and a sorting comparator for a custom composite key:
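As a plain-Java illustration (no Hadoop dependency), the sort phase behaves as if the keys were ordered by their compareTo() results; for IntWritable keys that is simply ascending numeric order, which Integer's compareTo mirrors:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortPhaseSketch {
    public static void main(String[] args) {
        // Stand-ins for IntWritable map-output keys
        List<Integer> keys = new ArrayList<>(Arrays.asList(5, 1, 3));
        // The framework's sorter orders records by the key's compareTo();
        // for IntWritable this yields ascending numeric order
        keys.sort(Integer::compareTo);
        System.out.println(keys);  // [1, 3, 5]
    }
}
```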
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;

public class CompositeKey implements WritableComparable<CompositeKey> {
    private IntWritable primaryField = new IntWritable();
    private IntWritable secondaryField = new IntWritable();

    // Hadoop instantiates keys reflectively, so a no-arg constructor is required
    public CompositeKey() {
    }

    public CompositeKey(IntWritable p, IntWritable s) {
        this.primaryField.set(p.get());
        this.secondaryField.set(s.get());
    }

    public IntWritable getPrimaryField() {
        return primaryField;
    }

    public IntWritable getSecondaryField() {
        return secondaryField;
    }

    public void write(DataOutput out) throws IOException {
        this.primaryField.write(out);
        this.secondaryField.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        this.primaryField.readFields(in);
        this.secondaryField.readFields(in);
    }

    // Called by the partitioner to route map outputs to the same reducer instance.
    // If the hash source is simple (a primitive type or so), delegating to its hashCode() is good enough.
    @Override
    public int hashCode() {
        return this.primaryField.hashCode();
    }

    @Override
    public int compareTo(CompositeKey other) {
        if (this.getPrimaryField().equals(other.getPrimaryField())) {
            return this.getSecondaryField().compareTo(other.getSecondaryField());
        } else {
            return this.getPrimaryField().compareTo(other.getPrimaryField());
        }
    }
}
public class CompositeGroupingComparator extends WritableComparator {
    public CompositeGroupingComparator() {
        super(CompositeKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        CompositeKey first = (CompositeKey) a;
        CompositeKey second = (CompositeKey) b;
        return first.getPrimaryField().compareTo(second.getPrimaryField());
    }
}

public class CompositeSortingComparator extends WritableComparator {
    public CompositeSortingComparator() {
        super(CompositeKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        CompositeKey first = (CompositeKey) a;
        CompositeKey second = (CompositeKey) b;
        return first.compareTo(second);
    }
}
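For completeness, here is how comparators like these would typically be wired into a job. This is a fragment, not a runnable program: it assumes the org.apache.hadoop.mapreduce.Job API and the class names from the snippets above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Fragment: register the composite key and its comparators with the job
Job job = Job.getInstance(new Configuration(), "composite-key-example");
job.setMapOutputKeyClass(CompositeKey.class);
// Orders records in each reducer's input: primary field, then secondary field
job.setSortComparatorClass(CompositeSortingComparator.class);
// Groups records into a single reduce() call by primary field only
job.setGroupingComparatorClass(CompositeGroupingComparator.class);
```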
Answer 1 (score: 0)
The framework handles comparison for all of the default data types such as IntWritable, DoubleWritable, etc., but if you have a user-defined key type you need to implement the WritableComparable interface.
WritableComparables can be compared to each other, typically via Comparators. Any type which is to be used as a key in the Hadoop Map-Reduce framework should implement this interface.
Note that hashCode() is frequently used in Hadoop to partition keys. It is important that your implementation of hashCode() returns the same result across different instances of the JVM. Note also that the default hashCode() implementation in Object does not satisfy this property.
Example:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class MyWritableComparable implements WritableComparable<MyWritableComparable> {
    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    public int compareTo(MyWritableComparable o) {
        // Compare on counter first, then break ties on timestamp
        int cmp = Integer.compare(this.counter, o.counter);
        if (cmp != 0) {
            return cmp;
        }
        return Long.compare(this.timestamp, o.timestamp);
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + counter;
        result = prime * result + (int) (timestamp ^ (timestamp >>> 32));
        return result;
    }
}
From: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/WritableComparable.html