When writing a MapReduce program, my key is a tuple (A, B), where A and B are both sets of integers. How do I define a custom data type for this kind of key?
public static class MapClass extends Mapper<Object,Text,Tuple,Tuple>....
public class Tuple implements WritableComparable<Tuple> {

    @Override
    public void readFields(DataInput arg0) throws IOException {
        // TODO Auto-generated method stub
    }

    @Override
    public void write(DataOutput arg0) throws IOException {
        // TODO Auto-generated method stub
    }

    @Override
    public int compareTo(Tuple o) {
        // TODO Auto-generated method stub
        return 0;
    }
}
Answer 0 (score: 3)
You're almost there. Just add fields for A and B, then fill in the serialization methods and compareTo:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.io.WritableComparable;

public class Tuple implements WritableComparable<Tuple> {

    public Set<Integer> a = new TreeSet<Integer>();
    public Set<Integer> b = new TreeSet<Integer>();

    @Override
    public void readFields(DataInput arg0) throws IOException {
        // Deserialize: read each set's size, then that many elements.
        a.clear();
        b.clear();
        int count = arg0.readInt();
        while (count-- > 0) {
            a.add(arg0.readInt());
        }
        count = arg0.readInt();
        while (count-- > 0) {
            b.add(arg0.readInt());
        }
    }

    @Override
    public void write(DataOutput arg0) throws IOException {
        // Serialize: write each set's size followed by its elements,
        // mirroring the read order in readFields.
        arg0.writeInt(a.size());
        for (int v : a) {
            arg0.writeInt(v);
        }
        arg0.writeInt(b.size());
        for (int v : b) {
            arg0.writeInt(v);
        }
    }

    @Override
    public int compareTo(Tuple o) {
        // you'll need to implement how you want to compare the two sets between objects
    }
}
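The answer deliberately leaves compareTo to the reader. Below is one minimal sketch to replace the stub inside Tuple, assuming a lexicographic ordering (compare a first, then b); the compareSets helper and the ordering choice are my illustration, not part of the original answer. It relies on TreeSet iterating in ascending order and additionally needs java.util.Iterator imported.

    @Override
    public int compareTo(Tuple o) {
        // Order by a first; fall back to b only when the a sets are equal.
        int cmp = compareSets(a, o.a);
        if (cmp != 0) {
            return cmp;
        }
        return compareSets(b, o.b);
    }

    private static int compareSets(Set<Integer> x, Set<Integer> y) {
        // TreeSet iterates in ascending order, so element-wise
        // comparison gives a deterministic total ordering.
        Iterator<Integer> xi = x.iterator();
        Iterator<Integer> yi = y.iterator();
        while (xi.hasNext() && yi.hasNext()) {
            int cmp = xi.next().compareTo(yi.next());
            if (cmp != 0) {
                return cmp;
            }
        }
        // If one set is a prefix of the other, the shorter one sorts first.
        return Integer.compare(x.size(), y.size());
    }

Whichever ordering you choose, remember to register the type when configuring the job, e.g. job.setMapOutputKeyClass(Tuple.class) and job.setMapOutputValueClass(Tuple.class), so Hadoop knows how to instantiate and serialize it.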
Answer 1 (score: 1)
To implement a custom data type in Hadoop, you must implement the WritableComparable interface and provide your own implementations of the readFields() and write() methods. Besides readFields and write, you should also override the Java object's equals and hashCode methods.
If the custom data type is used as a key, it must also be comparable, which is why it has to implement WritableComparable rather than plain Writable.
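For example, a hashCode/equals pair consistent with set equality might look like the sketch below (my illustration, not from the original answer). This matters because Hadoop's default HashPartitioner uses hashCode() to assign keys to reducers, so equal Tuples must return the same hash.

    @Override
    public int hashCode() {
        // Combine both sets' hashes; equal sets yield equal hashes.
        return 31 * a.hashCode() + b.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Tuple)) {
            return false;
        }
        Tuple other = (Tuple) obj;
        return a.equals(other.a) && b.equals(other.b);
    }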