MapReduce --- custom data types

Time: 2013-03-23 08:03:43

Tags: hadoop mapreduce

When I write a MapReduce program, my key is a tuple (A, B), where A and B are both sets of integers. How can I define a custom data type for this?

public static class MapClass extends Mapper<Object,Text,Tuple,Tuple>....

public class Tuple implements WritableComparable<Tuple>{ 


        @Override
        public void readFields(DataInput arg0) throws IOException {
            // TODO Auto-generated method stub

        }

        @Override
        public void write(DataOutput arg0) throws IOException {
            // TODO Auto-generated method stub

        }

        @Override
        public int compareTo(Tuple o) {
            // TODO Auto-generated method stub
            return 0;
        }
    }

2 answers:

Answer 0 (score: 3)

You're almost there: just add variables for A and B, then complete the serialization methods and compareTo:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.io.WritableComparable;

public class Tuple implements WritableComparable<Tuple> {
    public Set<Integer> a = new TreeSet<Integer>();
    public Set<Integer> b = new TreeSet<Integer>();

    @Override
    public void readFields(DataInput arg0) throws IOException {
        a.clear();
        b.clear();

        int count = arg0.readInt();
        while (count-- > 0) {
          a.add(arg0.readInt());
        }

        count = arg0.readInt();
        while (count-- > 0) {
          b.add(arg0.readInt());
        }
    }

    @Override
    public void write(DataOutput arg0) throws IOException {
        arg0.writeInt(a.size());
        for (int v : a) {
          arg0.writeInt(v);
        }
        arg0.writeInt(b.size());
        for (int v : b) {
          arg0.writeInt(v);
        }
    }

    @Override
    public int compareTo(Tuple o) {
        // you'll need to implement how you want to compare the two sets between objects
        return 0;
    }
}
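The compareTo left open above can be filled in many ways. One reasonable choice is a lexicographic ordering: walk both sorted sets element by element, and on a full tie let the smaller set sort first. A minimal, self-contained sketch of that helper (the class and method names here are my own, not from the thread; inside Tuple.compareTo you would call it on this.a vs o.a, then this.b vs o.b):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeSet;

public class TupleCompareSketch {
    // Lexicographic comparison of two sorted integer sets:
    // compare element by element; on a tie, the shorter set sorts first.
    static int compareSets(SortedSet<Integer> x, SortedSet<Integer> y) {
        Iterator<Integer> xi = x.iterator();
        Iterator<Integer> yi = y.iterator();
        while (xi.hasNext() && yi.hasNext()) {
            int cmp = xi.next().compareTo(yi.next());
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(x.size(), y.size());
    }

    public static void main(String[] args) {
        SortedSet<Integer> s1 = new TreeSet<>(Arrays.asList(1, 2, 3));
        SortedSet<Integer> s2 = new TreeSet<>(Arrays.asList(1, 2, 4));
        System.out.println(compareSets(s1, s2)); // negative, since 3 < 4
    }
}
```

With such a helper, Tuple.compareTo reduces to comparing a first and falling back to b, which gives the total ordering Hadoop needs for sorting keys.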

Answer 1 (score: 1)

To implement a custom data type in Hadoop, you must implement the WritableComparable interface and provide your own implementations of the readFields() and write() methods. In addition to readFields and write, you must also override the Java object's equals and hashCode methods.
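The crucial contract is that readFields() must consume exactly the bytes that write() produced, in the same order. That can be checked without any Hadoop machinery, since DataInput/DataOutput are plain java.io interfaces. A standalone round-trip sketch of the size-prefixed format from answer 0 (the class and method names here are my own):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class WireFormatSketch {
    // Write a set as its size followed by its elements
    // (the same wire format used in answer 0's write()).
    static void writeSet(DataOutput out, Set<Integer> s) throws IOException {
        out.writeInt(s.size());
        for (int v : s) {
            out.writeInt(v);
        }
    }

    // Read the set back: size first, then that many elements.
    static Set<Integer> readSet(DataInput in) throws IOException {
        Set<Integer> s = new TreeSet<>();
        int count = in.readInt();
        while (count-- > 0) {
            s.add(in.readInt());
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        Set<Integer> a = new TreeSet<>(Arrays.asList(3, 1, 2));
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeSet(new DataOutputStream(buf), a);
        Set<Integer> back = readSet(
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(back.equals(a)); // prints true
    }
}
```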

If the custom data type is used as a key, it must implement the Comparable interface as well (WritableComparable already extends it).
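The equals/hashCode point above can be sketched without the Hadoop plumbing: two tuples are equal when both element sets match, and hashCode must be consistent with equals so that equal keys hash identically (e.g. in the default hash partitioner). A minimal sketch, with field names following the question (the class name is my own):

```java
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class TupleKeySketch {
    public Set<Integer> a = new TreeSet<Integer>();
    public Set<Integer> b = new TreeSet<Integer>();

    // Equal when both element sets are equal.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof TupleKeySketch)) return false;
        TupleKeySketch other = (TupleKeySketch) obj;
        return a.equals(other.a) && b.equals(other.b);
    }

    // Must agree with equals: equal objects produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(a, b);
    }
}
```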