How to pass an object as a value in Hadoop

Asked: 2012-12-19 21:48:47

Tags: hadoop mapreduce

Is it possible to pass an object (such as a tree) as the output value of a mapper in Hadoop? If so, how?

1 answer:

Answer 0 (score: 3)

Expanding on Tariq's link, here is a brief sketch of one possible implementation for a TreeMap<Text, IntWritable>:

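import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class TreeMapWritable extends TreeMap<Text, IntWritable>
                             implements Writable {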

    @Override
    public void write(DataOutput out) throws IOException {
        // write out the number of entries
        out.writeInt(size());
        // output each entry pair
        for (Map.Entry<Text, IntWritable> entry : entrySet()) {
            entry.getKey().write(out);
            entry.getValue().write(out);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // clear current contents - hadoop re-uses objects
        // between calls to your map / reduce methods
        clear();

        // read how many items to expect
        int count = in.readInt();
        // deserialize a key and value pair, insert into map
        while (count-- > 0) {
            Text key = new Text();
            key.readFields(in);

            IntWritable value = new IntWritable();
            value.readFields(in);

            put(key, value);
        }
    }
}

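Basically, the default serialization factory in Hadoop requires output objects to implement the Writable interface (the readFields and write methods detailed above). In this way you can extend pretty much any class and retrofit the serialization methods onto it.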
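The comment about object reuse in readFields is easy to overlook, so here is a minimal sketch of the same length-prefixed pattern that runs without Hadoop on the classpath. Everything named here is an assumption made for illustration: SimpleWritable is a local stand-in that only mirrors the two method signatures of Hadoop's Writable, CountMap is a hypothetical class that uses String/Integer in place of Text/IntWritable, and roundTrip is a helper invented for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Local stand-in (assumption): mirrors the two methods of Hadoop's Writable.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical map type: String/Integer stand in for Text/IntWritable.
class CountMap extends TreeMap<String, Integer> implements SimpleWritable {
    private static final long serialVersionUID = 1L;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(size());                 // entry count comes first
        for (Map.Entry<String, Integer> e : entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue());
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        clear();                              // drop entries left over from a previous record
        int count = in.readInt();
        while (count-- > 0) {
            String key = in.readUTF();        // read in the same order as written
            put(key, in.readInt());
        }
    }
}

class WritableReuseDemo {
    // Serializes src to bytes, then deserializes into the existing target object.
    static void roundTrip(CountMap src, CountMap target) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes));
            target.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CountMap first = new CountMap();
        first.put("apple", 3);
        first.put("pear", 1);
        CountMap second = new CountMap();
        second.put("plum", 7);

        CountMap reused = new CountMap();     // one instance filled twice, as the framework may do
        roundTrip(first, reused);
        roundTrip(second, reused);
        System.out.println(reused);           // prints {plum=7}
    }
}
```

Without the clear() call, the second readFields would leave the entries from the first record in place, and the reused map would silently mix data from two records.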

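Another option is to enable Java Serialization (which uses the default Java serialization mechanism) by adding org.apache.hadoop.io.serializer.JavaSerialization to the io.serializations configuration property, but I wouldn't recommend that.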
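The answer does not say why, but one likely reason is size: Java Serialization writes class metadata into the stream, while a hand-written encoding writes only the data. The rough sketch below compares the two using only java.io, with no Hadoop involved. The class and method names are hypothetical, and the exact Java Serialization byte count depends on the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

class SerializationSizeDemo {
    // Bytes produced when the map is written with standard Java Serialization.
    static int javaSerializedSize(TreeMap<String, Integer> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(map);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bytes produced by a hand-written encoding: entry count, then key/value pairs.
    static int handWrittenSize(TreeMap<String, Integer> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(map.size());
            for (Map.Entry<String, Integer> e : map.entrySet()) {
                out.writeUTF(e.getKey());     // 2-byte length prefix plus the encoded string
                out.writeInt(e.getValue());   // 4 bytes
            }
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> map = new TreeMap<>();
        map.put("apple", 3);
        map.put("pear", 1);
        System.out.println("Java Serialization: " + javaSerializedSize(map) + " bytes");
        System.out.println("Hand-written:       " + handWrittenSize(map) + " bytes");
    }
}
```

For this two-entry map the hand-written form takes 25 bytes (4 for the count, 11 for "apple", 10 for "pear"), and the Java Serialization form is larger because of the class descriptors it has to carry.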