Is it possible to pass an object (such as a tree) as the output value of a mapper in Hadoop? If so, how?
Answer 0 (score: 3)
Expanding on Tariq's link, here is one possible implementation of a <Text, IntWritable> tree map:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class TreeMapWritable extends TreeMap<Text, IntWritable>
        implements Writable {

    @Override
    public void write(DataOutput out) throws IOException {
        // write out the number of entries
        out.writeInt(size());
        // output each entry pair
        for (Map.Entry<Text, IntWritable> entry : entrySet()) {
            entry.getKey().write(out);
            entry.getValue().write(out);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // clear current contents - hadoop re-uses objects
        // between calls to your map / reduce methods
        clear();
        // read how many items to expect
        int count = in.readInt();
        // deserialize a key and value pair, insert into map
        while (count-- > 0) {
            Text key = new Text();
            key.readFields(in);
            IntWritable value = new IntWritable();
            value.readFields(in);
            put(key, value);
        }
    }
}
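Basically, Hadoop's default serialization factory requires output objects to implement the Writable interface (the readFields and write methods detailed above). This way you can extend pretty much any class to retrofit the serialization methods.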
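The write / readFields contract can be tried out without a cluster. The sketch below is only an illustration under stated assumptions: it declares a local stand-in interface named Writable with the same two methods as Hadoop's, and it uses String and Integer (via writeUTF / writeInt) in place of Text and IntWritable, so that it runs on the JDK alone. The names WritableRoundTripSketch, StringIntTreeMap, serialize and deserialize are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

public class WritableRoundTripSketch {

    // Assumption: local stand-in mirroring the two methods of Hadoop's Writable.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Simplified analogue of TreeMapWritable built on plain JDK types.
    static class StringIntTreeMap extends TreeMap<String, Integer> implements Writable {
        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(size());                     // entry count first
            for (Map.Entry<String, Integer> e : entrySet()) {
                out.writeUTF(e.getKey());             // then each key
                out.writeInt(e.getValue());           // and its value
            }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            clear();                                  // the instance may be re-used
            int count = in.readInt();
            while (count-- > 0) {
                String key = in.readUTF();
                int value = in.readInt();
                put(key, value);
            }
        }
    }

    // Serialize any Writable into a byte array.
    static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    // Overwrite the state of an existing Writable from a byte array.
    static void deserialize(Writable w, byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        StringIntTreeMap original = new StringIntTreeMap();
        original.put("banana", 2);
        original.put("apple", 5);

        byte[] wire = serialize(original);

        // Simulate object re-use: the target already holds a stale entry.
        StringIntTreeMap reused = new StringIntTreeMap();
        reused.put("stale", 99);
        deserialize(reused, wire);

        System.out.println(reused); // prints {apple=5, banana=2}
    }
}
```

The stale entry disappears after deserialize, which is why readFields starts with clear(): without it, entries left over from a previous record would leak into the next one.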
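Another option is to enable Java serialization (which uses the default Java serialization mechanism) by adding org.apache.hadoop.io.serializer.JavaSerialization to the io.serializations configuration property, but I would not recommend it.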
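For reference, "default Java serialization" here means java.io.Serializable together with ObjectOutputStream / ObjectInputStream. The sketch below is independent of Hadoop and only shows what that mechanism does to a TreeMap; the class and method names (JavaSerializationSketch, toBytes, fromBytes) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.TreeMap;

public class JavaSerializationSketch {

    // Serialize any Serializable object with the built-in JDK mechanism.
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Rebuild a fresh object from the serialized bytes.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TreeMap<String, Integer> tree = new TreeMap<>();
        tree.put("banana", 2);
        tree.put("apple", 5);

        byte[] wire = toBytes(tree);
        Object copy = fromBytes(wire);

        System.out.println(copy);                    // prints {apple=5, banana=2}
        System.out.println(wire.length + " bytes");  // size of the serialized form
    }
}
```

No write / readFields code is needed, but the stream also carries class descriptors for TreeMap, String and Integer, so it is typically larger than a hand-written Writable encoding of the same entries, and each read allocates a new object instead of refilling an existing one.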