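How do I define an ArrayWritable for a custom Hadoop type?
I am trying to implement an inverted index in Hadoop, using custom Hadoop types to store the data.
I have an IndividualPosting class that stores the term frequency, the document ID, and the list of byte offsets of the term within the document.
I have a Posting class that holds the document frequency (the number of documents in which the term appears) and a list of IndividualPosting objects.
For the byte-offset list in IndividualPosting I have defined a LongArrayWritable class that extends ArrayWritable.
When I defined a custom ArrayWritable for IndividualPosting, I ran into problems after deploying locally (using Karmasphere and Eclipse).
All the IndividualPosting instances in the Posting class's list are identical, even though I receive different values in the reduce method.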
Answer 0 (score: 9)
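From the documentation for ArrayWritable:
A Writable for arrays containing instances of a class. The elements of this writable must all be instances of the same class. If this writable will be the input for a Reducer, you will need to create a subclass that sets the value to be of the proper type. For example: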
public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}
You have already mentioned doing this for a WritableComparable type that Hadoop defines. Here is what I assume your implementation looks like for LongWritable:
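import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.LongWritable;

// Declared static because it is nested in another class; drop "static" for a top-level class.
public static class LongArrayWritable extends ArrayWritable {
    // No-argument constructor: Hadoop needs it to create instances when deserializing.
    public LongArrayWritable() {
        super(LongWritable.class);
    }

    // Convenience constructor that wraps an existing array of values.
    public LongArrayWritable(LongWritable[] values) {
        super(LongWritable.class, values);
    }
}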
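To illustrate why the subclass has to hand the element class to super, here is a minimal plain-Java sketch of the idea. It is not Hadoop's code: SketchWritable, SketchLong, SketchArray, SketchLongArray and SketchDemo are hypothetical stand-ins, written against java.io only so the example runs without a Hadoop jar. The assumption it illustrates is that an array wrapper can only rebuild its elements during readFields if it knows which class to instantiate.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for Hadoop's Writable, so the sketch needs no Hadoop jar.
interface SketchWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical element type playing the role of LongWritable.
class SketchLong implements SketchWritable {
    long value;
    public SketchLong() { }                       // needed for reflective creation
    public SketchLong(long value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeLong(value); }
    public void readFields(DataInput in) throws IOException { value = in.readLong(); }
}

// Hypothetical stand-in for ArrayWritable: a length prefix, then each element.
class SketchArray implements SketchWritable {
    private final Class<? extends SketchWritable> elementClass;
    SketchWritable[] values = new SketchWritable[0];

    SketchArray(Class<? extends SketchWritable> elementClass) {
        this.elementClass = elementClass;
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(values.length);
        for (SketchWritable v : values) {
            v.write(out);
        }
    }

    public void readFields(DataInput in) throws IOException {
        values = new SketchWritable[in.readInt()];
        for (int i = 0; i < values.length; i++) {
            try {
                // One fresh element per slot; impossible without knowing the class.
                values[i] = elementClass.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IOException(e);
            }
            values[i].readFields(in);
        }
    }
}

// Mirrors the subclass pattern: the no-argument constructor fixes the element type.
class SketchLongArray extends SketchArray {
    public SketchLongArray() { super(SketchLong.class); }
}

class SketchDemo {
    // Serializes the offsets, then rebuilds them in a brand-new array object.
    static long[] roundTrip(long... offsets) {
        try {
            SketchLongArray original = new SketchLongArray();
            original.values = new SketchWritable[offsets.length];
            for (int i = 0; i < offsets.length; i++) {
                original.values[i] = new SketchLong(offsets[i]);
            }
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));

            SketchLongArray copy = new SketchLongArray();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            long[] result = new long[copy.values.length];
            for (int i = 0; i < result.length; i++) {
                result[i] = ((SketchLong) copy.values[i]).value;
            }
            return result;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling SketchDemo.roundTrip(3L, 17L, 42L) writes three offsets and reads them back into a separate object. The same shape should carry over to a custom element type: as long as that type has a public no-argument constructor and symmetric write and readFields methods, the array wrapper can rebuild it.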
You should be able to do this for any type that implements WritableComparable, as described in the documentation. Using their example:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class MyWritableComparable implements
        WritableComparable<MyWritableComparable> {
    // Some data
    private int counter;
    private long timestamp;

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Deserialize the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Order instances by counter only.
    public int compareTo(MyWritableComparable other) {
        int thisValue = this.counter;
        int thatValue = other.counter;
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}
That should be it. This assumes you are using revision 0.20.2 or 0.21.0 of the Hadoop API.
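The symptom in the question (every IndividualPosting in the list ending up identical) may have a separate cause that the subclass alone does not fix. One common one, which the question does not confirm, is that Hadoop reuses a single value object while iterating in reduce, so storing the reference instead of a copy leaves the list holding the same instance many times. The plain-Java sketch below only simulates that effect; ReusedPosting and ReuseDemo are hypothetical names, and the loop stands in for the framework's iterator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mutable posting that a framework refills in place.
class ReusedPosting {
    long docId;
    ReusedPosting() { }
    ReusedPosting(ReusedPosting other) { this.docId = other.docId; }  // copy constructor
}

class ReuseDemo {
    // Keeps the same reference on every iteration.
    static List<ReusedPosting> collectByReference(long... docIds) {
        ReusedPosting shared = new ReusedPosting();
        List<ReusedPosting> kept = new ArrayList<>();
        for (long id : docIds) {
            shared.docId = id;      // simulates the framework overwriting one instance
            kept.add(shared);       // every list slot points at that one instance
        }
        return kept;
    }

    // Keeps an independent copy on every iteration.
    static List<ReusedPosting> collectByCopy(long... docIds) {
        ReusedPosting shared = new ReusedPosting();
        List<ReusedPosting> kept = new ArrayList<>();
        for (long id : docIds) {
            shared.docId = id;
            kept.add(new ReusedPosting(shared));  // snapshot of the current state
        }
        return kept;
    }
}
```

With inputs 1, 2, 3, collectByReference returns three entries that all report the last docId, while collectByCopy keeps the three distinct values. If this is what is happening in your reducer, copying each value before adding it to the Posting list (for example with a copy constructor like the one above) would be the thing to try.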