Implementing ArrayWritable for a custom Hadoop type

Time: 2010-12-08 11:10:26

Tags: hadoop mapreduce

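How do I define an ArrayWritable for a custom Hadoop type? I am trying to implement an inverted index in Hadoop, using custom Hadoop types to store the data.

I have an IndividualPosting class that stores the term frequency, the document ID, and a list of the term's byte offsets within the document.

I have a Posting class that holds the document frequency (the number of documents the term appears in) and a list of IndividualPosting objects.

For the list of byte offsets in IndividualPosting, I have defined a LongArrayWritable class that extends ArrayWritable.

When I defined a custom ArrayWritable for IndividualPosting, I ran into problems after deploying locally (using Karmasphere and Eclipse).

All the IndividualPosting instances in the Posting class's list are identical, even though I receive different values in the reduce method.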

1 answer:

Answer 0 (score: 9)

From the documentation for ArrayWritable:

  

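A Writable for arrays containing instances of a class. The elements of this writable must all be instances of the same class. If this writable will be the input for a Reducer, you will need to create a subclass that sets the value to be of the proper type. For example: public class IntArrayWritable extends ArrayWritable { public IntArrayWritable() { super(IntWritable.class); } }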

You have already mentioned doing this for a WritableComparable type defined by Hadoop. Here is what I assume your implementation for LongWritable looks like:

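// Declared static so it can be nested inside the job class and still be
// instantiated reflectively by Hadoop.
public static class LongArrayWritable extends ArrayWritable
{
    // No-argument constructor: used when Hadoop deserializes the value.
    public LongArrayWritable() {
        super(LongWritable.class);
    }

    // Convenience constructor for wrapping an existing array.
    public LongArrayWritable(LongWritable[] values) {
        super(LongWritable.class, values);
    }
}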

You should be able to do this with any type that implements WritableComparable, as described in the documentation. Using their example:

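import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class MyWritableComparable implements
        WritableComparable<MyWritableComparable> {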

    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    public int compareTo(MyWritableComparable other) {
        int thisValue = this.counter;
        int thatValue = other.counter;
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}
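The same write/readFields pattern can be applied to the posting type from the question. What follows is a minimal sketch, not a definitive implementation. The field names and types of IndividualPosting are assumptions based on the question's description, and the local Writable interface is only a stand-in with the same two methods as org.apache.hadoop.io.Writable, declared so the sketch runs without Hadoop on the classpath. PostingRoundTrip is a hypothetical helper for checking the result.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Stand-in for org.apache.hadoop.io.Writable (assumption: same two methods),
// declared locally only so this sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical posting type; field names and types are guesses from the question.
class IndividualPosting implements Writable {
    int termFrequency;
    long documentId;
    long[] byteOffsets = new long[0];

    // No-argument constructor so a framework can create the object
    // before calling readFields.
    IndividualPosting() {}

    IndividualPosting(int termFrequency, long documentId, long[] byteOffsets) {
        this.termFrequency = termFrequency;
        this.documentId = documentId;
        this.byteOffsets = byteOffsets.clone();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(termFrequency);
        out.writeLong(documentId);
        out.writeInt(byteOffsets.length); // length prefix for the array
        for (long offset : byteOffsets) {
            out.writeLong(offset);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        termFrequency = in.readInt();
        documentId = in.readLong();
        // Allocate a fresh array so nothing from an earlier read leaks through.
        byteOffsets = new long[in.readInt()];
        for (int i = 0; i < byteOffsets.length; i++) {
            byteOffsets[i] = in.readLong();
        }
    }
}

// Hypothetical helper: serializes a posting and reads it back into a new object.
class PostingRoundTrip {
    static IndividualPosting roundTrip(IndividualPosting original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            IndividualPosting copy = new IndividualPosting();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IndividualPosting copy =
                roundTrip(new IndividualPosting(2, 42L, new long[] {10L, 250L}));
        System.out.println(copy.documentId + " " + Arrays.toString(copy.byteOffsets));
    }
}
```

In a real job the class would implement Hadoop's own Writable, and an ArrayWritable subclass for it would pass IndividualPosting.class to super(...) in the same way LongArrayWritable passes LongWritable.class.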

That should be all there is to it. This assumes you are using revision 0.20.2 or 0.21.0 of the Hadoop API.
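As for the symptom in the question, where every IndividualPosting in the list ends up identical: a common cause, assumed here rather than confirmed by the question, is that Hadoop reuses the same value object on each step of the reduce iterator, so storing the reference instead of a copy leaves the list holding one object many times. The sketch below simulates that in plain Java. MutablePosting, ReusingIterator, and ReuseDemo are hypothetical names and do not use the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical mutable value, standing in for a Writable such as IndividualPosting.
class MutablePosting {
    long documentId;

    MutablePosting() {}

    // Copy constructor used to take a snapshot of the current state.
    MutablePosting(MutablePosting other) {
        this.documentId = other.documentId;
    }
}

// Simulates a framework iterator that refills one shared instance on every next().
class ReusingIterator implements Iterator<MutablePosting> {
    private final long[] ids;
    private final MutablePosting shared = new MutablePosting();
    private int position = 0;

    ReusingIterator(long... ids) {
        this.ids = ids;
    }

    @Override
    public boolean hasNext() {
        return position < ids.length;
    }

    @Override
    public MutablePosting next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        shared.documentId = ids[position++];
        return shared; // same object every time, with new contents
    }
}

class ReuseDemo {
    // Problematic: keeps the shared reference, so every entry shows the last value.
    static List<Long> collectByReference(Iterator<MutablePosting> values) {
        List<MutablePosting> kept = new ArrayList<>();
        while (values.hasNext()) {
            kept.add(values.next());
        }
        return ids(kept);
    }

    // Safer: copies each value before storing it.
    static List<Long> collectByCopy(Iterator<MutablePosting> values) {
        List<MutablePosting> kept = new ArrayList<>();
        while (values.hasNext()) {
            kept.add(new MutablePosting(values.next()));
        }
        return ids(kept);
    }

    private static List<Long> ids(List<MutablePosting> postings) {
        List<Long> result = new ArrayList<>();
        for (MutablePosting posting : postings) {
            result.add(posting.documentId);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collectByReference(new ReusingIterator(1L, 2L, 3L))); // [3, 3, 3]
        System.out.println(collectByCopy(new ReusingIterator(1L, 2L, 3L)));      // [1, 2, 3]
    }
}
```

In an actual reducer, the copy would typically be made with a copy constructor on the custom type or with WritableUtils.clone before the value is added to the array.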