Hadoop: Map output key does not implement WritableComparable. Implementing a RawComparator

Time: 2014-03-07 22:56:31

Tags: java hadoop

I would be grateful for any help. I am writing a Hadoop program. The map output key is the class org.apache.mahout.clustering.kmeans.Kluster, which does not implement WritableComparable. So I added

job.getConfiguration().setClass("mapred.output.key.comparator.class", KlusterComparable.class, RawComparator.class);

to my code, and defined KlusterComparable.class as follows:

    public static class KlusterComparable implements RawComparator<Kluster>{

    @Override
    public int compare(Kluster k1, Kluster k2) {
        Vector v1 = k1.getCenter();
        Vector v2 = k2.getCenter();
        int res = 0;
        int vsize;
        if(v1.size() < v2.size())
            vsize = v2.size();
        else 
            vsize = v1.size();

        for(int i=0; i<vsize; i++){
            if(v1.get(i) < v2.get(i)){
                res = -1;
                break;
            }else if(v1.get(i) > v2.get(i)){
                res = 1;
                break;
            }
        }
        return res;
    }

    @Override
    public int compare(byte[] k1, int s1, int l1, byte[] k2,
            int s2, int l2) {
        Kluster kl1 = null;
        Kluster kl2 = null;

        byte[] b1 = Arrays.copyOfRange(k1, s1, s1+l1-1);
        byte[] b2 = Arrays.copyOfRange(k1, s2, s2+l2-1);
        try{
        kl1 = (Kluster)(SerializationUtils.deserialize(b1));
        kl2 = (Kluster)(SerializationUtils.deserialize(b2));

        }catch(Exception ex){
            System.out.println("Exception!!!");
        }
        return compare(kl1, kl2);
    }
}

But when I run the jar on Hadoop, I get this error: FAILED java.io.IOException: Spill failed

The exception is being caught inside my comparator: the code prints Exception!!!

1 answer:

Answer 0: (score: 0)

The third argument of Arrays.copyOfRange is exclusive. For example, if the original array a is [0, 1, 2, 3, 4], then Arrays.copyOfRange(a, 1, 3) returns [1, 2].
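The half-open range is easy to check with the standard library alone. The following is a minimal sketch; the class name `CopyOfRangeDemo` and the helper `slice` are illustrative and not part of the original post. It shows that a record starting at offset `s` with length `l` is covered by `copyOfRange(a, s, s + l)`, while using `s + l - 1` as the end index drops the last byte.

```java
import java.util.Arrays;

public class CopyOfRangeDemo {
    // Returns the l bytes starting at offset s; the end index s + l is exclusive.
    static byte[] slice(byte[] a, int s, int l) {
        return Arrays.copyOfRange(a, s, s + l);
    }

    public static void main(String[] args) {
        byte[] a = {0, 1, 2, 3, 4};
        // End index 3 is excluded, so only indices 1 and 2 are copied.
        System.out.println(Arrays.toString(Arrays.copyOfRange(a, 1, 3)));         // [1, 2]
        // Offset 1, length 3: indices 1, 2 and 3.
        System.out.println(Arrays.toString(slice(a, 1, 3)));                      // [1, 2, 3]
        // The "s + l - 1" variant from the question loses the last byte.
        System.out.println(Arrays.toString(Arrays.copyOfRange(a, 1, 1 + 3 - 1))); // [1, 2]
    }
}
```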

Your code should be:

byte[] b1 = Arrays.copyOfRange(k1, s1, s1+l1);
byte[] b2 = Arrays.copyOfRange(k2, s2, s2+l2); // read from k2 here, not k1

In fact, you can learn how Hadoop performs comparisons by looking at WritableComparator. Here is an example that borrows some ideas from it.

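import java.io.IOException;

import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.RawComparator;
import org.apache.mahout.clustering.kmeans.Kluster;

public class KlusterComparator implements RawComparator<Kluster> {

    // Reused on every call so that no objects are allocated per comparison.
    private final Kluster key1;
    private final Kluster key2;
    private final DataInputBuffer buffer;

    public KlusterComparator() {
        key1 = new Kluster();
        key2 = new Kluster();
        buffer = new DataInputBuffer();
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            // Point the buffer at the first key's bytes and parse key1.
            buffer.reset(b1, s1, l1);
            key1.readFields(buffer);

            // Point the buffer at the second key's bytes and parse key2.
            buffer.reset(b2, s2, l2);
            key2.readFields(buffer);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

        return compare(key1, key2); // compare the deserialized keys
    }

    @Override
    public int compare(Kluster o1, Kluster o2) {
        // Placeholder: compare o1 and o2 here (for example, by their centers).
        return 0;
    }

}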
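To make the pattern above concrete without Hadoop or Mahout on the classpath, here is a rough, standard-library-only sketch. Everything in it is an assumption for illustration: the class `RawCompareSketch`, the use of `double[]` as a stand-in for a Kluster center, and the length-prefixed wire format (Kluster's real format is whatever its own `write` method produces). It deserializes two records from offset/length slices of a shared buffer and then compares them lexicographically. The comparison loop stops at the shorter length, since the loop in the question runs to the larger size and looks like it could index past the end of the shorter vector.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RawCompareSketch {
    // Assumed wire format: an int count followed by that many doubles.
    static byte[] serialize(double[] center) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(center.length);
            for (double d : center) {
                out.writeDouble(d);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one record from the slice [s, s + l) of a possibly shared buffer.
    static double[] deserialize(byte[] b, int s, int l) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(b, s, l));
            double[] center = new double[in.readInt()];
            for (int i = 0; i < center.length; i++) {
                center[i] = in.readDouble();
            }
            return center;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lexicographic order over the common prefix; on a tie the shorter array sorts first.
    static int compare(double[] v1, double[] v2) {
        int n = Math.min(v1.length, v2.length);
        for (int i = 0; i < n; i++) {
            int c = Double.compare(v1[i], v2[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(v1.length, v2.length);
    }

    // Raw comparison: parse both slices, then delegate to the object comparison.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return compare(deserialize(b1, s1, l1), deserialize(b2, s2, l2));
    }

    public static void main(String[] args) {
        byte[] r1 = serialize(new double[] {1.0, 2.0});
        byte[] r2 = serialize(new double[] {1.0, 3.0});
        // Place both records in one buffer, as a sort buffer would.
        byte[] buf = new byte[r1.length + r2.length];
        System.arraycopy(r1, 0, buf, 0, r1.length);
        System.arraycopy(r2, 0, buf, r1.length, r2.length);
        // Prints a negative number because {1.0, 2.0} sorts before {1.0, 3.0}.
        System.out.println(compare(buf, 0, r1.length, buf, r1.length, r2.length));
    }
}
```

Parsing full objects on every comparison is simple but not the fastest option; it is shown here only because it mirrors the structure of the comparator above.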