Question

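When it comes to sort keys, compare() and compareTo() work in tandem. But in this era of high-spec machines, do we still need to think about when to use compare() and when to use compareTo()?

If there are cases where compare(byte b1[],int s1,int l1, byte b2[],int s2,int l2) is preferable to compareTo(object key1,Object key2), please advise which fields, use cases, or types of problems really require us to choose between them.

Thanks!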

Answer 1

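Using RawComparator:

If you still want to reduce the time your MapReduce job takes, you need to use a RawComparator.

Intermediate key-value pairs are passed from the Mapper to the Reducer. Before they reach the Reducer, the shuffle and sort steps are performed.

Sorting is faster because a RawComparator compares the keys byte by byte, directly on their serialized form. Without a RawComparator, the intermediate keys have to be fully deserialized before they can be compared.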
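As a rough illustration of that difference, here is a minimal sketch that uses only java.io and no Hadoop classes. It assumes keys are single ints written the way DataOutput.writeInt writes them (4 bytes, big-endian). The class name RawVsDeserializedSketch and its methods are made up for this illustration; only the (byte[], start, length) parameter shape is borrowed from RawComparator.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-alone sketch; not part of Hadoop.
final class RawVsDeserializedSketch {

    // Write the keys back to back into one buffer (4 bytes each, big-endian).
    static byte[] serialize(int... keys) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int k : keys) {
                out.writeInt(k);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Slow path: wrap each slice in a stream and rebuild the int before comparing.
    static int compareDeserialized(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            int k1 = new DataInputStream(new ByteArrayInputStream(b1, s1, l1)).readInt();
            int k2 = new DataInputStream(new ByteArrayInputStream(b2, s2, l2)).readInt();
            return Integer.compare(k1, k2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decode a big-endian int in place, with no stream or object creation.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24)
             | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8)
             |  (b[off + 3] & 0xff);
    }

    // Fast path: compare straight from the serialized bytes.
    static int compareRaw(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        byte[] buf = serialize(7, -3, 7);
        // Both paths should agree on the ordering of the keys at offsets 0 and 4.
        System.out.println(compareRaw(buf, 0, 4, buf, 4, 4));
        System.out.println(compareDeserialized(buf, 0, 4, buf, 4, 4));
    }
}
```

Both methods give the same ordering; the raw one simply skips building streams and values for every comparison, and a sort performs a great many comparisons.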

Example:

    import org.apache.hadoop.io.WritableComparator;

    public class IndexPairComparator extends WritableComparator {

        protected IndexPairComparator() {
            super(IndexPair.class);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            // Compare the first int of each serialized key.
            int i1 = readInt(b1, s1);
            int i2 = readInt(b2, s2);
            int comp = (i1 < i2) ? -1 : (i1 == i2) ? 0 : 1;
            if (0 != comp)
                return comp;

            // Tie: compare the second int, stored 4 bytes further in.
            int j1 = readInt(b1, s1 + 4);
            int j2 = readInt(b2, s2 + 4);
            comp = (j1 < j2) ? -1 : (j1 == j2) ? 0 : 1;
            return comp;
        }
    }

In the example above we did not implement RawComparator directly. Instead, we extended WritableComparator, which itself implements RawComparator.
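The comparator reads its second int at s1+4, which only works if each serialized key is two 4-byte ints written back to back. The post does not show IndexPair, so the class below is a hypothetical stand-in: IndexPairSketch, toBytes and compareRaw are names invented here, and the write/readFields methods only imitate the shape of Hadoop's Writable using the plain java.io DataOutput and DataInput interfaces. It is a sketch of the layout the comparator appears to assume, not the author's actual key class.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical key class; the original post does not show IndexPair.
final class IndexPairSketch implements Comparable<IndexPairSketch> {
    int i;
    int j;

    IndexPairSketch(int i, int j) {
        this.i = i;
        this.j = j;
    }

    // Same shape as Writable.write: i lands in bytes 0..3, j in bytes 4..7.
    void write(DataOutput out) throws IOException {
        out.writeInt(i);
        out.writeInt(j);
    }

    // Same shape as Writable.readFields: read in the order written.
    void readFields(DataInput in) throws IOException {
        i = in.readInt();
        j = in.readInt();
    }

    // Object-level ordering that a raw comparator has to reproduce.
    @Override
    public int compareTo(IndexPairSketch other) {
        int c = Integer.compare(i, other.i);
        return c != 0 ? c : Integer.compare(j, other.j);
    }

    // Helper for this sketch only: serialize one key to a fresh array.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Raw ordering: first int at the start offset, second int 4 bytes later.
    // ByteBuffer reads big-endian by default, matching DataOutput.writeInt.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        int c = Integer.compare(ByteBuffer.wrap(b1).getInt(s1), ByteBuffer.wrap(b2).getInt(s2));
        return c != 0 ? c : Integer.compare(ByteBuffer.wrap(b1).getInt(s1 + 4), ByteBuffer.wrap(b2).getInt(s2 + 4));
    }
}
```

Whatever the real key class looks like, the raw comparison and compareTo must agree on ordering; if the write order or field widths change, the offsets in the comparator have to change with them.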

By Jee Vang
See this article
The RawComparator implementation inside WritableComparator itself simply deserializes both keys and then compares them:

    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            buffer.reset(b1, s1, l1); // parse key1
            key1.readFields(buffer);

            buffer.reset(b2, s2, l2); // parse key2
            key2.readFields(buffer);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return compare(key1, key2); // compare them
    }

See the source
