Why is modulo the same as & for power-of-2 values?

Asked: 2013-04-15 23:10:22

Tags: java algorithm hashmap bit-manipulation

This trick is commonly used, for example in java.util.HashMap:

/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    return h & (length-1);
}

where length is a power of 2.

Or in the Lucene Bloom filter code (org.apache.lucene.codecs.bloom.FuzzySet):

// Bloom sizes are always base 2 and so can be ANDed for a fast modulo
int pos = positiveHash & bloomSize;

This makes no sense to me, because for 8, for example, the difference between i % 8 and i & 8 is not zero!

    scala> (0 to 20).map(i => (i & 8) - (i % 8))
res33: scala.collection.immutable.IndexedSeq[Int] = Vector(0, -1, -2, -3, -4, -5, -6, -7, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4)

1 answer:

Answer 0 (score: 1)

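Neither HashMap nor FuzzySet ANDs with an exact power of 2; both AND with an integer of the form 2^n - 1 (in HashMap, that is length-1). The comment you quoted from FuzzySet is unfortunately misleading in this respect, but if you dig a little you can find this code block: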

//The sizes of BitSet used are all numbers that, when expressed in binary form,
//are all ones. This is to enable fast downsizing from one bitset to another
//by simply ANDing each set index in one bitset with the size of the target bitset
// - this provides a fast modulo of the number. Values previously accumulated in
// a large bitset and then mapped to a smaller set can be looked up using a single
// AND operation of the query term's hash rather than needing to perform a 2-step
// translation of the query term that mirrors the stored content's reprojections.
static final int usableBitSetSizes[];
static
{
  usableBitSetSizes=new int[30];
  int mask=1;
  int size=mask;
  for (int i = 0; i < usableBitSetSizes.length; i++) {
      size=(size<<1)|mask;
      usableBitSetSizes[i]=size;
  }    
}

The bitsize variable in FuzzySet is always ultimately assigned from this array. The comment there also describes exactly what is going on.
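As a rough illustration of that comment, here is a minimal sketch that rebuilds the table the same way and checks the two properties it describes. The class name AllOnesSizes and the helpers buildSizes and isAllOnes are made up for this example and are not part of Lucene; the hash value and the two array indices are arbitrary.

```java
public class AllOnesSizes {
    // Hypothetical helper: rebuilds the table with the same loop as the quoted initializer.
    static int[] buildSizes() {
        int[] sizes = new int[30];
        int mask = 1;
        int size = mask;
        for (int i = 0; i < sizes.length; i++) {
            size = (size << 1) | mask; // shift left and set the lowest bit: 11, 111, 1111, ...
            sizes[i] = size;
        }
        return sizes;
    }

    // Hypothetical helper: true when v has the form 2^n - 1 (all ones in binary).
    static boolean isAllOnes(int v) {
        return v > 0 && (v & (v + 1)) == 0;
    }

    public static void main(String[] args) {
        int[] sizes = buildSizes();
        System.out.println(Integer.toBinaryString(sizes[0])); // 11
        System.out.println(Integer.toBinaryString(sizes[2])); // 1111

        // "Fast downsizing": masking with a large size and then a small one
        // gives the same index as masking with the small one directly.
        int hash = 123456789;   // arbitrary non-negative value
        int large = sizes[10];
        int small = sizes[3];
        System.out.println(((hash & large) & small) == (hash & small)); // true
    }
}
```

Every entry is one less than a power of 2, which is why ANDing with it behaves like a modulo by that power of 2.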

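To compute X % 8 (binary 1000), you compute X & 7 (binary 0111), not X & 8. This works for every power of 2: for non-negative X, X % 2^n equals X & (2^n - 1).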
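A quick way to convince yourself is to run the comparison with the corrected mask. The sketch below is only an illustration; the class MaskModulo and the method maskMod are names invented here, and the method assumes its second argument is a positive power of 2.

```java
public class MaskModulo {
    // Hypothetical helper: assumes powerOfTwo is a positive power of 2.
    static int maskMod(int x, int powerOfTwo) {
        return x & (powerOfTwo - 1);
    }

    public static void main(String[] args) {
        // Same range as the Scala experiment in the question, but masking with 7 instead of 8.
        for (int x = 0; x <= 20; x++) {
            if (maskMod(x, 8) != x % 8) {
                throw new AssertionError("mismatch at " + x);
            }
        }
        System.out.println(maskMod(13, 8)); // 5, same as 13 % 8

        // For negative values the two differ, because Java's % keeps the sign of the dividend.
        System.out.println(-3 % 8);         // -3
        System.out.println(maskMod(-3, 8)); // 5, same as Math.floorMod(-3, 8)
    }
}
```

The negative case is presumably why the FuzzySet snippet masks a variable named positiveHash: the mask always yields a non-negative index, while % would not.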