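MurmurHash - inconsistent hash values

Time: 2013-11-06 04:41:15

Tags: c++ hash

I implemented MurmurHash in C++ and it hashes extremely well, except that if you call it twice in a row with the same string... it does not hash to the same number.
Why doesn't it consistently hash to the same number?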

int 
Hash::hf (string ins) {
  return MurmurHash2(&ins,ins.size(),11); 
}
unsigned int Hash::MurmurHash2 (const void *key, int len, unsigned int seed )
{
// 'm' and 'r' are mixing constants generated offline.
// They're not really 'magic', they just happen to work well.
const unsigned int m = 0x5bd1e995;
const int r = 24;

// Initialize the hash to a 'random' value

unsigned int h = seed ^ len;

// Mix 4 bytes at a time into the hash

const unsigned char * data = (const unsigned char *)key;

while(len >= 4)
{
    unsigned int k = *(unsigned int *)data;

    k *= m; 
    k ^= k >> r; 
    k *= m; 

    h *= m; 
    h ^= k;

    data += 4;
    len -= 4;
}

// Handle the last few bytes of the input array

switch(len)
{
case 3: h ^= data[2] << 16;
case 2: h ^= data[1] << 8;
case 1: h ^= data[0];
        h *= m;
};

// Do a few final mixes of the hash to ensure the last few
// bytes are well-incorporated.

h ^= h >> 13;
h *= m;
h ^= h >> 15;

return h % HASH_TABLE_SIZE;
 }

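1 answer:

Answer 0 (score: 0)

This code is incorrect, because &ins is the address of the std::string object itself rather than of its characters: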

return MurmurHash2(&ins,ins.size(),11); 

The correct code:

return MurmurHash2(ins.c_str(),ins.size(),11);
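To see why the pointer matters, here is a minimal sketch, not the asker's actual class. The free functions `murmur2_bytes` and `hash_string` are hypothetical names introduced for illustration, and the modulo by `HASH_TABLE_SIZE` is left out. The sketch hashes the string's character buffer, takes the string by const reference instead of by value, and uses `std::memcpy` for the 4-byte reads so that no unaligned access is needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// MurmurHash2-style mixing over an explicit byte buffer.
// (Hypothetical free function; same constants as in the question.)
std::uint32_t murmur2_bytes(const void* key, std::size_t len, std::uint32_t seed) {
    const std::uint32_t m = 0x5bd1e995;
    const int r = 24;

    std::uint32_t h = seed ^ static_cast<std::uint32_t>(len);
    const unsigned char* data = static_cast<const unsigned char*>(key);

    // Mix 4 bytes at a time into the hash.
    while (len >= 4) {
        std::uint32_t k;
        std::memcpy(&k, data, sizeof k);  // safe even if data is unaligned

        k *= m;
        k ^= k >> r;
        k *= m;

        h *= m;
        h ^= k;

        data += 4;
        len -= 4;
    }

    // Handle the last few bytes of the input.
    switch (len) {
        case 3: h ^= static_cast<std::uint32_t>(data[2]) << 16;  // fall through
        case 2: h ^= static_cast<std::uint32_t>(data[1]) << 8;   // fall through
        case 1: h ^= data[0];
                h *= m;
                break;
        default: break;
    }

    // Final mixing.
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return h;
}

// Hash the characters of the string, not the std::string object.
std::uint32_t hash_string(const std::string& s) {
    return murmur2_bytes(s.data(), s.size(), 11u);
}
```

With `&ins`, the function reads the bytes of the `std::string` object, whose layout is implementation-defined and typically holds a pointer and sizes. Because `ins` is passed by value, each call sees a fresh copy whose internal bytes may differ, which would explain the changing results. Two separate copies of the same text passed to `hash_string` produce the same value.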

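That said, I would argue that this Murmur variant looks like a relatively weak function and may produce collisions with a higher probability than a random function would. Consider: if, at any iteration, the lowest block of bits of "h" becomes zero, those bits are not changed by

h *= m;

Moreover, because the next operation is an XOR, a value that appears twice gets XORed back out of the hash. So even a repeated 4-byte substring such as "abcdabcd" will not change the low bits of the hash.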
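The multiplication property the answer relies on can be checked in isolation. The sketch below uses two hypothetical helpers, `mul_step` and `low_bits_zero`, and only illustrates that multiplying by `m` keeps zero low bits at zero. It does not by itself show how much this affects collision rates, since the following `h ^= k` still mixes the low bits of `k` into `h`.

```cpp
#include <cstdint>

// One "h *= m" step from the mixing loop (hypothetical helper).
std::uint32_t mul_step(std::uint32_t h) {
    const std::uint32_t m = 0x5bd1e995;
    return h * m;  // unsigned arithmetic wraps modulo 2^32
}

// True if the lowest `bits` bits of h are all zero (hypothetical helper).
bool low_bits_zero(std::uint32_t h, int bits) {
    const std::uint32_t mask =
        (bits >= 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    return (h & mask) == 0;
}
```

If `h` has its lowest 16 bits clear, for example `0xABCD0000`, then `mul_step(h)` also has its lowest 16 bits clear, because carries in a multiplication only propagate upward.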