Can anyone tell me why the number 5381 is used in the DJB hash function?
The DJB hash function is defined as:
h(0) = 5381
h(i) = 33 * h(i-1) ^ str[i]
A C program:
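/* DJB "times 33" hash of the first len bytes of str. */
unsigned int DJBHash(const char* str, unsigned int len)
{
    unsigned int hash = 5381;   /* initial value h(0) */
    unsigned int i = 0;

    for (i = 0; i < len; str++, i++)
    {
        /* (hash << 5) + hash is hash * 33 */
        hash = ((hash << 5) + hash) + (*str);
    }

    return hash;
}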
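Note that the recurrence is written with XOR (`^`) while the C program adds the byte. Both forms are in circulation. The following is a minimal sketch of the two side by side; the names `djb2_add` and `djb2_xor` are made up for illustration, and the cast to `unsigned char` is an assumption that differs from the program above for bytes of 128 and up on platforms where `char` is signed.

```c
#include <stddef.h>
#include <stdint.h>

/* Additive form: h(i) = 33 * h(i-1) + str[i], starting from 5381. */
uint32_t djb2_add(const char *str, size_t len)
{
    uint32_t hash = 5381;
    for (size_t i = 0; i < len; i++) {
        /* (hash << 5) + hash is hash * 33, modulo 2^32 */
        hash = ((hash << 5) + hash) + (unsigned char)str[i];
    }
    return hash;
}

/* XOR form: h(i) = (33 * h(i-1)) ^ str[i], same starting value. */
uint32_t djb2_xor(const char *str, size_t len)
{
    uint32_t hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = ((hash << 5) + hash) ^ (unsigned char)str[i];
    }
    return hash;
}
```

For an empty input both functions return the initial value 5381 unchanged, which is the only place where the constant is directly visible in the output.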
Answer 0 (score: 55)
I stumbled across a comment that sheds some light on what DJB was aiming for:
/*
* DJBX33A (Daniel J. Bernstein, Times 33 with Addition)
*
* This is Daniel J. Bernstein's popular `times 33' hash function as
* posted by him years ago on comp.lang.c. It basically uses a function
* like ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best
* known hash functions for strings. Because it is both computed very
* fast and distributes very well.
*
* The magic of number 33, i.e. why it works better than many other
* constants, prime or not, has never been adequately explained by
* anyone. So I try an explanation: if one experimentally tests all
* multipliers between 1 and 256 (as RSE did now) one detects that even
* numbers are not useable at all. The remaining 128 odd numbers
* (except for the number 1) work more or less all equally well. They
* all distribute in an acceptable way and this way fill a hash table
* with an average percent of approx. 86%.
*
* If one compares the Chi^2 values of the variants, the number 33 not
* even has the best value. But the number 33 and a few other equally
* good numbers like 17, 31, 63, 127 and 129 have nevertheless a great
* advantage to the remaining numbers in the large set of possible
* multipliers: their multiply operation can be replaced by a faster
* operation based on just one shift plus either a single addition
* or subtraction operation. And because a hash function has to both
* distribute good _and_ has to be very fast to compute, those few
* numbers should be preferred and seems to be the reason why Daniel J.
* Bernstein also preferred it.
*
*
* -- Ralf S. Engelschall <rse@engelschall.com>
*/
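This is a slightly different hash function from the one you are looking at, though it does use the 5381 magic number. The code below the comment at the link target has been unrolled.
Then I found this: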
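The shift-based replacement that the quoted comment describes can be sketched as follows. Each multiplier it lists is one more or one less than a power of two, so the product needs only one shift and one addition or subtraction. The function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* x * (2^k + 1) == (x << k) + x  and  x * (2^k - 1) == (x << k) - x.
 * With unsigned 32-bit arithmetic both sides wrap modulo 2^32, so the
 * identities hold for every input. */
uint32_t times17(uint32_t x)  { return (x << 4) + x; }   /* 16 + 1  */
uint32_t times31(uint32_t x)  { return (x << 5) - x; }   /* 32 - 1  */
uint32_t times33(uint32_t x)  { return (x << 5) + x; }   /* 32 + 1  */
uint32_t times63(uint32_t x)  { return (x << 6) - x; }   /* 64 - 1  */
uint32_t times127(uint32_t x) { return (x << 7) - x; }   /* 128 - 1 */
uint32_t times129(uint32_t x) { return (x << 7) + x; }   /* 128 + 1 */
```

An optimizing compiler will usually make this substitution on its own when it sees `x * 33`, so the hand-written shift mostly reflects the era in which the function was posted.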
Magic Constant 5381:
1. odd number
2. prime number
3. deficient number
4. 001/010/100/000/101 b
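The four listed properties are easy to check mechanically. The sketch below uses brute-force helpers whose names (`is_prime`, `proper_divisor_sum`, `is_deficient`, `to_binary`) are chosen for illustration; it assumes small inputs and makes no attempt at efficiency.

```c
#include <stdbool.h>

/* Trial-division primality test; adequate for small n. */
bool is_prime(unsigned n)
{
    if (n < 2) return false;
    for (unsigned d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

/* Sum of the proper divisors of n (divisors smaller than n). */
unsigned proper_divisor_sum(unsigned n)
{
    unsigned sum = 0;
    for (unsigned d = 1; d < n; d++)
        if (n % d == 0) sum += d;
    return sum;
}

/* A number is deficient when its proper divisors sum to less than itself. */
bool is_deficient(unsigned n)
{
    return proper_divisor_sum(n) < n;
}

/* Writes the low `bits` bits of n into buf as '0'/'1' characters, most
 * significant bit first. buf must have room for bits + 1 bytes. */
void to_binary(unsigned n, int bits, char *buf)
{
    for (int i = 0; i < bits; i++)
        buf[i] = ((n >> (bits - 1 - i)) & 1u) ? '1' : '0';
    buf[bits] = '\0';
}
```

Calling `to_binary(5381, 15, buf)` yields `001010100000101`, the bit pattern from item 4 without the slashes. The properties are not independent: the only proper divisor of a prime is 1, so every prime is deficient, and every prime other than 2 is odd.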
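There is also this answer to "Can anybody explain the logic behind djb2 hash function?". It references a post by DJB himself to a mailing list that mentions 5381 (excerpt taken from that answer):
[...] practically any good multiplier works. I think you're worrying about the fact that 31c + d doesn't cover any reasonable range of hash values if c and d are between 0 and 255. That's why, when I discovered the 33 hash function and started using it in my compressors, I started with a hash value of 5381. I think you'll find that this does just as well as a 261 multiplier.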
Answer 1 (score: 27)
5381 is just a number that, in testing, resulted in fewer collisions and better avalanching. You'll find "magic constants" in just about every hash algorithm.
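Answer 2 (score: 20)
I found a very interesting property of this number that may be the reason for choosing it.
5381 is the 709th prime
709 is the 127th prime
127 is the 31st prime
31 is the 11th prime
11 is the 5th prime
5 is the 3rd prime
3 is the 2nd prime
2 is the 1st prime
5381 is the first number for which this happens 8 times. The 5381st prime may exceed the limit of a signed int, so it is a good point to stop the chain.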
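The chain can be reproduced with a small sieve. This is a sketch only: `nth_prime` is a hypothetical helper, and the caller has to supply a search limit large enough to contain the requested prime (100000 is enough for every step mentioned here).

```c
#include <stdlib.h>

/* Returns the n-th prime, 1-indexed so that nth_prime(1, ...) == 2, using
 * a sieve of Eratosthenes over [0, limit]. Returns 0 if n is 0, if the
 * allocation fails, or if fewer than n primes exist up to limit. */
unsigned nth_prime(unsigned n, unsigned limit)
{
    if (n == 0 || limit < 2) return 0;

    unsigned char *composite = calloc((size_t)limit + 1, 1);
    if (composite == NULL) return 0;

    unsigned count = 0, result = 0;
    for (unsigned p = 2; p <= limit; p++) {
        if (composite[p]) continue;          /* p is not prime */
        if (++count == n) { result = p; break; }
        /* Mark multiples of p, starting at p * p, as composite. */
        for (unsigned long long m = (unsigned long long)p * p; m <= limit; m += p)
            composite[m] = 1;
    }

    free(composite);
    return result;
}
```

Starting from 1 and repeatedly applying `nth_prime(x, 100000)` walks through 2, 3, 5, 11, 31, 127, 709 and 5381. One more step gives 52711, which is larger than 32767, the maximum of a 16-bit signed int, though it still fits easily in a 32-bit int.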