I'm trying to implement the CARP hash in Python, as described in the following IETF draft:
http://tools.ietf.org/html/draft-vinod-carp-v1-03#section-3.1
Specifically:
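3.1. Hash Function
The hash function outputs a 32-bit unsigned integer based on a zero-terminated ASCII input string. The machine name and domain names of the URL, the protocol, and the machine names of each member proxy should be evaluated in lower case, since that portion of the URL is case insensitive.
Because irreversibility and strong cryptographic features are unnecessary for this application, a very simple and fast hash function based on the bitwise left rotate operator is used.
    For (each char in URL):
        URL_Hash += _rotl(URL_Hash, 19) + char;
Member proxy hashes are computed in a similar manner:
    For (each char in MemberProxyName):
        MemberProxy_Hash += _rotl(MemberProxy_Hash, 19) + char;
Because member names are often similar to each other, their hash values are further spread across the hash space via the following additional operations:
    MemberProxy_Hash += MemberProxy_Hash * 0x62531965;
    MemberProxy_Hash = _rotl(MemberProxy_Hash, 21);
3.2. Hash Combination
Hashes are combined by first XOR-ing (exclusive or) the URL hash with the machine name, then multiplying by a constant and performing a bitwise rotation.
All final and intermediate values are 32-bit unsigned integers.
    Combined_Hash = (URL_hash ^ MemberProxy_Hash);
    Combined_Hash += Combined_Hash * 0x62531965;
    Combined_Hash = _rotl(Combined_Hash, 21);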
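I tried using numpy to create the 32-bit unsigned integer. The first problem showed up when implementing the left bit shift: numpy automatically recasts the result to a 64-bit integer. The same happens with any arithmetic that would overflow 32 bits.
For example: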
    from numpy import uint32

    def key_hash(data):
        # hash should be a 32-bit unsigned integer
        hashed = uint32()
        for char in data:
            hashed += hashed << 19 + ord(char)
        return hashed

    x = key_hash("testkey")
    print type(x)
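Returns:
    <type 'numpy.int64'>
Any tips on how to constrain all of this to 32-bit space? Also, I'm a little confused by the spec: how are operations such as "MemberProxy_Hash += MemberProxy_Hash * 0x62531965" supposed to fit in 32 bits while the hash is being computed?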
Edit:
Based on the feedback, it sounds like this would be the right solution:
    def key_hash(data):
        # hash should be a 32-bit unsigned integer
        hashed = 0
        for char in data:
            hashed += ((hashed << 19) + (hashed >> 13) + ord(char)) & 0xFFFFFFFF
        return hashed

    def server_hash(data):
        # hash should be a 32-bit unsigned integer
        hashed = 0
        for char in data:
            hashed += ((hashed << 19) + (hashed >> 13) + ord(char)) & 0xFFFFFFFF
        hashed += (hashed * 0x62531965) & 0xFFFFFFFF
        hashed = ((hashed << 21) + (hashed >> 11)) & 0xFFFFFFFF
        return hashed

    def hash_combination(key_hash, server_hash):
        # hash should be a 32-bit unsigned integer
        combined_hash = (key_hash ^ server_hash) & 0xFFFFFFFF
        combined_hash += (combined_hash * 0x62531965) & 0xFFFFFFFF
        return combined_hash
Edit #2:
Another fixed version.
    def rotate_left(x, n, maxbit=32):
        # assumes 32 bit
        x = x & (2 ** maxbit - 1)
        return ((x << n) | (x >> (maxbit - n)))

    def key_hash(data):
        # hash should be a 32-bit unsigned integer
        hashed = 0
        for char in data:
            hashed = (hashed + rotate_left(hashed, 19) + ord(char))
        return hashed

    def server_hash(data):
        # hash should be a 32-bit unsigned integer
        hashed = 0
        for char in data:
            hashed = (hashed + rotate_left(hashed, 19) + ord(char))
        hashed = hashed + hashed * 0x62531965
        hashed = rotate_left(hashed, 21)
        return hashed

    def hash_combination(key_hash, server_hash):
        # hash should be a 32-bit unsigned integer
        combined_hash = key_hash ^ server_hash
        combined_hash = combined_hash + combined_hash * 0x62531965
        return combined_hash & 0xFFFFFFFF
Answer 0 (score: 2)
Don't bother with numpy uint32. Just use standard Python int. Constrain the result of each operation by doing result &= 0xFFFFFFFF to remove the unwanted high-order bits.
    def key_hash(data):
        # hash should be a 32-bit unsigned integer
        hashed = 0
        for char in data:
            # hashed += ((hashed << 19) + ord(char)) & 0xFFFFFFFF
            # the above is wrong; it's not masking the final addition.
            hashed = (hashed + (hashed << 19) + ord(char)) & 0xFFFFFFFF
        return hashed
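You could do just one final masking, but that would be rather slow on long inputs, because the intermediate hashed would become a rather large number.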
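To illustrate the point about masking, here is a small sketch. The function names `shift_hash_masked_each_step` and `shift_hash_masked_once` are made up for this example, and it deliberately uses the plain shift-and-add from the snippet above, not the rotation the draft calls for. Addition, left shift and multiplication only carry information upward, so masking once at the end yields the same low 32 bits as masking on every step; the unmasked intermediate value simply keeps growing. That equivalence should not be relied on once an unmasked value is shifted right, as a rotation does.

```python
MASK32 = 0xFFFFFFFF


def shift_hash_masked_each_step(data):
    """Shift-and-add hash that is cut back to 32 bits on every iteration."""
    h = 0
    for char in data:
        h = (h + (h << 19) + ord(char)) & MASK32
    return h


def shift_hash_masked_once(data):
    """Same arithmetic, but masked only at the end.

    Also returns how many bits the unmasked intermediate value needed.
    """
    h = 0
    for char in data:
        h = h + (h << 19) + ord(char)
    return h & MASK32, h.bit_length()


text = "testkey" * 10
stepwise = shift_hash_masked_each_step(text)
at_end, width = shift_hash_masked_once(text)
```

Both variants agree on the final 32-bit value, while `width` shows that the unmasked integer grows far beyond 32 bits, which is where the slowdown on long inputs comes from.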
By the way, the hash function above isn't a very good one. The rot in rotl means rotate, not shift.
You need
    # hashed += ((hashed << 19) + (hashed >> 13) + ord(char)) & 0xFFFFFFFF
    # the above is wrong; it's not masking the final addition.
    hashed = (hashed + (hashed << 19) + (hashed >> 13) + ord(char)) & 0xFFFFFFFF
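The shift pair in that line is a 32-bit left rotation written out by hand. As a sketch, it can be factored into a helper. The name `rotl32` is an assumption for this example (the draft only refers to `_rotl`), and the helper assumes a rotation count between 0 and 31. It masks both its input and its output so the result always stays within 32 bits:

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, count):
    """Rotate a 32-bit unsigned value left by count bits (0 <= count < 32)."""
    value &= MASK32  # discard anything above bit 31 before rotating
    # Bits shifted out at the top re-enter at the bottom.
    return ((value << count) | (value >> (32 - count))) & MASK32


h = 0xDEADBEEF
by_hand = ((h << 19) + (h >> 13)) & MASK32  # the expression used above
by_helper = rotl32(h, 19)
```

For a value that already fits in 32 bits the two forms agree, because the shifted halves never overlap. Unlike a plain shift, the top bit is not lost: `rotl32(0x80000000, 1)` gives 1, where a masked `<< 1` would give 0.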
Edit: a comparison. This code:
    def rotate_left(x, n, maxbit=32):
        # assumes 32 bit
        x = x & (2 ** maxbit - 1)
        return ((x << n) | (x >> (maxbit - n)))

    def key_hash(data):
        # hash should be a 32-bit unsigned integer
        hashed = 0
        for char in data:
            hashed = (hashed + rotate_left(hashed, 19) + ord(char))
        return hashed

    def khash(data):
        h = 0
        for c in data:
            assert 0 <= h <= 0xFFFFFFFF
            h = (h + (h << 19) + (h >> 13) + ord(c)) & 0xFFFFFFFF
            assert 0 <= h <= 0xFFFFFFFF
        return h

    guff = "twas brillig and the slithy toves did whatever"
    print "yours: %08X" % key_hash(guff)
    print "mine : %08X" % khash(guff)
produces
    yours: A20352DB4214FD
    mine : DB4214FD
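The extra leading digits in the "yours" line come from the running sum never being masked inside the loop. Putting the pieces together, a minimal sketch of all three steps might look like the following. It follows the pseudocode of sections 3.1 and 3.2 as quoted in the question, masks after every step, and includes the final rotation by 21 in the combination step. The names `rotl32`, `url_hash`, `member_hash` and `combine_hashes`, as well as the sample URL and proxy name, are placeholders chosen for this example and do not come from the draft. Lower-casing the host parts is left to the caller, and the output has not been checked against a reference CARP implementation.

```python
MASK32 = 0xFFFFFFFF
MULTIPLIER = 0x62531965  # constant given in the quoted draft


def rotl32(value, count):
    """Rotate a 32-bit unsigned value left by count bits (0 <= count < 32)."""
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32


def url_hash(url):
    """Per-character rotate-and-add loop from section 3.1."""
    h = 0
    for char in url:
        h = (h + rotl32(h, 19) + ord(char)) & MASK32
    return h


def member_hash(name):
    """Same loop as url_hash, followed by the extra spreading step."""
    h = url_hash(name)
    h = (h + h * MULTIPLIER) & MASK32
    return rotl32(h, 21)


def combine_hashes(url_h, member_h):
    """XOR, multiply-add, then rotate, as in section 3.2."""
    combined = (url_h ^ member_h) & MASK32
    combined = (combined + combined * MULTIPLIER) & MASK32
    return rotl32(combined, 21)


# Hypothetical inputs, already lower-cased by the caller.
score = combine_hashes(url_hash("http://example.com/index.html"),
                       member_hash("proxy1.example.com"))
```

Because every intermediate result is masked, each function returns a value between 0 and 0xFFFFFFFF regardless of input length.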
Answer 1 (score: 1)
The following works for me, though it may be a bit unpythonic:
    from numpy import uint32

    def key_hash(data):
        # hash should be a 32-bit unsigned integer
        hashed = uint32()
        for char in data:
            hashed += hashed << uint32(19) + uint32(ord(char))
        return hashed

    x = key_hash("testkey")
    print type(x)
The problem is that numbers are coerced toward more bits rather than fewer.