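I would like to make a custom object hashable (via pickling). I could find the __hash__ algorithm for Python 2.x (see the code below), but it obviously differs from the hash in Python 3.2 (I wonder why?). Does anybody know how __hash__ is implemented in Python 3.2?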
#Version: Python 3.2

def c_mul(a, b):
    # C type multiplication
    # NOTE: [:-1] stripped the trailing 'L' of a Python 2 long; in Python 3
    # hex() has no such suffix, so this drops the last hex digit instead.
    return eval(hex((int(a) * b) & 0xFFFFFFFF)[:-1])

class hs:
    # Python 2.x algorithm for hash from http://effbot.org/zone/python-hash.htm
    def __hash__(self):
        if not self:
            return 0 # empty
        value = ord(self[0]) << 7
        for char in self:
            value = c_mul(1000003, value) ^ ord(char)
        value = value ^ len(self)
        if value == -1:
            value = -2
        return value

def main():
    s = ["PROBLEM", "PROBLEN", "PROBLEO", "PROBLEP"]#, "PROBLEQ", "PROBLER", "PROBLES"]
    print("Python 3.2 hash() built-in")
    for c in s[:]: print("hash('", c, "')=", hex(hash(c)), end="\n")
    print("\n")
    print("Python 2.x type hash: __hash__()")
    for c in s[:]: print("hs.__hash__('", c, "')=", hex(hs.__hash__(c)), end="\n")

if __name__ == "__main__":
    main()

OUTPUT:
Python 3.2 hash() built-in
hash(' PROBLEM ')= 0x7a8e675a
hash(' PROBLEN ')= 0x7a8e6759
hash(' PROBLEO ')= 0x7a8e6758
hash(' PROBLEP ')= 0x7a8e6747
Python 2.x type hash: __hash__()
hs.__hash__(' PROBLEM ')= 0xa638a41
hs.__hash__(' PROBLEN ')= 0xa638a42
hs.__hash__(' PROBLEO ')= 0xa638a43
hs.__hash__(' PROBLEP ')= 0xa638a5c
Edit: the difference is explained. For Python 3.2: "Hash values are now values of a new type, Py_hash_t, etc."
Edit 2: @Pih, thanks. [link] http://svn.python.org/view/python/trunk/Objects/stringobject.c?view=markup
/* Objects/stringobject.c, string_hash() at line 1263 */
static long
string_hash(PyStringObject *a)
{
    register Py_ssize_t len;
    register unsigned char *p;
    register long x;

    if (a->ob_shash != -1)
        return a->ob_shash;
    len = Py_SIZE(a);
    p = (unsigned char *) a->ob_sval;
    x = *p << 7;
    while (--len >= 0)
        x = (1000003*x) ^ *p++;
    x ^= Py_SIZE(a);
    if (x == -1)
        x = -2;
    a->ob_shash = x;
    return x;
}
Answer 0 (score: 5)
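The reason they differ is written there:
"Hash values are now values of a new type, Py_hash_t, which is defined to be the same size as a pointer. Previously they were of type long, which on some 64-bit operating systems is still only 32 bits long."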
The hashing also takes some new values into account in the calculation; take a look at
sys.hash_info
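To see these widths on a particular machine, something like the following sketch may help. It assumes CPython 3.2 or later; the variable names are only illustrative.

```python
import ctypes
import sys

# Width in bits of the values hash() returns on this interpreter.
hash_bits = sys.hash_info.width

# Sizes of the two C types named in the quote, as seen on this platform.
long_bits = ctypes.sizeof(ctypes.c_long) * 8
pointer_bits = ctypes.sizeof(ctypes.c_void_p) * 8

print("hash width:", hash_bits, "bits")
print("C long:", long_bits, "bits; pointer:", pointer_bits, "bits")
print("modulus used for numeric hashes:", sys.hash_info.modulus)
```

On 64-bit Windows, for instance, a C long is 32 bits while a pointer is 64 bits, which is exactly the case the quote describes.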
For strings, you can take a look at http://svn.python.org/view/python/trunk/Objects/stringobject.c?view=markup, line 1263, string_hash(PyStringObject *a)
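As a rough illustration, that C routine could be ported to Python 3 along these lines. This is only a sketch: the name string_hash_py2 and the bits parameter are made up here, it works on bytes (as Python 2 strings did), and it assumes the C arithmetic simply wraps around at the width of long.

```python
def string_hash_py2(data: bytes, bits: int = 32) -> int:
    """Sketch of Python 2's string_hash() for a C long that is `bits` wide."""
    if not data:
        return 0  # the C code also ends up with 0 for an empty string
    mask = (1 << bits) - 1
    x = (data[0] << 7) & mask
    for byte in data:
        # Multiply and XOR, keeping only the low `bits` bits as C would.
        x = ((1000003 * x) ^ byte) & mask
    x = (x ^ len(data)) & mask
    # Reinterpret the unsigned result as a signed C long.
    if x & (1 << (bits - 1)):
        x -= 1 << bits
    if x == -1:
        x = -2  # -1 is reserved by CPython to signal an error
    return x


for word in (b"PROBLEM", b"PROBLEN"):
    print(word, hex(string_hash_py2(word) & 0xFFFFFFFF))
```

Passing bits=64 models a platform where long is 64 bits wide.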
Answer 1 (score: 2)
I looked up the new function in the source code (in unicodeobject.c) and rebuilt it in Python. Here it is:
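def my_hash(string):
    if not string:
        return 0  # the empty string hashes to 0
    x = ord(string[0]) << 7
    for c in string:
        # keep only the low 64 bits, as the C arithmetic does
        x = ((1000003 * x) ^ ord(c)) & 0xFFFFFFFFFFFFFFFF
    x ^= len(string)
    if x & (1 << 63):
        # sign bit set: reinterpret as a signed 64-bit integer
        x -= 1 << 64
    if x == -1:
        x = -2
    return x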
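But this is for 64-bit only. It now includes a correction for Python's odd behaviour when the number becomes negative. (You'd better not think about this too much.)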
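The correction for negative numbers is less mysterious than it sounds: Python integers are unbounded, so they never wrap around into negative values the way a C integer does, and the wrap has to be applied by hand. A small sketch of that step on its own (the helper name to_signed64 is hypothetical):

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed64(x: int) -> int:
    """Reinterpret the low 64 bits of x as a two's-complement signed value."""
    x &= MASK64
    return x - (1 << 64) if x & (1 << 63) else x


for value in (5, 2**63, 2**64 - 1):
    print(value, "->", to_signed64(value))
```

The same result can be obtained with int.from_bytes(x.to_bytes(8, "little"), "little", signed=True) for any x in the unsigned 64-bit range.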