How is __hash__ implemented in Python 3.2?

Time: 2011-05-15 11:13:42

Tags: python algorithm hash

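I want to make a custom object hashable (via pickling). I was able to find the __hash__ algorithm for Python 2.x (see the code below), but it clearly differs from the hash in Python 3.2 (I wonder why?). Does anyone know how __hash__ is implemented in Python 3.2?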

#Version: Python 3.2

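def c_mul(a, b):
    # Emulates a C-style 32-bit multiplication (taken from the Python 2 recipe).
    # Note: [:-1] was meant to strip the trailing "L" of a Python 2 long; on
    # Python 3 hex() has no such suffix, so the last hex digit is dropped instead.
    return eval(hex((int(a) * b) & 0xFFFFFFFF)[:-1])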

class hs:
    #Python 2.x algorithm for hash from http://effbot.org/zone/python-hash.htm
    def __hash__(self):
        if not self:
            return 0 # empty
        value = ord(self[0]) << 7
        for char in self:
            value = c_mul(1000003, value) ^ ord(char)
        value = value ^ len(self)
        if value == -1:
            value = -2
        return value


def main():
    s = ["PROBLEM", "PROBLEN", "PROBLEO", "PROBLEP"]#, "PROBLEQ", "PROBLER", "PROBLES"]
    print("Python 3.2 hash() bild-in")
    for c in s[:]: print("hash('", c, "')=", hex(hash(c)),  end="\n")
    print("\n")
    print("Python 2.x type hash: __hash__()")
    for c in s[:]: print("hs.__hash__('", c, "')=", hex(hs.__hash__(c)),  end="\n")


if __name__ == "__main__":
    main()

OUTPUT:
Python 3.2 hash() bild-in
hash(' PROBLEM ')= 0x7a8e675a
hash(' PROBLEN ')= 0x7a8e6759
hash(' PROBLEO ')= 0x7a8e6758
hash(' PROBLEP ')= 0x7a8e6747


Python 2.x type hash: __hash__()
hs.__hash__(' PROBLEM ')= 0xa638a41
hs.__hash__(' PROBLEN ')= 0xa638a42
hs.__hash__(' PROBLEO ')= 0xa638a43
hs.__hash__(' PROBLEP ')= 0xa638a5c
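
For comparison, here is a minimal sketch of the same Python 2.x algorithm written with plain bit masking, so it does not depend on how hex() formats its result. The names to_signed and py2_style_hash and the bits parameter are illustrative assumptions, not part of the original recipe. The sketch also assumes text whose characters match their byte values (ASCII), since Python 2 hashed bytes rather than code points.

```python
def c_mul(a, b, bits=32):
    """Multiply like a C unsigned integer of the given width (wraps on overflow)."""
    return (a * b) & ((1 << bits) - 1)


def to_signed(value, bits=32):
    """Reinterpret an unsigned value as a two's-complement signed integer."""
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def py2_style_hash(text, bits=32):
    """Sketch of the Python 2.x string hash for a C long that is `bits` wide."""
    if not text:
        return 0  # the empty string hashes to 0
    mask = (1 << bits) - 1
    value = (ord(text[0]) << 7) & mask
    for char in text:
        value = c_mul(1000003, value, bits) ^ ord(char)
    value ^= len(text)
    value = to_signed(value & mask, bits)
    return -2 if value == -1 else value  # -1 is reserved as an error code in C


print(py2_style_hash("a", bits=32))  # -468864544
print(py2_style_hash("a", bits=64))  # 12416037344
```

Passing bits=32 or bits=64 mimics a 32-bit or 64-bit C long, which is one reason the same string can hash differently across builds.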

Edit: the difference is explained. For Python 3.2: "Hash values are now values of a new type, Py_hash_t", and so on.

Edit 2: @Pih, thank you. Link: http://svn.python.org/view/python/trunk/Objects/stringobject.c?view=markup

static long
1263    string_hash(PyStringObject *a)
1264    {
1265        register Py_ssize_t len;
1266        register unsigned char *p;
1267        register long x;
1268    
1269        if (a->ob_shash != -1)
1270            return a->ob_shash;
1271        len = Py_SIZE(a);
1272        p = (unsigned char *) a->ob_sval;
1273        x = *p << 7;
1274        while (--len >= 0)
1275            x = (1000003*x) ^ *p++;
1276        x ^= Py_SIZE(a);
1277        if (x == -1)
1278            x = -2;
1279        a->ob_shash = x;
1280        return x;
1281    }

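2 answers:

Answer 0: (score: 5)

The reason they differ is written right there:

Hash values are now values of a new type, Py_hash_t, which is defined to be the same size as a pointer. Previously they were of type long, which on some 64-bit operating systems is still only 32 bits long.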

The hash computation also depends on some new parameters; take a look at

 sys.hash_info 
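
As a quick illustration, the snippet below prints a few fields of sys.hash_info and compares the hash width with the platform's pointer size. It is only a sketch: the comparison with struct.calcsize("P") is an assumption that holds on common platforms, where Py_hash_t and a pointer have the same size.

```python
import struct
import sys

info = sys.hash_info

# Width in bits of the values returned by hash().
print("hash width in bits:", info.width)

# Prime modulus used when hashing numeric types.
print("modulus for numeric hashes:", info.modulus)

# Pointer size of the running interpreter, for comparison with the width.
pointer_bits = struct.calcsize("P") * 8
print("pointer size in bits:", pointer_bits)

# Every hash value fits in a signed integer of that width.
value = hash("PROBLEM")
print(-(1 << (info.width - 1)) <= value < (1 << (info.width - 1)))  # True
```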

For strings, see http://svn.python.org/view/python/trunk/Objects/stringobject.c?view=markup, line 1263: string_hash(PyStringObject *a).

Answer 1: (score: 2)

I looked up the new function in the source (in unicodeobject.c) and rebuilt it in Python. Here it is:

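def my_hash(string):
    if not string:
        return 0  # the empty string hashes to 0
    x = ord(string[0]) << 7
    for c in string:
        x = (1000003 * x) ^ ord(c)
    x ^= len(string)
    # Python ints never overflow, so emulate the 64-bit C wrap-around by hand.
    x &= 0xFFFFFFFFFFFFFFFF
    if x & (1 << 63):  # sign bit set: reinterpret as a negative signed value
        x -= 1 << 64
    if x == -1:
        x = -2
    return x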

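This is for 64-bit builds only. The correction step handles the case where the result should come out negative, since Python integers do not wrap around the way C integers do. (You had better not think about that part too much.)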
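
If the underlying goal is just to make a custom object hashable, reimplementing the string algorithm is usually unnecessary. A common approach is to delegate to the built-in hash() on a tuple of the fields that define equality. The Point class below is hypothetical and only illustrates that pattern.

```python
class Point:
    """Hypothetical value object used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def _key(self):
        # The fields that define both equality and the hash.
        return (self.x, self.y)

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Objects that compare equal must return the same hash.
        return hash(self._key())


points = {Point(1, 2), Point(1, 2), Point(3, 4)}
print(len(points))  # 2
```

Defining __eq__ and __hash__ from the same key keeps the two consistent, which is what sets and dicts rely on.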