Question

In [30]: import numpy as np

In [31]: d = np.dtype(np.float64)

In [32]: d
Out[32]: dtype('float64')

In [33]: d == np.float64
Out[33]: True

In [34]: hash(np.float64)
Out[34]: -9223372036575774449

In [35]: hash(d)
Out[35]: 880835502155208439

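Why do these dtypes compare equal but hash differently?

Note that Python does promise that:

The only required property is that objects which compare equal have the same hash value...

My workaround for this problem is to call np.dtype on everything, after which the hash values and comparisons are consistent.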
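The workaround can be sketched roughly as follows. This is a minimal illustration, not code from the original question: the helper name `as_dtype_key` and the `lookup` dictionary are hypothetical, and the sketch assumes every key is something np.dtype accepts.

```python
import numpy as np


def as_dtype_key(obj):
    """Normalize anything np.dtype accepts into an np.dtype instance (hypothetical helper)."""
    return np.dtype(obj)


# Store and look up entries only through the normalized key.
lookup = {as_dtype_key(np.float64): "double precision"}

key_from_type = as_dtype_key(np.float64)           # built from the scalar type
key_from_name = as_dtype_key("float64")            # built from a string
key_from_array = as_dtype_key(np.zeros(3).dtype)   # already a dtype

print(lookup[key_from_type], lookup[key_from_name], lookup[key_from_array])
```

Once every key has gone through np.dtype, equality and hashing are both decided by the dtype implementation, so they agree with each other.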

Answer 1

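As tttthomasssss notes, the type (class) of np.float64 and of d are different. They are different kinds of things:

In [435]: type(np.float64)
Out[435]: type

A type of type means (usually) that it is callable like a function, so it can be used as:

In [436]: np.float64(0)
Out[436]: 0.0
In [437]: type(_)
Out[437]: numpy.float64

to create a numeric object. Actually, that looks more like a class definition. But since numpy uses a lot of compiled code, and ndarray uses its own __new__, I wouldn't be surprised if it straddles the line.

In [438]: np.float64.__hash__??
Type: wrapper_descriptor
String Form:<slot wrapper '__hash__' of 'float' objects>
Docstring: x.__hash__() <==> hash(x)

I thought this would be the hash used for hash(np.float64), but it may actually be the hash for an object of that type, e.g. hash(np.float64(0)). In that case hash(np.float64) just uses the default type.__hash__ method.

Moving on to the dtype:

In [439]: d=np.dtype(np.float64)
In [440]: type(d)
Out[440]: numpy.dtype

d is not a function or a class:

In [441]: d(0)
...
TypeError: 'numpy.dtype' object is not callable
In [442]: d.__hash__??
Type: method-wrapper
String Form:<method-wrapper '__hash__' of numpy.dtype object at 0xb60f8a60>
Docstring: x.__hash__() <==> hash(x)

It looks like np.dtype does not define any special __hash__ method; it just inherits from object.

To further illustrate the difference between float64 and d, look at the class inheritance stack:

In [443]: np.float64.__mro__
Out[443]: (numpy.float64, numpy.floating, numpy.inexact, numpy.number, numpy.generic, float, object)
In [444]: d.__mro__
...
AttributeError: 'numpy.dtype' object has no attribute '__mro__'
In [445]: np.dtype.__mro__
Out[445]: (numpy.dtype, object)

So np.float64 doesn't define a hash either; it just inherits from float. d doesn't have an __mro__ because it is an object, not a class.

numpy has enough compiled code, and a long enough history of its own, that you can't count on the Python documentation always applying.

np.dtype and np.float64 evidently have __eq__ methods that allow them to be compared with each other, but the numpy developers have not put any effort into making sure that the __hash__ methods comply. Most likely that is because they never needed to use either one as a dictionary key.

I have never seen code that relies on using both as keys in the same dictionary.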
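To illustrate what happens if the scalar type and the dtype are both used as keys of one dictionary, here is a small sketch (the name `table` is made up for illustration). Because the hashes differ, the two keys are expected to end up as separate entries even though they compare equal.

```python
import numpy as np

d = np.dtype(np.float64)

# The keys compare equal, but a dict compares hashes before it calls __eq__,
# so keys with different hashes are treated as distinct.
table = {np.float64: 12, d: 34}

print(d == np.float64)
print(len(table))
print(table[np.float64], table[d])
```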

Answer 2

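They are not the same thing: np.float64 is a type, while d is an instance of numpy.dtype, so they hash to different values. However, all instances created the same way as d will hash to the same value, because they are identical (which of course does not necessarily mean they point to the same memory location).

Edit

Given your code above, you can try the following: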

In [72]: type(d)
Out[72]: numpy.dtype

In [74]: type(np.float64)
Out[74]: type

This shows that the two have different types and therefore hash to different values. The following example shows different instances of numpy.dtype:

In [77]: import copy

In [78]: dd = copy.deepcopy(d) # Try copying

In [79]: dd
Out[79]: dtype('float64')

In [80]: hash(dd)
Out[80]: -6584369718629170405

In [81]: hash(d) # original d
Out[81]: -6584369718629170405

In [82]: ddd = np.dtype(np.float64) # new instance

In [83]: hash(ddd)
Out[83]: -6584369718629170405

# If using CPython, id returns the address in memory (see: https://docs.python.org/3/library/functions.html#id)
In [84]: id(ddd)
Out[84]: 4376165768

In [85]: id(dd)
Out[85]: 4459249168

In [86]: id(d)
Out[86]: 4376165768

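It is nice to see that ddd (the instance created the same way as d) and d itself share the same object in memory, while dd (the copied object) lives at a different address.

Consistent with the hashes above, the equality checks evaluate as you would expect: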

In [87]: dd == np.float64
Out[87]: True

In [88]: d == np.float64
Out[88]: True

In [89]: ddd == np.float64
Out[89]: True

In [90]: d == dd
Out[90]: True

In [91]: d == ddd
Out[91]: True

In [92]: dd == ddd
Out[92]: True
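The same point, that equal dtype instances hash alike even when they are separate objects, can be checked with dtypes that are built independently. The sketch below uses a made-up structured dtype with fields `x` and `y`; the field names are only for illustration.

```python
import numpy as np

# Two structured dtypes built independently from the same description.
a = np.dtype([("x", np.float64), ("y", np.int32)])
b = np.dtype([("x", np.float64), ("y", np.int32)])

print(a is b)              # identity is not what the comparison relies on
print(a == b)              # equality is decided by the dtype contents
print(hash(a) == hash(b))  # and so is the hash
```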

Answer 3

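They shouldn't behave this way, but __eq__ and __hash__ for numpy.dtype objects are broken at an essentially unfixable design level. I'll be drawing heavily on njsmith's comments on a dtype-related bug report for this answer.

np.float64 isn't actually a dtype. It is a type, in the ordinary sense of the Python type system. Specifically, if you retrieve a scalar from an array of float64 dtype, np.float64 is the type of the resulting scalar.

np.dtype(np.float64) is a dtype, an instance of numpy.dtype. dtypes are how NumPy records the structure of the contents of a NumPy array. They are particularly important for structured arrays, which can have very complex dtypes. While ordinary Python types could have filled much of the role of dtypes, creating new types on the fly for new structured arrays would be highly awkward, and it would probably have been impossible in the days before type-class unification.

numpy.dtype implements __eq__ basically like this: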

def __eq__(self, other):
    if isinstance(other, numpy.dtype):
        return regular_comparison(self, other)
    return self == numpy.dtype(other)

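This is pretty broken. Among other problems, it is not transitive, it raises TypeError when it should return NotImplemented, and its output is sometimes really bizarre because of how dtype coercion works: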

>>> x = numpy.dtype(numpy.float64)
>>> x == None
True
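The lack of transitivity can be seen with a short check. This is only a sketch; the variable names `left`, `right`, and `ends` are made up here, and the string "float64" stands in for any other value that np.dtype can coerce.

```python
import numpy as np

d = np.dtype(np.float64)

left = (np.float64 == d)           # falls back to d.__eq__, which coerces np.float64
right = (d == "float64")           # d.__eq__ coerces the string to a dtype
ends = (np.float64 == "float64")   # neither a class nor a str knows about the other

print(left, right, ends)
```

Both outer comparisons go through the dtype and succeed, while the two ends of the chain, compared directly, are not equal.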

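numpy.dtype.__hash__ isn't any better. It makes no attempt to be consistent with the __hash__ methods of all the other types that numpy.dtype.__eq__ accepts (and with so many incompatible types to deal with, how could it?). Heck, it shouldn't even exist, because dtype objects are mutable! And not just mutable like modules or file objects, where that is fine because __eq__ and __hash__ work by identity. dtype objects are mutable in ways that actually change their hash value: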

>>> x = numpy.dtype([('f1', float)])
>>> hash(x)
-405377605
>>> x.names = ['f2']
>>> hash(x)
1908240630

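When you try to compare d == np.float64, d.__eq__ builds a dtype out of np.float64 and finds that d == np.dtype(np.float64) is True. When you take their hashes, though, np.float64 uses the regular (identity-based) hash for type objects, and d uses the hash for dtype objects. Normally, equal objects of different types should have equal hashes, but the dtype implementation doesn't care about that.

Unfortunately, it is impossible to fix the problems with dtype __eq__ and __hash__ without breaking APIs that people rely on. People count on things like x.dtype == 'float64' or x.dtype == np.float64, and fixing dtypes would break that.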

Answer 4

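This is because you are hashing a type object against a dtype object.

Although the values compare equal (as evidenced by d == np.float64), their types are different: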

print type(d)
print type(np.float64)

which produces

<type 'numpy.dtype'>

<type 'type'>

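According to the Python docs:

hash(object)

Return the hash value of the object (if it has one). Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup. Numeric values that compare equal have the same hash value (even if they are of different types, as is the case for 1 and 1.0).

Since a dtype is not a numeric type, there is no guarantee that such an object will produce the same hash as a type that compares equal to it.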
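The numeric guarantee quoted above is easy to check. The sketch below is only an illustration; the list name `values` is made up, and np.float64(1.0) is included on the assumption that NumPy's float64 scalar reuses the hash of Python's float, from which it inherits.

```python
from decimal import Decimal
from fractions import Fraction

import numpy as np

# Several numeric types that all represent the value one.
values = [1, 1.0, Fraction(1, 1), Decimal(1), np.float64(1.0)]

print(all(v == values[0] for v in values))   # every entry equals the integer 1
print(len({hash(v) for v in values}))        # number of distinct hash values
```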

Edit: from the Python 3.5 docs:

object.__hash__(self)

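Called by the built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. __hash__() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values of the components of the object that also play a part in the comparison of objects.

This seems to imply that hash(d) == hash(np.float64) should return True in your case.

I did notice that right after that there is a note which states:

hash() truncates the value returned from an object's custom __hash__() method to the size of a Py_ssize_t. This is typically 8 bytes on 64-bit builds and 4 bytes on 32-bit builds.

However, I was unable to determine that the sizes of the objects returned from the hash functions actually differ; they appear to be the same (I used sys.getsizeof).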
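The note concerns the width of the hash value itself, not the memory footprint that sys.getsizeof reports for the resulting int object. A rough way to observe the effect is sketched below; the class `HugeHash` is hypothetical and exists only to return an oversized value from __hash__.

```python
import sys


class HugeHash:
    """Hypothetical class whose __hash__ returns a value wider than Py_ssize_t."""

    def __hash__(self):
        return 2 ** 100


width = sys.hash_info.width    # number of bits used for hash values on this build
reduced = hash(HugeHash())     # hash() reduces the oversized value so that it fits

print(width)
print(-(2 ** (width - 1)) <= reduced < 2 ** (width - 1))
```

Both hash(d) and hash(np.float64) already fit in that width, so this note does not explain why they differ.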
