Displaying a dictionary in IPython recomputes the hashes

Time: 2017-01-01 02:39:57

Tags: python dictionary hash ipython

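Strangely, if I display a dictionary in IPython, it seems to recompute the hashes of the keys. This does not happen in the plain Python interpreter, and I wonder what the reason might be.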

An example:

class Fun(object):
    def __init__(self, value):
        self._value = value

    def __hash__(self):
        print('hashing')
        return hash(self._value)

    def __eq__(self, other):
        if isinstance(other, Fun):
            return self._value == other._value
        else:
            return self._value == other

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self._value)

Creating the dictionary obviously requires hashing the keys:

In [2]: dict1 = {Fun(10): 5, Fun(11): 5}
hashing
hashing

But when I display the dictionary later, the result surprises me:

In [3]: dict1
Out[3]: hashing
hashing
{Fun(11): 5, Fun(10): 5}

This does not happen if I use items or repr:

In [4]: dict1.items()
Out[4]: [(Fun(10), 5), (Fun(11), 5)]

In [5]: repr(dict1)
Out[5]: '{Fun(10): 5, Fun(11): 5}'

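Normally I wouldn't care, but I am investigating a performance problem with a class that has a very expensive hash method, and it seems unreasonable that displaying dict1 (especially compared with repr(dict1)) should recompute the hashes of the keys.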

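The question is not only why this happens (although that really interests me); I would also be very interested in how to disable it. I am using IPython 5.1.0.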

2 answers:

Answer 0 (score: 4)

Interesting. I added a pdb.set_trace() to the hash function and tried displaying dict1. Once inside pdb, I used the "where" command to look at the stack:

In [16]: dict1
Out[16]: > <ipython-input-14-01f77f64262f>(6)__hash__()
-> print('hashing')
(Pdb) where
  /usr/local/virtualenvs/lab/bin/ipython(11)<module>()
-> sys.exit(start_ipython())
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/__init__.py(119)start_ipython()
-> return launch_new_instance(argv=argv, **kwargs)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/traitlets/config/application.py(596)launch_instance()
-> app.start()
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/terminal/ipapp.py(344)start()
-> self.shell.mainloop()
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/terminal/interactiveshell.py(550)mainloop()
-> self.interact(display_banner=display_banner)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/terminal/interactiveshell.py(674)interact()
-> self.run_cell(source_raw, store_history=True)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/interactiveshell.py(2723)run_cell()
-> interactivity=interactivity, compiler=compiler, result=result)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/interactiveshell.py(2831)run_ast_nodes()
-> if self.run_code(code, result):
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/interactiveshell.py(2885)run_code()
-> exec(code_obj, self.user_global_ns, self.user_ns)
  <ipython-input-16-8239e7494a4a>(1)<module>()
-> dict1
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/displayhook.py(246)__call__()
-> format_dict, md_dict = self.compute_format_data(result)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/displayhook.py(152)compute_format_data()
-> return self.shell.display_formatter.format(result)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/formatters.py(177)format()
-> data = formatter(obj)
  <decorator-gen-10>(2)__call__()
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/formatters.py(222)catch_format_error()
-> r = method(self, *args, **kwargs)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/formatters.py(699)__call__()
-> printer.pretty(obj)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/lib/pretty.py(368)pretty()
-> return self.type_pprinters[cls](obj, self, cycle)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/lib/pretty.py(623)inner()
-> p.pretty(obj[key])
> <ipython-input-14-01f77f64262f>(6)__hash__()
-> print('hashing')

It looks like the IPython shell goes to the trouble of pretty-printing the result. The relevant code in pretty.py is:

for idx, key in p._enumerate(keys):
    if idx:
        p.text(',')
        p.breakable()
    p.pretty(key)
    p.text(': ')
    p.pretty(obj[key])

Looking up obj[key] means hashing the key again.
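To make the difference concrete, here is a minimal sketch that counts `__hash__` calls for the lookup pattern used by the pretty printer and for the `items()` and `repr()` paths from the question. `CountingKey` and `hashes_during` are hypothetical helpers written for this illustration, not part of IPython, and the counts reflect how CPython's built-in `dict` behaves.

```python
class CountingKey(object):
    """Hypothetical stand-in for Fun that counts __hash__ calls."""
    hash_calls = 0

    def __init__(self, value):
        self._value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self._value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self._value == other._value

    def __repr__(self):
        return 'CountingKey({})'.format(self._value)


def hashes_during(func):
    """Return how many times __hash__ ran while func() executed."""
    before = CountingKey.hash_calls
    func()
    return CountingKey.hash_calls - before


d = {CountingKey(10): 5, CountingKey(11): 5}

# Same pattern as the pretty printer: iterate the keys, then index with each.
lookup_cost = hashes_during(lambda: [(k, d[k]) for k in d])

# These read the stored entries directly and never index by key.
items_cost = hashes_during(lambda: list(d.items()))
repr_cost = hashes_during(lambda: repr(d))

print('lookup:', lookup_cost, 'items:', items_cost, 'repr:', repr_cost)
```

On CPython the lookup pattern should cost one hash per key, while `items()` and `repr()` should cost none, which matches the behaviour reported in the question.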

Can this be avoided? Not sure! ¯\_(ツ)_/¯
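One possible way to sidestep the cost, rather than to stop the extra lookups, is to cache the hash inside the object so that repeated lookups are cheap. The sketch below is an assumption of mine, not something the answer proposes: `CachedHashFun` is a hypothetical variant of `Fun`, and it is only safe if `_value` never changes after construction (which dictionary keys require anyway).

```python
class CachedHashFun(object):
    """Hypothetical variant of Fun that computes its hash only once."""

    def __init__(self, value):
        self._value = value
        self._hash = None            # filled in on first use
        self.hash_computations = 0   # counts the expensive computations

    def __hash__(self):
        if self._hash is None:
            # The expensive work would happen here, and only once.
            self.hash_computations += 1
            self._hash = hash(self._value)
        return self._hash

    def __eq__(self, other):
        if isinstance(other, CachedHashFun):
            return self._value == other._value
        return self._value == other

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self._value)


key = CachedHashFun(10)
d = {key: 5}

# Repeated lookups still call __hash__, but it returns the cached value.
for _ in range(3):
    d[key]

print(key.hash_computations)
```

The display code still calls `__hash__` for every key, but each call returns the stored integer instead of redoing the expensive computation.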

Answer 1 (score: 0)

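I suspect it has to do with putting the dictionary, or a copy of it, into the Out dictionary. Other ways of displaying or referencing the dictionary do not trigger the rehash: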

In [7]: d
Out[7]: hashing
hashing
{Fun(10): 5, Fun(11): 5}
In [8]: d;
In [9]: d
Out[9]: hashing
hashing
{Fun(10): 5, Fun(11): 5}
In [10]: d;
In [11]: print(d)
{Fun(10): 5, Fun(11): 5}
In [12]: str(d)
Out[12]: '{Fun(10): 5, Fun(11): 5}'
In [13]: repr(d)
Out[13]: '{Fun(10): 5, Fun(11): 5}'

In [21]: id(d)
Out[21]: 2977840716
In [22]: id(Out[7])
Out[22]: 2977840716

This may just be another way of looking at the pretty-printing issue.

A deep copy rehashes the keys; a shallow copy does not:

In [28]: {k:v for k,v in d.items()};
hashing
hashing
In [29]: d1 = {}
In [30]: d1.update(d)
In [32]: import copy
In [33]: copy.copy(d);
In [34]: copy.deepcopy(d);
hashing
hashing
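The same comparison can be reproduced outside IPython with a counter instead of a print. This is a sketch under the assumption of CPython's built-in `dict`; `CountingKey` and `hashes_during` are hypothetical helpers introduced only for this illustration.

```python
import copy


class CountingKey(object):
    """Hypothetical stand-in for Fun that counts __hash__ calls."""
    hash_calls = 0

    def __init__(self, value):
        self._value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self._value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self._value == other._value


def hashes_during(func):
    """Return how many times __hash__ ran while func() executed."""
    before = CountingKey.hash_calls
    func()
    return CountingKey.hash_calls - before


d = {CountingKey(i): i for i in range(3)}

costs = {
    # Rebuilds the dict key by key, so every key is hashed again.
    'comprehension': hashes_during(lambda: {k: v for k, v in d.items()}),
    # dict-to-dict update and shallow copy reuse the stored hashes.
    'update': hashes_during(lambda: {}.update(d)),
    'copy.copy': hashes_during(lambda: copy.copy(d)),
    # Deep copy creates new key objects and inserts them one by one.
    'copy.deepcopy': hashes_during(lambda: copy.deepcopy(d)),
}

for name, count in sorted(costs.items()):
    print(name, count)
```

The operations that rebuild the dictionary from individual keys pay one hash per key, while those that copy the existing table wholesale pay none.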

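With a larger dictionary, e.g. db={Fun(i):i for i in range(15)}, the IPython display spans multiple lines. Interestingly, pprint.pprint(db) also prints on multiple lines but without rehashing (though with a different key order).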