Improved pickle.dumps performance by monkey patching DEFAULT_PROTOCOL?

Date: 2017-12-19 18:06:12

Tags: python python-3.x pickle

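I noticed that it can make a big difference in speed whether you specify the protocol used by pickle.dumps via the argument, or monkey patch pickle.DEFAULT_PROTOCOL to the desired protocol version.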

On Python 3.6, pickle.DEFAULT_PROTOCOL is 3 and pickle.HIGHEST_PROTOCOL is 4.

For objects up to a certain length, it seems to be faster to set DEFAULT_PROTOCOL to 4 than to pass protocol=4 as an argument.

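In my tests, for example, with pickle.DEFAULT_PROTOCOL set to 4, pickling a list of length 1 by calling pickle.dumps(packet_list_1) takes 481 ns, whereas calling pickle.dumps(packet_list_1, protocol=4) takes 733 ns: a ~52% speed penalty for passing the protocol explicitly instead of falling back to the default (which had already been set to 4).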
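The ~52% figure is the explicit call's time relative to the patched-default call:

$$
\text{penalty} = \frac{t_{\text{explicit}}}{t_{\text{default}}} - 1 = \frac{733\ \text{ns}}{481\ \text{ns}} - 1 \approx 0.52
$$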

  """
  pickle.DEFAULT_PROTOCOL = 4
  pickle.dumps(packet) vs pickle.dumps(packet, protocol=4):

  For a list with length 1 it's 481ns vs 733ns (~52% penalty).
  For a list with length 10 it's 763ns vs 999ns (~30% penalty).
  For a list with length 100 it's 2.99 µs vs 3.21 µs (~7% penalty).
  For a list with length 1000 it's 25.8 µs vs 26.2 µs (~1.5% penalty).
  For a list with length 1_000_000 it's 32 ms vs 32.4 ms (~1.13% penalty).
  """

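I found this behaviour for instances, lists, strings and arrays, which is everything I have tested so far. The effect diminishes as the object size grows.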

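For dicts, I noticed that the effect reverses at some point: for a dict of length 10**6 (with unique integer values), explicitly passing protocol=4 as an argument is faster (269 ms) than relying on the default having been set to 4 (286 ms).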

 """
 pickle.DEFAULT_PROTOCOL = 4 
 pickle.dumps(packet) vs pickle.dumps(packet, protocol=4):

 For a dict with length 1 it's 589 ns vs 811 ns (~38% penalty).
 For a dict with length 10 it's 1.59 µs vs 1.81 µs (~14% penalty).
 For a dict with length 100 it's 13.2 µs vs 12.9 µs (~2.3% improvement).
 For a dict with length 1000 it's 128 µs vs 129 µs (~0.8% penalty).
 For a dict with length 1_000_000 it's 306 ms vs 283 ms (~7.5% improvement).
 """

Glancing at the pickle source, nothing catches my eye that could cause this difference.

How can this unexpected behaviour be explained?

Are there any caveats to setting pickle.DEFAULT_PROTOCOL instead of passing the protocol as an argument, in order to take advantage of the improved speed?

(Timed with IPython's %timeit magic on Python 3.6.3, IPython 6.2.1, Windows 7.)
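For readers without IPython, a rough equivalent of these %timeit comparisons can be sketched with the standard-library timeit module. The helper `compare_default_vs_explicit` is a hypothetical name used only for illustration; it applies the same monkey patch as the question and restores the original value afterwards. Note that it reports the best run rather than the mean that %timeit prints.

```python
import pickle
import timeit


def compare_default_vs_explicit(obj, protocol=4, number=10_000, repeat=5):
    """Time pickle.dumps(obj) with DEFAULT_PROTOCOL patched to `protocol`
    against pickle.dumps(obj, protocol=protocol).

    Hypothetical helper: returns the best per-call time in nanoseconds
    for each variant (best of `repeat` runs, not the mean).
    """
    original = pickle.DEFAULT_PROTOCOL
    pickle.DEFAULT_PROTOCOL = protocol  # the monkey patch from the question
    try:
        implicit = min(timeit.repeat(lambda: pickle.dumps(obj),
                                     number=number, repeat=repeat))
        explicit = min(timeit.repeat(lambda: pickle.dumps(obj, protocol=protocol),
                                     number=number, repeat=repeat))
    finally:
        pickle.DEFAULT_PROTOCOL = original  # always undo the patch
    return {"implicit_ns": implicit / number * 1e9,
            "explicit_ns": explicit / number * 1e9}


if __name__ == "__main__":
    for n in (1, 10, 100):
        print("list", n, compare_default_vs_explicit([*range(n)], number=2_000))
        # dict with unique integer values, as in the question
        print("dict", n, compare_default_vs_explicit({i: i for i in range(n)}, number=2_000))
```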

Some example code:

import pickle

# instances -------------------------------------------------------------
class Dummy: pass

dummy = Dummy()

pickle.DEFAULT_PROTOCOL = 3

"""
>>> %timeit pickle.dumps(dummy)
5.8 µs ± 33.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit pickle.dumps(dummy, protocol=4)
6.18 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>> %timeit pickle.dumps(dummy)
5.74 µs ± 18.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit pickle.dumps(dummy, protocol=4)
6.24 µs ± 26.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""

# lists -------------------------------------------------------------
packet_list_1 = [*range(1)]

pickle.DEFAULT_PROTOCOL = 3
"""
>>>%timeit pickle.dumps(packet_list_1)
476 ns ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit pickle.dumps(packet_list_1, protocol=4)
730 ns ± 2.22 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_1)
481 ns ± 2.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit pickle.dumps(packet_list_1, protocol=4)
733 ns ± 2.94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
# --------------------------
packet_list_10 = [*range(10)]

pickle.DEFAULT_PROTOCOL = 3

"""
>>>%timeit pickle.dumps(packet_list_10)
714 ns ± 3.05 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit pickle.dumps(packet_list_10, protocol=4)
978 ns ± 24.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_10)
763 ns ± 3.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit pickle.dumps(packet_list_10, protocol=4)
999 ns ± 8.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
# --------------------------
packet_list_100 = [*range(100)]

pickle.DEFAULT_PROTOCOL = 3

"""
>>>%timeit pickle.dumps(packet_list_100)
2.96 µs ± 5.16 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>>%timeit pickle.dumps(packet_list_100, protocol=4)
3.22 µs ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_100)
2.99 µs ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>>%timeit pickle.dumps(packet_list_100, protocol=4)
3.21 µs ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
# --------------------------
packet_list_1000 = [*range(1000)]

pickle.DEFAULT_PROTOCOL = 3

"""
>>>%timeit pickle.dumps(packet_list_1000)
26 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>>%timeit pickle.dumps(packet_list_1000, protocol=4)
26.4 µs ± 93.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_1000)
25.8 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>>%timeit pickle.dumps(packet_list_1000, protocol=4)
26.2 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
"""
# --------------------------
packet_list_1m = [*range(10**6)]

pickle.DEFAULT_PROTOCOL = 3

"""
>>>%timeit pickle.dumps(packet_list_1m)
32 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>%timeit pickle.dumps(packet_list_1m, protocol=4)
32.3 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_1m)
32 ms ± 52.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>%timeit pickle.dumps(packet_list_1m, protocol=4)
32.4 ms ± 466 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
"""

Answer (score: 2):

Let's reorganize your %timeit results by return value:

| DEFAULT_PROTOCOL | call                                    | %timeit           | returns                                                                                                                      |
|------------------+-----------------------------------------+-------------------+------------------------------------------------------------------------------------------------------------------------------|
|                3 | pickle.dumps(dummy)                     | 5.8 µs ± 33.5 ns  | b'\x80\x03c__main__\nDummy\nq\x00)\x81q\x01.'                                                                                |
|                4 | pickle.dumps(dummy)                     | 5.74 µs ± 18.8 ns | b'\x80\x03c__main__\nDummy\nq\x00)\x81q\x01.'                                                                                |
|                3 | pickle.dumps(dummy, protocol=4)         | 6.18 µs ± 10.4 ns | b'\x80\x04\x95\x1b\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Dummy\x94\x93\x94)}\x94\x92\x94.'                  |
|                4 | pickle.dumps(dummy, protocol=4)         | 6.24 µs ± 26.7 ns | b'\x80\x04\x95\x1b\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Dummy\x94\x93\x94)}\x94\x92\x94.'                  |
|                3 | pickle.dumps(packet_list_1)             | 476 ns ± 1.01 ns  | b'\x80\x03]q\x00cbuiltins\nrange\nq\x01K\x00K\x01K\x01\x87q\x02Rq\x03a.'                                                     |
|                4 | pickle.dumps(packet_list_1)             | 481 ns ± 2.12 ns  | b'\x80\x03]q\x00cbuiltins\nrange\nq\x01K\x00K\x01K\x01\x87q\x02Rq\x03a.'                                                     |
|                3 | pickle.dumps(packet_list_1, protocol=4) | 730 ns ± 2.22 ns  | b'\x80\x04\x95#\x00\x00\x00\x00\x00\x00\x00]\x94\x8c\x08builtins\x94\x8c\x05range\x94\x93\x94K\x00K\x01K\x01\x87\x94R\x94a.' |
|                4 | pickle.dumps(packet_list_1, protocol=4) | 733 ns ± 2.94 ns  | b'\x80\x04\x95#\x00\x00\x00\x00\x00\x00\x00]\x94\x8c\x08builtins\x94\x8c\x05range\x94\x93\x94K\x00K\x01K\x01\x87\x94R\x94a.' |

Notice how well the %timeit results match up once we pair the calls that return the same value.
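The same pairing can be checked without timing anything, by grouping the call variants from the table by the exact bytes they return. A minimal sketch; `group_calls_by_output` is a hypothetical helper, and it uses protocol 2 instead of 4 as the patched value so that it differs from the real default on any Python 3 version:

```python
import pickle
from collections import defaultdict


def group_calls_by_output(obj, protocol=2):
    """Run dumps(obj) and dumps(obj, protocol=protocol) under the original
    and the patched DEFAULT_PROTOCOL, then group the (DEFAULT_PROTOCOL, call)
    labels by the bytes each call returned. Hypothetical helper."""
    original = pickle.DEFAULT_PROTOCOL
    outputs = {}
    try:
        for default in (original, protocol):
            pickle.DEFAULT_PROTOCOL = default
            outputs[(default, "dumps(obj)")] = pickle.dumps(obj)
            outputs[(default, f"dumps(obj, protocol={protocol})")] = (
                pickle.dumps(obj, protocol=protocol))
    finally:
        pickle.DEFAULT_PROTOCOL = original  # undo the patch
    groups = defaultdict(list)
    for label, data in outputs.items():
        groups[data].append(label)  # identical bytes -> same group
    return list(groups.values())


for labels in group_calls_by_output([*range(1)]):
    print(labels)
```

With the C `_pickle` accelerator in use, one would expect two groups of two: the plain `dumps(obj)` calls together regardless of the patch, and the explicit-protocol calls together.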

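As you can see, the value of pickle.DEFAULT_PROTOCOL has no effect on the value returned by pickle.dumps. If the protocol argument is not given, the default protocol is 3 (on Python 3.6), whatever the value of pickle.DEFAULT_PROTOCOL.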
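The protocol a pickle was written with can also be read straight from its bytes: for protocol 2 and above the stream starts with the PROTO opcode (0x80) followed by the protocol number, which is the `\x80\x03` / `\x80\x04` prefix visible in the table. A sketch using the standard pickletools module (`declared_protocol` is our own illustrative name), again patching DEFAULT_PROTOCOL to 2:

```python
import pickle
import pickletools


def declared_protocol(data):
    """Protocol number from a pickle's leading PROTO opcode, or None.

    Protocols 0 and 1 do not emit a PROTO opcode, so they yield None here.
    """
    opcode, arg, _pos = next(pickletools.genops(data))  # first opcode only
    return arg if opcode.name == "PROTO" else None


packet = [*range(10)]
original = pickle.DEFAULT_PROTOCOL
try:
    pickle.DEFAULT_PROTOCOL = 2  # differs from the default of any Python 3
    via_dumps = declared_protocol(pickle.dumps(packet))
    via_py_dumps = declared_protocol(pickle._dumps(packet))  # private pure-Python fallback
finally:
    pickle.DEFAULT_PROTOCOL = original

print("pickle.dumps used protocol", via_dumps)      # expected to ignore the patch on CPython
print("pickle._dumps used protocol", via_py_dumps)  # follows the patched value
```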

The reason is here, at the end of pickle.py:

# Use the faster _pickle if possible
try:
    from _pickle import (
        PickleError,
        PicklingError,
        UnpicklingError,
        Pickler,
        Unpickler,
        dump,
        dumps,
        load,
        loads
    )
except ImportError:
    Pickler, Unpickler = _Pickler, _Unpickler
    dump, dumps, load, loads = _dump, _dumps, _load, _loads

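If _pickle (the compiled version of the pickle module) is imported successfully, the pickle module sets pickle.dumps to _pickle.dumps, and the _pickle module uses protocol=3 by default. Only if Python fails to import _pickle is dumps set to the Python version: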

def _dumps(obj, protocol=None, *, fix_imports=True):
    f = io.BytesIO()
    _Pickler(f, protocol, fix_imports=fix_imports).dump(obj)
    res = f.getvalue()
    assert isinstance(res, bytes_types)
    return res

Only the Python version, _dumps, is affected by the value of pickle.DEFAULT_PROTOCOL:

In [68]: pickle.DEFAULT_PROTOCOL = 3

In [70]: pickle._dumps(dummy)
Out[70]: b'\x80\x03c__main__\nDummy\nq\x00)\x81q\x01.'

In [71]: pickle.DEFAULT_PROTOCOL = 4

In [72]: pickle._dumps(dummy)
Out[72]: b'\x80\x04\x95\x1b\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Dummy\x94\x93\x94)}\x94\x92\x94.'
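If this explanation holds, the gap from the question should reappear when both protocols are passed explicitly, with no monkey patching involved. A rough sketch (`time_protocols` is a made-up helper name) that also reports output size, since the protocol-4 outputs in the table above carry an extra FRAME header (`\x95` plus an 8-byte length):

```python
import pickle
import timeit


def time_protocols(obj, protocols=(3, 4), number=10_000, repeat=5):
    """Best-of-`repeat` time per pickle.dumps call (ns) and output size
    (bytes) for each explicitly passed protocol. Hypothetical helper."""
    report = {}
    for p in protocols:
        # bind p as a default so the lambda takes no arguments, as timeit expects
        best = min(timeit.repeat(lambda p=p: pickle.dumps(obj, protocol=p),
                                 number=number, repeat=repeat))
        report[p] = {"ns_per_call": best / number * 1e9,
                     "size": len(pickle.dumps(obj, protocol=p))}
    return report


for protocol, stats in time_protocols([*range(10)]).items():
    print(protocol, stats)
```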