Why does dumping and loading with pickle.HIGHEST_PROTOCOL take longer?

Asked: 2012-12-01 01:42:17

Tags: python pickle

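I tested pickle with three different protocols: 0, 1, and 2.

In my test, I dumped and loaded a dict of about 270,000 (int, int) pairs and a set of about 560,000 ints.

Here is my test code (you can safely skip the two fetch functions, which I use to get the data from the database):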

import pickle
import time

protocol = 0  # Tested 0, 1, and 2
print 'Protocol:', protocol
t0 = time.time()
# The two fetch functions read the data from the database.
sku2spu_dict = fetch_sku2spu_dict()
pid_set = fetch_valid_pids()
t1 = time.time()
print 'Time in sql:', t1 - t0
# Binary mode is required for the binary protocols (1 and 2).
with open('sku.pcike_dict', 'wb') as f:
    pickle.dump(sku2spu_dict, f, protocol)
with open('pid.picke_set', 'wb') as f:
    pickle.dump(pid_set, f, protocol)
t2 = time.time()
print 'Time in dump:', t2 - t1
with open('sku.pcike_dict', 'rb') as f:
    sku2spu_dict = pickle.load(f)
with open('pid.picke_set', 'rb') as f:
    pid_set = pickle.load(f)
t3 = time.time()
print 'Time in load:', t3 - t2

Here is the time each protocol took:

Protocol: 0
Time in dump: 31.3491470814
Time in load: 29.8991980553

Protocol: 1
Time in dump: 32.3191611767
Time in load: 20.6666529179

Protocol: 2
Time in dump: 94.2163629532
Time in load: 42.7647490501

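To my surprise, protocol 2 performs much worse than protocols 0 and 1.

However, the dump file is smallest with protocol 2, about half the size of the files produced by protocols 0 and 1.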
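The size difference can be checked in memory, without writing files. The following is a minimal sketch in Python 3, assuming a synthetic dict of int pairs as a stand-in for the real data (the name `sample` and its size are illustrative and not part of the original test). The exact sizes depend on the data and on the Python version, so they may not match the ratio reported above.

```python
import pickle

# Hypothetical stand-in for the real data: 10,000 (int, int) pairs.
sample = {i: i * 2 for i in range(10000)}

# Serialized size in bytes for each protocol tested in the question.
sizes = {protocol: len(pickle.dumps(sample, protocol)) for protocol in (0, 1, 2)}

for protocol in sorted(sizes):
    print('Protocol %d: %d bytes' % (protocol, sizes[protocol]))
```

Protocol 0 is a text format, while protocols 1 and 2 encode integers in binary, which is one reason the binary protocols tend to produce smaller output for integer-heavy data.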

The documentation says:

  

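Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes.

And for the definition of new-style classes, it says: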

  

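Any class which inherits from object. This includes all built-in types like list and dict.

So I expected protocol 2 to be faster when dumping and loading objects.

Does anyone know why?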

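Update

The problem was solved after replacing pickle with cPickle.

Now load and dump take 5 seconds and 3 seconds with protocol 2, while protocols 0 and 1 take more than 10 seconds.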
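For reference, the switch can be written so that the same script runs on both Python 2 and Python 3. This is a sketch, not the exact change made in the test script; the `data` value is made up for illustration. In Python 3 there is no separate cPickle module, and the standard `pickle` module uses its C accelerator automatically when it is available.

```python
try:
    import cPickle as pickle  # Python 2: C implementation, much faster
except ImportError:
    import pickle  # Python 3: already backed by the C accelerator when available

# Illustrative data only; the real dict and set came from a database.
data = {'pairs': {1: 10, 2: 20}, 'ids': {3, 4, 5}}

# HIGHEST_PROTOCOL picks the newest protocol the running interpreter supports.
payload = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)
print(restored == data)
```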

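1 Answer:

Answer 0: (score: 2)

When the documentation talks about "new-style classes", it is (probably) referring to user-defined new-style classes. If you run a simple benchmark with them, you can see that protocol 2 is about twice as fast as protocol 0 when dumping them: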

>>> import cPickle
>>> import timeit
>>> class MyObject(object):
...     def __init__(self, val):
...             self.val = val
...     def method(self):
...             print self.val
... 
>>> timeit.timeit('cPickle.dumps(MyObject(100), 0)', 'from __main__ import cPickle, MyObject')
17.654622077941895
>>> timeit.timeit('cPickle.dumps(MyObject(100), 1)', 'from __main__ import cPickle, MyObject')
14.536609172821045
>>> timeit.timeit('cPickle.dumps(MyObject(100), 2)', 'from __main__ import cPickle, MyObject')
8.885567903518677

Loading the result is also about 2 times faster:

>>> dumped = cPickle.dumps(MyObject(100), 0)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
4.6161839962005615
>>> dumped = cPickle.dumps(MyObject(100), 1)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
4.351701021194458
>>> dumped = cPickle.dumps(MyObject(100), 2)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
2.3936450481414795
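One way to see where this difference comes from is to look at the opcodes each protocol emits for an instance. The following is a sketch in Python 3 using the standard `pickletools` module; the helper name `opcode_names` is made up for this example, and the class is a trimmed copy of `MyObject` from above.

```python
import pickle
import pickletools


class MyObject(object):
    def __init__(self, val):
        self.val = val


def opcode_names(obj, protocol):
    """Return the names of the pickle opcodes emitted for obj."""
    data = pickle.dumps(obj, protocol)
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


for protocol in (0, 1, 2):
    print(protocol, opcode_names(MyObject(100), protocol))
```

Protocol 2 has a dedicated NEWOBJ opcode for creating instances, whereas protocols 0 and 1 record a call to a generic reconstruction helper (`copy_reg._reconstructor`, or `copyreg._reconstructor` in Python 3), which takes more bytes and more work on both sides.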

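In your particular case it might be the other way around, but without the code that defines fetch_sku2spu_dict and the other function, we cannot say anything. The only thing I can assume is that the returned value is a dict, but in that case protocol 2 is about 6 times faster than protocol 0: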

>>> mydict = dict(zip(range(100), range(100)))
>>> timeit.timeit('cPickle.dumps(mydict, 0)', 'from __main__ import cPickle, mydict')
46.335021018981934
>>> timeit.timeit('cPickle.dumps(mydict, 1)', 'from __main__ import cPickle, mydict')
7.913743019104004
>>> timeit.timeit('cPickle.dumps(mydict, 2)', 'from __main__ import cPickle, mydict')
7.798863172531128

And loading is about 2.5 times faster:

>>> dumped = cPickle.dumps(mydict, 0)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
32.81050395965576
>>> dumped = cPickle.dumps(mydict, 1)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
13.997781038284302
>>> dumped = cPickle.dumps(mydict, 2)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
14.006750106811523
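The same dict benchmark can be sketched in Python 3 syntax, where `pickle` already uses the C implementation. The helper name `time_protocols` and the reduced `number` are assumptions made for this example, and the timings will differ from the Python 2 numbers above.

```python
import pickle
import timeit

mydict = dict(zip(range(100), range(100)))


def time_protocols(obj, number=10000):
    """Return {protocol: (dump_seconds, load_seconds)} for protocols 0-2."""
    results = {}
    for protocol in (0, 1, 2):
        dumped = pickle.dumps(obj, protocol)
        dump_time = timeit.timeit(lambda: pickle.dumps(obj, protocol), number=number)
        load_time = timeit.timeit(lambda: pickle.loads(dumped), number=number)
        results[protocol] = (dump_time, load_time)
    return results


for protocol, (dump_time, load_time) in sorted(time_protocols(mydict).items()):
    print('Protocol %d: dump %.4fs, load %.4fs' % (protocol, dump_time, load_time))
```

Passing a callable to `timeit.timeit` avoids the setup-string imports from `__main__` used in the interactive sessions.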

On the other hand, when using the pure-Python version of the module, I found:

>>> import pickle
>>> mydict = dict(zip(range(100), range(100)))
>>> timeit.timeit('pickle.dumps(mydict,0)', 'from __main__ import pickle, mydict', number=10000)
2.9552500247955322
>>> timeit.timeit('pickle.dumps(mydict,1)', 'from __main__ import pickle, mydict', number=10000)
3.831756830215454
>>> timeit.timeit('pickle.dumps(mydict,2)', 'from __main__ import pickle, mydict', number=10000)
3.842888116836548

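So with the pure-Python version, dumping built-in objects with protocols 1 and 2 seems to be slower than with protocol 0. But when loading objects, protocol 0 is again the slowest of the three: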

>>> dumped = pickle.dumps(mydict, 0)
>>> timeit.timeit('pickle.loads(dumped)', 'from __main__ import pickle, dumped', number=10000)
2.988792896270752
>>> dumped = pickle.dumps(mydict, 1)
>>> timeit.timeit('pickle.loads(dumped)', 'from __main__ import pickle, dumped', number=10000)
1.2793281078338623
>>> dumped = pickle.dumps(mydict, 2)
>>> timeit.timeit('pickle.loads(dumped)', 'from __main__ import pickle, dumped', number=10000)
1.5425071716308594
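In Python 3 the two implementations live in the same module, so the comparison can be sketched as follows. Note the assumption: `pickle._Pickler` is the pure-Python pickler class inside the standard library, but it is a private, undocumented name and may change. The helper `py_dumps` is made up for this example.

```python
import io
import pickle
import timeit

mydict = dict(zip(range(100), range(100)))


def py_dumps(obj, protocol):
    """Serialize with the pure-Python pickler (pickle._Pickler, a private name)."""
    buf = io.BytesIO()
    pickle._Pickler(buf, protocol).dump(obj)
    return buf.getvalue()


for protocol in (0, 1, 2):
    default_time = timeit.timeit(lambda: pickle.dumps(mydict, protocol), number=1000)
    py_time = timeit.timeit(lambda: py_dumps(mydict, protocol), number=1000)
    print('Protocol %d: default %.4fs, pure Python %.4fs'
          % (protocol, default_time, py_time))
```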

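As you can see from the mini-benchmarks above, the time pickle takes depends on many factors, from the type of object you are pickling to which version of the pickle module you use. Without further information, we cannot explain why protocol 2 is so much slower in your case.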
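For anyone who wants to reproduce the setup without the database, here is a rough sketch in Python 3 that uses synthetic data of about the same size as in the question. The functions `make_data` and `time_roundtrip` are hypothetical, the value distribution is an assumption, and the round trip happens in memory rather than through files, so the numbers are only indicative.

```python
import pickle
import random
import time


def make_data(n_pairs=270000, n_ids=560000, seed=0):
    """Build a synthetic dict of int pairs and a set of ints (sizes from the question)."""
    rng = random.Random(seed)
    pairs = {i: rng.randrange(10 ** 6) for i in range(n_pairs)}
    ids = set(range(n_ids))
    return pairs, ids


def time_roundtrip(obj, protocol):
    """Return (dump_seconds, load_seconds, restored_object) for one in-memory round trip."""
    t0 = time.time()
    payload = pickle.dumps(obj, protocol)
    t1 = time.time()
    restored = pickle.loads(payload)
    t2 = time.time()
    return t1 - t0, t2 - t1, restored


if __name__ == '__main__':
    pairs, ids = make_data()
    for protocol in (0, 1, 2):
        for name, obj in (('dict', pairs), ('set', ids)):
            dump_time, load_time, _ = time_roundtrip(obj, protocol)
            print('Protocol %d, %s: dump %.3fs, load %.3fs'
                  % (protocol, name, dump_time, load_time))
```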