I am testing pickle with three different protocols: 0, 1, and 2.
In my test, I dumped and loaded a dict of about 270000 (int, int) pairs and a set of about 560000 ints.
Below is my test code (you can safely skip the two fetch functions, which I use to get the data from the database):
import pickle
import time

protocol = 0 # Tested 0, 1, and 2
print 'Protocol:', protocol
t0 = time.time()
sku2spu_dict = fetch_sku2spu_dict()
pid_set = fetch_valid_pids()
t1 = time.time()
print 'Time in sql:', t1 - t0
# Open in binary mode: protocols 1 and 2 write binary data
pickle.dump(sku2spu_dict, open('sku.pcike_dict', 'wb'), protocol)
pickle.dump(pid_set, open('pid.picke_set', 'wb'), protocol)
t2 = time.time()
print 'Time in dump:', t2 - t1
sku2spu_dict = pickle.load(open('sku.pcike_dict', 'rb'))
pid_set = pickle.load(open('pid.picke_set', 'rb'))
t3 = time.time()
print 'Time in load:', t3 - t2
Here is the time each protocol took:
Protocol: 0
Time in dump: 31.3491470814
Time in load: 29.8991980553
Protocol: 1
Time in dump: 32.3191611767
Time in load: 20.6666529179
Protocol: 2
Time in dump: 94.2163629532
Time in load: 42.7647490501
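To my surprise, protocol 2 is much worse than 0 and 1.
However, the dump file is smallest with protocol 2, about half the size of the files produced by protocols 0 and 1.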
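The size difference is easy to reproduce on a small scale. The sketch below does not use the original data: it assumes a hypothetical dict of 1000 (int, int) pairs, and `pickled_sizes` is an illustrative helper name. It only compares the number of bytes `pickle.dumps` produces under each protocol.

```python
import pickle

# Hypothetical stand-in for the real data: 1000 (int, int) pairs.
sample = dict((i, i * 7) for i in range(1000))


def pickled_sizes(obj, protocols=(0, 1, 2)):
    """Return {protocol: number of bytes produced by pickle.dumps}."""
    return dict((p, len(pickle.dumps(obj, p))) for p in protocols)


sizes = pickled_sizes(sample)
for p in sorted(sizes):
    print("protocol %d: %d bytes" % (p, sizes[p]))
```

Protocol 0 writes integers as ASCII text, while protocols 1 and 2 use compact binary opcodes, which is why the binary pickles come out smaller.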
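The documentation says:
Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes.
And for the definition of new-style classes, it says:
Any class which inherits from object. This includes all built-in types like list and dict.
So I expected protocol 2 to be faster at dumping and loading objects.
Does anyone know why it is not?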
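UPDATE
The problem was solved by replacing pickle with cPickle.
Now load and dump take 5 and 3 seconds with protocol 2, while protocols 0 and 1 take more than 10 seconds.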
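For reference, a common Python 2 idiom is to try cPickle first and fall back to pickle. The following is a minimal sketch under stated assumptions: the sample dict and the file name `example.pkl` are hypothetical, not the data from the test above. On Python 3 there is no cPickle module and pickle uses its C accelerator automatically when it is available, so the fallback branch is taken there.

```python
import os
import tempfile

try:
    import cPickle as pickle  # C implementation, Python 2 only
except ImportError:
    import pickle  # Python 3: C accelerator is used automatically if present

data = dict((i, i + 1) for i in range(1000))  # hypothetical sample data
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.pkl")  # hypothetical file name

# Protocols 1 and 2 are binary, so the files are opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(data, f, 2)

with open(path, "rb") as f:
    restored = pickle.load(f)

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmp_dir)

print(restored == data)
```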
Answer 0 (score: 2)
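When the documentation talks about "new-style classes", it is (probably) referring to user-defined new-style classes. If you run a simple benchmark with them, you can see that protocol 2 is about twice as fast as protocol 0 when dumping: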
>>> import cPickle
>>> import timeit
>>> class MyObject(object):
...     def __init__(self, val):
...         self.val = val
...     def method(self):
...         print self.val
...
>>> timeit.timeit('cPickle.dumps(MyObject(100), 0)', 'from __main__ import cPickle, MyObject')
17.654622077941895
>>> timeit.timeit('cPickle.dumps(MyObject(100), 1)', 'from __main__ import cPickle, MyObject')
14.536609172821045
>>> timeit.timeit('cPickle.dumps(MyObject(100), 2)', 'from __main__ import cPickle, MyObject')
8.885567903518677
Loading is also about twice as fast:
>>> dumped = cPickle.dumps(MyObject(100), 0)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
4.6161839962005615
>>> dumped = cPickle.dumps(MyObject(100), 1)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
4.351701021194458
>>> dumped = cPickle.dumps(MyObject(100), 2)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
2.3936450481414795
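In your particular case it may be just the opposite, but without the code that defines fetch_sku2spu_dict etc. we cannot say anything. The only thing I can assume is that the returned value is a dict, but in that case protocol 2 is about 6 times faster than protocol 0: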
>>> mydict = dict(zip(range(100), range(100)))
>>> timeit.timeit('cPickle.dumps(mydict, 0)', 'from __main__ import cPickle, mydict')
46.335021018981934
>>> timeit.timeit('cPickle.dumps(mydict, 1)', 'from __main__ import cPickle, mydict')
7.913743019104004
>>> timeit.timeit('cPickle.dumps(mydict, 2)', 'from __main__ import cPickle, mydict')
7.798863172531128
Loading is also faster, by a factor of roughly 2.3:
>>> dumped = cPickle.dumps(mydict, 0)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
32.81050395965576
>>> dumped = cPickle.dumps(mydict, 1)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
13.997781038284302
>>> dumped = cPickle.dumps(mydict, 2)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
14.006750106811523
On the other hand, when using the pure-Python version of the module (pickle instead of cPickle), I found:
>>> mydict = dict(zip(range(100), range(100)))
>>> timeit.timeit('pickle.dumps(mydict,0)', 'from __main__ import pickle, mydict', number=10000)
2.9552500247955322
>>> timeit.timeit('pickle.dumps(mydict,1)', 'from __main__ import pickle, mydict', number=10000)
3.831756830215454
>>> timeit.timeit('pickle.dumps(mydict,2)', 'from __main__ import pickle, mydict', number=10000)
3.842888116836548
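So with the pure-Python version, dumping built-in objects with protocols 1 and 2 appears to be slower than with protocol 0. When loading, however, protocol 0 is again the slowest of the three: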
>>> dumped = pickle.dumps(mydict, 0)
>>> timeit.timeit('pickle.loads(dumped)', 'from __main__ import pickle, dumped', number=10000)
2.988792896270752
>>> dumped = pickle.dumps(mydict, 1)
>>> timeit.timeit('pickle.loads(dumped)', 'from __main__ import pickle, dumped', number=10000)
1.2793281078338623
>>> dumped = pickle.dumps(mydict, 2)
>>> timeit.timeit('pickle.loads(dumped)', 'from __main__ import pickle, dumped', number=10000)
1.5425071716308594
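As you can see from the mini-benchmarks above, the time pickle takes depends on many factors, from the type of object you are pickling to which version of the pickle module you use. Without further information we cannot explain why protocol 2 is so much slower in your case.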
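To run the same kind of comparison on your own data, a small helper along the following lines may be useful. This is only a sketch: `benchmark` is a hypothetical helper name, the sample dict mirrors the one used in the sessions above, and the timings will vary with the machine, the Python version, and the pickle implementation in use.

```python
import pickle
import timeit


def benchmark(obj, protocols=(0, 1, 2), number=100):
    """Time pickle.dumps and pickle.loads for obj under each protocol.

    Returns {protocol: (dump_seconds, load_seconds)}, where each value is
    the total time for `number` repetitions.
    """
    results = {}
    for p in protocols:
        dumped = pickle.dumps(obj, p)
        dump_time = timeit.timeit(lambda: pickle.dumps(obj, p), number=number)
        load_time = timeit.timeit(lambda: pickle.loads(dumped), number=number)
        results[p] = (dump_time, load_time)
    return results


mydict = dict(zip(range(100), range(100)))  # same shape as the dict above
results = benchmark(mydict)
for p in sorted(results):
    print("protocol %d: dump %.4fs, load %.4fs" % ((p,) + results[p]))
```

Passing the actual object from the question (for example the dict returned by fetch_sku2spu_dict) in place of `mydict` would show whether the slowdown comes from the data or from the module being used.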