为什么使用dict而不是set来测试包含更快?

时间:2016-11-16 13:10:00

标签: python python-3.x dictionary set

我知道引擎盖下,Python集和Python dict非常相似。阅读他们的反思来源 - dictset - 很明显他们正在进行相同的查找。在阅读this answer时,我决定通过黑客攻击以下模型来测试作者声称“设置查找比dict查找更快”:

from timeit import timeit
import random

universe = range(1,100000)
keys = random.sample(universe, 50000)
lookups = random.sample(universe, 50000)
dict_set = dict((k,True) for k in keys)
set_set = set(keys)

def dict_lookup():
    for l in lookups:
        l in dict_set

def set_lookup():
    for l in lookups:
        l in set_set

if __name__ == '__main__':
    set_victories = 0
    dict_victories = 0
    for i in range(100):
        dict_time = timeit('dict_lookup()', setup="from __main__ import dict_lookup", number=10000)
        set_time = timeit('set_lookup()', setup="from __main__ import set_lookup", number=10000)
        print("dict time: {}".format(dict_time))
        print("set time:  {}".format(set_time))
        if set_time < dict_time:
            set_victories += 1
        else:
            dict_victories += 1
    print("Sets were faster in  {} trials".format(set_victories))
    print("Dicts were faster in {} trials".format(dict_victories))

预期的结果是,在给定实现的情况下,set lookup和dict查找将具有无法区分的性能。我实际发现的是以下最终结果:

$ python3 --version
Python 3.4.5
$ python3 sets-vs-dicts.py
<snip - see below for full output>
Sets were faster in  2 trials
Dicts were faster in 98 trials

因此,dict实际上始终更快而不是集合。当然,我并不是说我们应该全都放弃并使用更快的字典,因为集合使得程序员的意图更加清晰,并且考虑到测试的大小,差异很小。但是,我觉得这个结果非常奇怪。这是怎么回事?

如果你很好奇,完整输出如下:

$ python3 set-vs-dict.py
dict time: 57.754860900342464
set time:  56.8056653002277
dict time: 50.8890880998224
set time:  50.642351899296045
dict time: 49.936297399923205
set time:  50.66272980067879
dict time: 49.92973940074444
set time:  50.65518939960748
dict time: 49.949383799917996
set time:  50.66877659969032
dict time: 49.93578719999641
set time:  50.64872649963945
dict time: 49.96432110015303
set time:  50.676835800521076
dict time: 49.95099350064993
set time:  50.64867010060698
dict time: 49.98275039996952
set time:  50.648987299762666
dict time: 49.92164439987391
set time:  50.66931669972837
dict time: 49.98953749984503
set time:  50.652459900826216
dict time: 49.95234560035169
set time:  50.65124330017716
dict time: 49.98174169939011
set time:  50.6712632002309
dict time: 49.93824000004679
set time:  50.65437529981136
dict time: 49.95089349988848
set time:  50.65370349958539
dict time: 49.963413699530065
set time:  50.65550949983299
dict time: 49.955208600498736
set time:  50.66121090017259
dict time: 49.94347499962896
set time:  50.64449250046164
dict time: 49.95420549996197
set time:  50.66687630023807
dict time: 49.92143050022423
set time:  50.64667259994894
dict time: 50.05037229973823
set time:  50.67966340016574
dict time: 49.93846719991416
set time:  50.64651320036501
dict time: 49.921281000599265
set time:  50.67906459979713
dict time: 49.942994699813426
set time:  50.65166569966823
dict time: 49.94313340075314
set time:  50.656177499331534
dict time: 49.94610709976405
set time:  50.65122799947858
dict time: 49.93874369934201
set time:  50.661101600155234
dict time: 49.94996269978583
set time:  50.63938449975103
dict time: 49.9602530002594
set time:  50.65474760066718
dict time: 49.91891669947654
set time:  50.663624899461865
dict time: 49.959330099634826
set time:  50.653377699665725
dict time: 49.98555530048907
set time:  50.64655719976872
dict time: 49.945239200256765
set time:  50.65128379967064
dict time: 49.95342260040343
set time:  50.65899199992418
dict time: 49.92802210059017
set time:  50.67100259941071
dict time: 49.942902400158346
set time:  50.74889140017331
dict time: 49.994800799526274
set time:  50.731577299535275
dict time: 49.98310230020434
set time:  50.747778999619186
dict time: 49.99376400001347
set time:  50.73122859932482
dict time: 50.00640409998596
set time:  50.68737949989736
dict time: 49.94556000083685
set time:  50.722481600008905
dict time: 49.98192979954183
set time:  50.72525530029088
dict time: 49.99698970001191
set time:  50.736096899956465
dict time: 49.94320739991963
set time:  50.71096289996058
dict time: 49.972679699771106
set time:  50.71838010009378
dict time: 49.957800599746406
set time:  50.747396499849856
dict time: 49.97235369961709
set time:  50.69941039942205
dict time: 49.951399500481784
set time:  50.647985899820924
dict time: 49.94027389958501
set time:  50.66828709933907
dict time: 49.94174600020051
set time:  50.65279300045222
dict time: 49.96716000046581
set time:  50.64943030010909
dict time: 49.95117200072855
set time:  50.65525580011308
dict time: 49.962328700348735
set time:  50.66319840028882
dict time: 49.960031100548804
set time:  50.672181099653244
dict time: 49.93908840045333
set time:  50.651302699930966
dict time: 49.94130470044911
set time:  50.655242399312556
dict time: 50.04310019966215
set time:  50.67391949985176
dict time: 49.93010629992932
set time:  50.64970660023391
dict time: 49.991717299446464
set time:  50.65591560024768
dict time: 49.952454400248826
set time:  50.649492600001395
dict time: 49.92677689995617
set time:  50.635977199301124
dict time: 49.95432769972831
set time:  50.64075019955635
dict time: 49.94808299932629
set time:  50.664196100085974
dict time: 49.966013699769974
set time:  50.649582100100815
dict time: 49.9813024001196
set time:  50.64982909988612
dict time: 49.93897459935397
set time:  50.66509110014886
dict time: 49.95878900028765
set time:  50.649003400467336
dict time: 49.96674569975585
set time:  50.69693780038506
dict time: 49.91303739976138
set time:  50.675189800560474
dict time: 49.950330699793994
set time:  50.64532170072198
dict time: 49.95022019930184
set time:  50.65448010060936
dict time: 49.95197269972414
set time:  50.65391890052706
dict time: 49.94361769966781
set time:  50.67086180020124
dict time: 49.95455109979957
set time:  50.670443600043654
dict time: 49.94633509963751
set time:  50.65955980028957
dict time: 49.967472000047565
set time:  50.66301089990884
dict time: 49.95830660033971
set time:  50.67482869978994
dict time: 49.984512499533594
set time:  50.67321899998933
dict time: 50.01141999941319
set time:  50.84260869957507
dict time: 50.31206789985299
set time:  51.02959220018238
dict time: 50.28449110034853
set time:  51.03110689949244
dict time: 50.303432799875736
set time:  51.02032170072198
dict time: 50.281682999804616
set time:  51.05188430007547
dict time: 50.30898350011557
set time:  51.01742030028254
dict time: 50.3027657000348
set time:  51.02114639990032
dict time: 50.00038649979979
set time:  50.65360379964113
dict time: 49.93306410033256
set time:  50.63413709960878
dict time: 49.95266539976001
set time:  50.65499630011618
dict time: 49.94854210037738
set time:  50.703547400422394
dict time: 49.96691229939461
set time:  50.69470370002091
dict time: 49.95223430078477
set time:  50.70982529968023
dict time: 49.954243999905884
set time:  50.791720499284565
dict time: 49.97948960028589
set time:  50.69436000008136
dict time: 49.98102519940585
set time:  50.73820179980248
dict time: 49.96782180014998
set time:  50.722959300503135
dict time: 49.9863857999444
set time:  50.70789400022477
dict time: 49.9592831004411
set time:  50.707397900521755
dict time: 49.94034240022302
set time:  50.667025099508464
dict time: 49.96215169969946
set time:  50.72984409984201
dict time: 49.98776920046657
set time:  50.72097889985889
Sets were faster in  2 trials
Dicts were faster in 98 trials

1 个答案:

答案 0 :(得分:1)

当我测试你的代码时,我认为数字可能有点小。所以我将它们增加了10倍,让random.sample在100个数字中有1个数字。

import random
from time import time


def timeit(func):
    def wrap(*args):
        start = time()
        result = func(*args)
        return time()-start
    return wrap


def get_set_and_dict():
    universe = range(1, 10**8)
    keys = random.sample(universe, 10**6)
    lookups = random.sample(universe,10**6)
    dict_set = dict((k,True) for k in keys)
    set_set = set(keys)
    return dict_set, set_set, lookups


@timeit
def test(container, lookups):

    for i in lookups:
        a = i in container


def main():
    dict_set, set_set, lookups = get_set_and_dict()
    acc_set = acc_dict = 0
    rounds = 100
    for _ in range(rounds):
        acc_dict += test(dict_set, lookups)
        acc_set += test(set_set, lookups)
    print("Set time: {:.4f}s\n Dict time: {:.4f}s".format(acc_set/rounds, acc_dict/rounds))

if __name__ == '__main__':
    main()

>> Set time: 0.1263s
>> Dict time: 0.1578s

但是如果set和dict不同,那将是有意义的,因为它们不是,即使是相同的,也是相同的。

也许只是取决于您如何设置实验,结论会有所不同。