Python dictionary lookup performance, get vs in

Time: 2016-09-19 21:07:16

Tags: python performance dictionary

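This isn't premature optimization. My use case has this double-checking of dicts right in the innermost of inner loops, and it runs all the time. Besides, it's intellectually irksome (see the results).

Which of these approaches is faster?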

mydict = { 'hello': 'yes', 'goodbye': 'no' }
key = 'hello'

# (A)
if key in mydict:
    a = mydict[key]
    do_things(a)
else:
    handle_an_error()

# vs (B)
a = mydict.get(key,None)
if a is not None:
    do_things(a)
else:
    handle_an_error()

Edit: these are the same speed. Common sense tells me that (B) should be noticeably faster, since it does only one dict lookup versus two, but the results say otherwise. I'm scratching my head.

Results of a benchmark averaged over 12 runs, half of which are hits and the other half misses:

doing in
switching to get
total time for IN:  0.532250006994
total time for GET:  0.480916659037
times found: 12000000
times not found: 12000000

And when a similar benchmark is run (with 10 times more loops) without ever finding the key:

doing in
switching to get
total time for IN:  2.35899998744
total time for GET:  4.13858334223

Why!?

(original) code

import time
smalldict = {}
for i in range(10):
    smalldict[str(i*4)] = str(i*18)

(correct) code

import time
smalldict = {}
for i in range(10):
    smalldict[str(i*4)] = str(i*18)

smalldict["8"] = "hello"

bigdict = {}
for i in range(10000):
    bigdict[str(i*100)] = str(i*4123)
bigdict["hello"] = "yes!"

timetotal = 0
totalin = 0
totalget = 0
key = "hello"
found= 0
notfound = 0

ddo = bigdict # change to smalldict for small dict gets
print 'doing in'


for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(1000000):
        if a == 0:
            if str(key) in ddo:  # note: str() is called here but not in the get loop below
                found = found + 1
                foo = ddo[str(key)]
            else:
                notfound = notfound + 1
                foo = "nooo"
        else:
            if 'yo' in ddo:
                found = found + 1
                foo = ddo['yo']
            else:
                notfound = notfound + 1
                foo = "nooo"
    timetotal = timetotal + (time.time() - start)

totalin = timetotal / 12.0 

print 'switching to get'
timetotal = 0
for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(1000000):
        if a == 0:
            foo = ddo.get(key,None)
            if foo is not None:
                found = found + 1
            else:
                notfound = notfound + 1
                foo = "nooo"
        else:
            foo = ddo.get('yo',None)
            if foo is not None:
                found = found + 1  # a hit is counted once, as a hit only
            else:
                notfound = notfound + 1
                foo = "nooo"
    timetotal = timetotal + (time.time() - start)

totalget = timetotal / 12

print "total time for IN: ", totalin
print 'total time for GET: ', totalget
print 'times found:', found
print 'times not found:', notfound

1 answer:

Answer 0 (score: 2)

We can do some better timings:

import timeit

d = dict.fromkeys(range(10000))

def d_get_has(d):
    return d.get(1)

def d_get_not_has(d):
    return d.get(-1)

def d_in_has(d):
    if 1 in d:
        return d[1]

def d_in_not_has(d):
    if -1 in d:
        return d[-1]


print timeit.timeit('d_get_has(d)', 'from __main__ import d, d_get_has')
print timeit.timeit('d_get_not_has(d)', 'from __main__ import d, d_get_not_has')
print timeit.timeit('d_in_has(d)', 'from __main__ import d, d_in_has')
print timeit.timeit('d_in_not_has(d)', 'from __main__ import d, d_in_not_has')

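On my computer, the "in" variants are faster than the .get variants. This is probably because .get is an attribute lookup on the dict, and an attribute lookup is likely to be about as expensive as a membership test on the dict. Note that in and item lookup using dict[x] can be done directly in bytecode, so the normal method lookup can be bypassed...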
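The bytecode point can be checked with the standard dis module. What follows is a minimal sketch for CPython on Python 3, not something from the original answer: the function names lookup_with_in and lookup_with_get are made up for illustration, and exact opcode names differ between interpreter versions, so the check only asks whether each variant compiles to some call instruction.

```python
import dis


def lookup_with_in(d, k):
    # Membership test plus subscript: both are plain operators.
    if k in d:
        return d[k]
    return None


def lookup_with_get(d, k):
    # Attribute/method lookup on the dict followed by a call.
    return d.get(k)


def opnames(func):
    """Return the opcode names of func in order (CPython-specific)."""
    return [ins.opname for ins in dis.get_instructions(func)]


def has_call(ops):
    # Opcode names vary by version (CALL, CALL_METHOD, CALL_FUNCTION, ...),
    # so match on the common prefix instead of one exact name.
    return any(name.startswith("CALL") for name in ops)


in_ops = opnames(lookup_with_in)
get_ops = opnames(lookup_with_get)

print("in + d[k] :", in_ops)
print("d.get(k)  :", get_ops)
print("in variant uses a call instruction: ", has_call(in_ops))
print("get variant uses a call instruction:", has_call(get_ops))
```

Running dis.dis(lookup_with_in) and dis.dis(lookup_with_get) directly gives a more readable listing of the same information.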

It's probably also worth pointing out that I get a HUGE optimization if I just use pypy :-):

$ python ~/sandbox/test.py
0.169840812683
0.1732609272
0.122044086456
0.0991759300232

$ pypy ~/sandbox/test.py
0.00974893569946
0.00752687454224
0.00812077522278
0.00686597824097
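One way to probe the attribute-lookup explanation is to bind d.get to a name once and time the pre-bound call against the other variants. This is a sketch for Python 3 and is not part of the measurements above; the helper names and the repeat count of 200000 are arbitrary choices, and the numbers will vary by interpreter and version.

```python
import timeit

d = dict.fromkeys(range(10000))
d_get = d.get  # bound method looked up once, outside the timed code


def get_via_attribute():
    # Looks up the .get attribute on every call.
    return d.get(1)


def get_prebound():
    # Reuses the already-bound method, skipping the attribute lookup.
    return d_get(1)


def in_then_index():
    # Membership test followed by item lookup.
    if 1 in d:
        return d[1]


REPEATS = 200000  # arbitrary; raise it for steadier numbers

timings = {
    "d.get(1)": timeit.timeit(get_via_attribute, number=REPEATS),
    "pre-bound get(1)": timeit.timeit(get_prebound, number=REPEATS),
    "1 in d, then d[1]": timeit.timeit(in_then_index, number=REPEATS),
}

for label, seconds in timings.items():
    print(label, seconds)
```

If the pre-bound call lands close to the in variant, that would be consistent with the attribute lookup accounting for much of the difference; if it does not, the call overhead itself is the more likely cost.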