编辑2:有人建议这是一个类似问题的副本。我不同意,因为我的问题集中在速度,而另一个问题是什么更“可读”或“更好”(没有更好地定义)。虽然问题相似,但所给出的讨论/答案存在很大差异。
编辑: 我从问题中意识到我可以更清楚。 对不起代码拼写错误,是的,它应该使用适当的python运算符进行添加。
关于输入数据,我只选择了一个随机数列表,因为这是一个常见的样本。在我的情况下,我正在使用一个dict,我期望很多keyerrors,可能95%的密钥将不存在,并且存在的少数将包含数据集群。
我对一般性讨论感兴趣,无论输入数据集如何,但当然样本的运行时间都很有趣。
我的标准方法就像许多其他帖子一样写
list = (100 random numbers)
d = {}
for x in list:
if x in d:
d[x]+=1
else:
d[x]=1
但我刚想到这个更快,因为我们不必检查字典是否包含密钥。我们只是假设它确实如此,如果没有,我们会处理它。 有什么区别或者Python比我更聪明吗?
list = (100 random numbers)
d = {}
for x in list:
try:
d[x]+=1
except KeyError:
d[x] = 1
与数组中的索引,越界,负索引等相同的方法。
答案 0 :(得分:20)
您的声明绝对错误取决于输入。
如果您拥有多种密钥,并经常点击except
块,则性能不佳。如果try
块占优势,try/except
成语可以在较小的列表上执行。
这是一个基准,展示了几种做同样事情的方法:
from __future__ import print_function
import timeit
import random
import collections
def f1():
d={}
for x in tgt:
if x in d:
d[x]+=1
else:
d[x]=1
return d
def f2():
d = {}
for x in tgt:
try:
d[x]+=1
except KeyError:
d[x] = 1
return d
def f3():
d={}.fromkeys(tgt, 0)
for x in tgt:
d[x]+=1
return d
def f4():
d=collections.defaultdict(int)
for x in tgt:
d[x]+=1
return d
def f5():
return collections.Counter(tgt)
def f6():
d={}
for x in tgt:
d[x]=d.setdefault(x, 0)+1
return d
def f7():
d={}
for x in tgt:
d[x]=d.get(x,0)+1
return d
def cmpthese(funcs, c=10000, rate=True, micro=False):
"""Generate a Perl style function benchmark"""
def pprint_table(table):
"""Perl style table output"""
def format_field(field, fmt='{:,.0f}'):
if type(field) is str: return field
if type(field) is tuple: return field[1].format(field[0])
return fmt.format(field)
def get_max_col_w(table, index):
return max([len(format_field(row[index])) for row in table])
col_paddings=[get_max_col_w(table, i) for i in range(len(table[0]))]
for i,row in enumerate(table):
# left col
row_tab=[row[0].ljust(col_paddings[0])]
# rest of the cols
row_tab+=[format_field(row[j]).rjust(col_paddings[j]) for j in range(1,len(row))]
print(' '.join(row_tab))
results={k.__name__:timeit.Timer(k).timeit(c) for k in funcs}
fastest=sorted(results,key=results.get, reverse=True)
table=[['']]
if rate: table[0].append('rate/sec')
if micro: table[0].append('usec/pass')
table[0].extend(fastest)
for e in fastest:
tmp=[e]
if rate:
tmp.append('{:,}'.format(int(round(float(c)/results[e]))))
if micro:
tmp.append('{:.3f}'.format(1000000*results[e]/float(c)))
for x in fastest:
if x==e: tmp.append('--')
else: tmp.append('{:.1%}'.format((results[x]-results[e])/results[e]))
table.append(tmp)
pprint_table(table)
if __name__=='__main__':
import sys
print(sys.version)
for j in [100,1000]:
for t in [(0,5), (0,50), (0,500)]:
tgt=[random.randint(*t) for i in range(j)]
print('{} rand ints between {}:'.format(j,t))
print('=====')
cmpthese([f1,f2,f3,f4,f5,f6,f7])
print()
我已经包含了一个基于timeit
的小型基准测试功能,它将函数从Slowest打印到Fastest,它们之间的百分比差异。
以下是Python 3的结果:
3.4.1 (default, May 19 2014, 13:10:29)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
100 rand ints between (0, 5):
=====
rate/sec f6 f7 f1 f2 f3 f4 f5
f6 52,756 -- -1.6% -26.2% -27.9% -30.7% -36.7% -46.8%
f7 53,624 1.6% -- -25.0% -26.7% -29.6% -35.7% -46.0%
f1 71,491 35.5% 33.3% -- -2.3% -6.1% -14.2% -28.0%
f2 73,164 38.7% 36.4% 2.3% -- -3.9% -12.2% -26.3%
f3 76,148 44.3% 42.0% 6.5% 4.1% -- -8.7% -23.3%
f4 83,368 58.0% 55.5% 16.6% 13.9% 9.5% -- -16.0%
f5 99,247 88.1% 85.1% 38.8% 35.6% 30.3% 19.0% --
100 rand ints between (0, 50):
=====
rate/sec f2 f6 f7 f4 f3 f1 f5
f2 39,405 -- -17.9% -18.7% -19.1% -41.8% -47.8% -56.3%
f6 47,980 21.8% -- -1.1% -1.6% -29.1% -36.5% -46.8%
f7 48,491 23.1% 1.1% -- -0.5% -28.4% -35.8% -46.2%
f4 48,737 23.7% 1.6% 0.5% -- -28.0% -35.5% -46.0%
f3 67,678 71.7% 41.1% 39.6% 38.9% -- -10.4% -24.9%
f1 75,511 91.6% 57.4% 55.7% 54.9% 11.6% -- -16.3%
f5 90,175 128.8% 87.9% 86.0% 85.0% 33.2% 19.4% --
100 rand ints between (0, 500):
=====
rate/sec f2 f4 f6 f7 f3 f1 f5
f2 25,748 -- -22.0% -41.4% -42.6% -57.5% -66.2% -67.8%
f4 32,996 28.1% -- -24.9% -26.4% -45.6% -56.7% -58.8%
f6 43,930 70.6% 33.1% -- -2.0% -27.5% -42.4% -45.1%
f7 44,823 74.1% 35.8% 2.0% -- -26.1% -41.2% -44.0%
f3 60,624 135.5% 83.7% 38.0% 35.3% -- -20.5% -24.2%
f1 76,244 196.1% 131.1% 73.6% 70.1% 25.8% -- -4.7%
f5 80,026 210.8% 142.5% 82.2% 78.5% 32.0% 5.0% --
1000 rand ints between (0, 5):
=====
rate/sec f7 f6 f1 f3 f2 f4 f5
f7 4,993 -- -6.7% -34.6% -39.4% -44.4% -50.1% -71.1%
f6 5,353 7.2% -- -29.9% -35.0% -40.4% -46.5% -69.0%
f1 7,640 53.0% 42.7% -- -7.3% -14.9% -23.6% -55.8%
f3 8,242 65.1% 54.0% 7.9% -- -8.2% -17.6% -52.3%
f2 8,982 79.9% 67.8% 17.6% 9.0% -- -10.2% -48.1%
f4 10,004 100.4% 86.9% 30.9% 21.4% 11.4% -- -42.1%
f5 17,293 246.4% 223.0% 126.3% 109.8% 92.5% 72.9% --
1000 rand ints between (0, 50):
=====
rate/sec f7 f6 f1 f2 f3 f4 f5
f7 5,051 -- -7.1% -26.5% -29.0% -34.1% -45.7% -71.2%
f6 5,435 7.6% -- -20.9% -23.6% -29.1% -41.5% -69.0%
f1 6,873 36.1% 26.5% -- -3.4% -10.3% -26.1% -60.8%
f2 7,118 40.9% 31.0% 3.6% -- -7.1% -23.4% -59.4%
f3 7,661 51.7% 41.0% 11.5% 7.6% -- -17.6% -56.3%
f4 9,297 84.0% 71.1% 35.3% 30.6% 21.3% -- -47.0%
f5 17,531 247.1% 222.6% 155.1% 146.3% 128.8% 88.6% --
1000 rand ints between (0, 500):
=====
rate/sec f2 f4 f6 f7 f3 f1 f5
f2 3,985 -- -11.0% -13.6% -14.8% -25.7% -40.4% -66.9%
f4 4,479 12.4% -- -2.9% -4.3% -16.5% -33.0% -62.8%
f6 4,613 15.8% 3.0% -- -1.4% -14.0% -31.0% -61.6%
f7 4,680 17.4% 4.5% 1.4% -- -12.7% -30.0% -61.1%
f3 5,361 34.5% 19.7% 16.2% 14.6% -- -19.8% -55.4%
f1 6,683 67.7% 49.2% 44.9% 42.8% 24.6% -- -44.4%
f5 12,028 201.8% 168.6% 160.7% 157.0% 124.3% 80.0% --
Python 2:
2.7.6 (default, Dec 1 2013, 13:26:15)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)]
100 rand ints between (0, 5):
=====
rate/sec f5 f7 f6 f2 f1 f3 f4
f5 24,955 -- -41.8% -42.5% -51.3% -55.7% -61.6% -65.2%
f7 42,867 71.8% -- -1.2% -16.4% -23.9% -34.0% -40.2%
f6 43,382 73.8% 1.2% -- -15.4% -23.0% -33.2% -39.5%
f2 51,293 105.5% 19.7% 18.2% -- -9.0% -21.0% -28.5%
f1 56,357 125.8% 31.5% 29.9% 9.9% -- -13.2% -21.4%
f3 64,924 160.2% 51.5% 49.7% 26.6% 15.2% -- -9.5%
f4 71,709 187.3% 67.3% 65.3% 39.8% 27.2% 10.5% --
100 rand ints between (0, 50):
=====
rate/sec f2 f5 f7 f6 f4 f3 f1
f2 22,439 -- -4.7% -45.1% -45.5% -50.7% -63.3% -64.5%
f5 23,553 5.0% -- -42.4% -42.8% -48.3% -61.5% -62.8%
f7 40,878 82.2% 73.6% -- -0.7% -10.2% -33.2% -35.4%
f6 41,164 83.4% 74.8% 0.7% -- -9.6% -32.7% -34.9%
f4 45,525 102.9% 93.3% 11.4% 10.6% -- -25.6% -28.0%
f3 61,167 172.6% 159.7% 49.6% 48.6% 34.4% -- -3.3%
f1 63,261 181.9% 168.6% 54.8% 53.7% 39.0% 3.4% --
100 rand ints between (0, 500):
=====
rate/sec f2 f5 f4 f6 f7 f3 f1
f2 13,122 -- -39.9% -56.2% -63.2% -63.8% -75.8% -80.0%
f5 21,837 66.4% -- -27.1% -38.7% -39.8% -59.6% -66.7%
f4 29,945 128.2% 37.1% -- -16.0% -17.4% -44.7% -54.3%
f6 35,633 171.6% 63.2% 19.0% -- -1.7% -34.2% -45.7%
f7 36,257 176.3% 66.0% 21.1% 1.8% -- -33.0% -44.7%
f3 54,113 312.4% 147.8% 80.7% 51.9% 49.2% -- -17.5%
f1 65,570 399.7% 200.3% 119.0% 84.0% 80.8% 21.2% --
1000 rand ints between (0, 5):
=====
rate/sec f5 f7 f6 f1 f2 f3 f4
f5 2,787 -- -37.7% -38.4% -53.3% -59.9% -60.4% -67.0%
f7 4,477 60.6% -- -1.1% -25.0% -35.6% -36.3% -47.0%
f6 4,524 62.3% 1.1% -- -24.2% -34.9% -35.6% -46.5%
f1 5,972 114.3% 33.4% 32.0% -- -14.1% -15.0% -29.3%
f2 6,953 149.5% 55.3% 53.7% 16.4% -- -1.1% -17.7%
f3 7,030 152.2% 57.0% 55.4% 17.7% 1.1% -- -16.8%
f4 8,452 203.3% 88.8% 86.8% 41.5% 21.6% 20.2% --
1000 rand ints between (0, 50):
=====
rate/sec f5 f7 f6 f2 f1 f3 f4
f5 2,667 -- -37.8% -38.7% -53.0% -55.9% -61.1% -65.3%
f7 4,286 60.7% -- -1.5% -24.5% -29.1% -37.5% -44.2%
f6 4,351 63.1% 1.5% -- -23.4% -28.0% -36.6% -43.4%
f2 5,677 112.8% 32.4% 30.5% -- -6.1% -17.3% -26.1%
f1 6,045 126.6% 41.0% 39.0% 6.5% -- -11.9% -21.4%
f3 6,862 157.3% 60.1% 57.7% 20.9% 13.5% -- -10.7%
f4 7,687 188.2% 79.3% 76.7% 35.4% 27.2% 12.0% --
1000 rand ints between (0, 500):
=====
rate/sec f2 f5 f7 f6 f4 f3 f1
f2 2,018 -- -16.1% -44.1% -46.2% -53.4% -61.8% -63.0%
f5 2,405 19.1% -- -33.4% -35.9% -44.5% -54.4% -55.9%
f7 3,609 78.8% 50.1% -- -3.8% -16.7% -31.6% -33.8%
f6 3,753 85.9% 56.1% 4.0% -- -13.4% -28.9% -31.2%
f4 4,334 114.7% 80.2% 20.1% 15.5% -- -17.9% -20.5%
f3 5,277 161.5% 119.5% 46.2% 40.6% 21.8% -- -3.2%
f1 5,454 170.2% 126.8% 51.1% 45.3% 25.8% 3.3% --
所以 - 这取决于。
结论:
Counter
方法几乎总是最慢的Counter
方法是Python 2中最慢的方法,但到目前为止是Python 3.4上最快的方法try/except
版本通常是最慢的if key in dict
版本都是最佳/最快的版本之一{}.fromkeys(tgt, 0)
非常可预测defaultdict
版本在较大的列表中最快。较小的列表表示较长的设置时间是通过太少的元素分摊的。 答案 1 :(得分:1)
编码风格还有另外一点。使用 EAFP (更容易请求宽恕而不是权限)的常见python编码风格假设存在有效密钥并且如果假设证明存在异常则捕获异常假的。
由于这种常见的编码风格,我总是使用try / except方法,并确保这比 LBYL 风格更快(在你跳跃之前看看) 。正如我在这里得到的答案所知,这绝对取决于。只要您能够获得现有密钥,我就会尝试使用try / except方法。
答案 2 :(得分:0)
更新: 不确定我是否正在测试正确的东西,但仍然发现结果很有趣。
Python 2:
0% missing keys, Standard access: 0.419198036194
0% missing keys, try/except access: 0.309811115265
50% missing keys, Standard access: 0.417014837265
50% missing keys, try/except access: 0.309100866318
100% missing keys, Standard access: 0.416236877441
100% missing keys, try/except access: 0.310797929764
我使用普通和try / except方法测试了3个具有不同数量键的字典。对我来说,try / except方法每次都更快。
我的代码:
from timeit import timeit
size = 2**10
allkeys = "0% missing keys", dict([(i, 0) for i in range(size)])
somekeys= "50% missing keys", dict([(i*2, 0) for i in range(size//2)])
nokeys = "100% missing keys", dict([])
def test_normal():
"""Standard access"""
for i in xrange(size):
if i in d:
d[i] += 1
else:
d[i] = 1
def test_try():
"""try/except access"""
for i in xrange(size):
try:
d[i] += 1
except KeyError:
d[i] = 1
for trial in (allkeys, somekeys, nokeys):
d = trial[1]
for test in (test_normal, test_try):
trial_time = timeit("test()",
setup="from __main__ import test",
number=2**10)
print "{0}, {1}: {2}".format(trial[0], test.__doc__, trial_time)
代码现在使用timeit,这可能更准确。
答案 3 :(得分:-1)
注意:完全是推测性的
我认为第一个会慢一些,因为它在字典中定位键两次,首先在if语句中,然后在C代码中进行字典访问。当许多键不在字典中时,try-except可能会更慢,因为处理异常需要一些开销。
我将列表设置为range(100)
并将字典留空。第一次使用if
需要8.003秒,第二次使用try-except需要30.976秒!在这样的情况下,开销是非常重要的,其中没有其他任何事情要做。
答案 4 :(得分:-2)
import random
from pip._vendor.distlib.compat import raw_input
x=random.randint(1,99)
guess = int(raw_input("Enter a integer from 1 to 99:"))
while x !="guess":
print
if guess<x:
print ("guess is low")
guess= int(raw_input("Enter a integer from 1 to 99:"))
elif guess >x:
print ("guess is high")
guess = int(raw_input("Enter a integer from 1 to 99:"))
else:
print (" you guessed it !")
break
print