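Using the example code below, I would like to better understand how Python's speed changes depending on how I structure a given function.
The example functions I define work as follows: given two strings, they return the number of positions at which the strings differ. We assume that assert len(s1) == len(s2) always holds.
The first function uses a list comprehension.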
def h_dist1(s1,s2):
    return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))
The second function uses a classic for loop.
def h_dist2(s1,s2):
    tot = 0
    for d1, d2 in zip(s1, s2):
        if d1 != d2:
            tot += 1
    return tot
The complexity of the second function is clearly O(N), where len(s1) = len(s2) = N.
Question specific to this example: is there a better way to define this particular function? What is the complexity of h_dist1?
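On the complexity of h_dist1: the generator expression performs exactly one comparison per zipped pair, so it is O(N) as well. One way to check this empirically is to count the comparisons. The sketch below is written for Python 3 and uses a hypothetical CountingChar wrapper and count_comparisons helper that are not part of the original code:

```python
class CountingChar:
    """Hypothetical wrapper that counts how many != comparisons are made."""
    comparisons = 0

    def __init__(self, ch):
        self.ch = ch

    def __ne__(self, other):
        # Record the comparison, then compare the wrapped characters.
        CountingChar.comparisons += 1
        return self.ch != other.ch


def h_dist1(s1, s2):
    return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))


def count_comparisons(n):
    """Run h_dist1 on two sequences of length n; return the comparison count."""
    CountingChar.comparisons = 0
    a = [CountingChar('a') for _ in range(n)]
    b = [CountingChar('b') for _ in range(n)]
    h_dist1(a, b)
    return CountingChar.comparisons


for n in (10, 100, 1000):
    print(n, count_comparisons(n))
```

The count grows in step with N, which is what O(N) predicts. The two functions differ only in the constant factor per element.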
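General question: in general, what is the best way (in terms of readability, speed, efficiency, and being Pythonic) to define a function like the ones in the example above, i.e. one that needs to loop over a string, an array, etc.? And, most importantly, why is a particular way the fastest or most efficient?
Note: I looked for similar questions but did not find anything this specific. For example, in one answer HYRY says that to speed up code one should (1) use local variables in the for loop and (2) use a list comprehension, but I still do not understand why. Of course, references to other Q&As are welcome.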
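Regarding the local-variable advice: a commonly cited reason is that CPython reads a local variable with an array-indexed LOAD_FAST style instruction, while a global name requires a dictionary lookup (LOAD_GLOBAL). A minimal sketch, assuming CPython 3; the function names are illustrative and the exact opcode names vary between versions:

```python
import dis

threshold = 1  # module-level (global) name


def uses_global(x):
    # 'threshold' is looked up in the module's globals on every call.
    return x + threshold


def uses_local(x, threshold=1):
    # 'threshold' is now a local variable of the function.
    return x + threshold


def opnames(func):
    """Return the names of the bytecode instructions of func."""
    return [ins.opname for ins in dis.get_instructions(func)]


print(opnames(uses_global))
print(opnames(uses_local))
```

The first listing contains a LOAD_GLOBAL instruction and the second does not. In a tight loop that per-iteration lookup cost can add up.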
Answer 0 (score: 2)
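Try to eliminate Python-level loops wherever possible and avoid creating unnecessary lists in memory; following these two rules gives you very efficient solutions. For example, zip creates a list in memory (in Python 2), so we can use itertools.izip to get an iterator instead. Accordingly, in my quick tests sum(starmap(ne, izip(s1, s2))) was the fastest: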
>>> from itertools import imap, izip, starmap
>>> from operator import ne
>>> s1 = 'a'*10**5
>>> s2 = 'b'*10**5
>>> %timeit sum(starmap(ne, izip(s1, s2)))
100 loops, best of 3: 4.25 ms per loop
A few other solutions:
>>> %timeit sum(imap(ne, s1, s2))
100 loops, best of 3: 5.08 ms per loop
>>> %timeit sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))
100 loops, best of 3: 11.3 ms per loop
>>> %timeit sum(1 for dgt1, dgt2 in zip(s1, s2) if dgt1 != dgt2)
100 loops, best of 3: 10.7 ms per loop
>>> %timeit sum(dgt1 != dgt2 for dgt1, dgt2 in izip(s1, s2))
100 loops, best of 3: 7.02 ms per loop
>>> %timeit sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)
100 loops, best of 3: 6.17 ms per loop
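The session above is Python 2. In Python 3, itertools.izip and itertools.imap no longer exist because the built-in zip and map are already lazy. A minimal sketch of the Python 3 spellings, with illustrative function names and no claim that the relative timings carry over:

```python
from itertools import starmap
from operator import ne

s1 = 'a' * 10**4
s2 = 'b' * 10**4


def with_starmap(a, b):
    # Python 3's zip returns an iterator, so it plays the role of izip.
    return sum(starmap(ne, zip(a, b)))


def with_map(a, b):
    # Python 3's map returns an iterator, so it plays the role of imap.
    return sum(map(ne, a, b))


def with_genexp(a, b):
    # Generator expression that counts only the differing pairs.
    return sum(1 for x, y in zip(a, b) if x != y)


print(with_starmap(s1, s2), with_map(s1, s2), with_genexp(s1, s2))
```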
But the difference is not large, so personally I would use izip with a generator expression, and not exploit the fact that True == 1 and False == 0 in Python:
sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)
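For context on the True == 1 point: bool is a subclass of int, which is why summing comparison results works at all. A small Python 3 sketch with made-up sample pairs, showing both styles side by side:

```python
pairs = [('a', 'a'), ('a', 'b'), ('c', 'd')]

# Style 1: sum the booleans directly; True counts as 1, False as 0.
flags = [x != y for x, y in pairs]
total_from_bools = sum(flags)

# Style 2: emit an explicit 1 only for the pairs that differ.
total_explicit = sum(1 for x, y in pairs if x != y)

print(issubclass(bool, int), total_from_bools, total_explicit)
```

Both give the same count. The explicit form only yields a value for differing pairs, so it does not depend on how bool behaves under addition.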
Answer 1 (score: 1)
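Don't be too quick to write off the humble for loop. If you don't actually need a list, as in this case, a standard for loop can be faster than using a list comprehension. And of course it has less memory overhead.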
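To illustrate the memory point: a list comprehension materializes every result before sum sees it, whereas a generator expression (like a plain for loop) handles one pair at a time. A rough Python 3 sketch using sys.getsizeof, which reports only the size of the container object itself; the variable names are illustrative:

```python
import sys

s1 = 'a' * 10**5
s2 = 'b' * 10**5

# The list comprehension builds a list with one entry per character pair.
as_list = [x != y for x, y in zip(s1, s2)]

# The generator expression builds a small object that yields results lazily.
as_gen = (x != y for x, y in zip(s1, s2))

list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print('list: %d bytes, generator: %d bytes' % (list_bytes, gen_bytes))
```

The list's size grows with the input length, while the generator object stays the same small size however long the strings are.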
Here is a program that performs timing tests; it can easily be modified to add more tests.
#!/usr/bin/env python

''' Time various implementations of string diff function

    From http://stackoverflow.com/q/28581218/4014959

    Written by PM 2Ring 2015.02.18
'''

from itertools import imap, izip, starmap
from operator import ne
from timeit import Timer
from random import random, seed

def h_dist0(s1,s2):
    ''' For loop '''
    tot = 0
    for d1, d2 in zip(s1, s2):
        if d1 != d2:
            tot += 1
    return tot

def h_dist1(s1,s2):
    ''' List comprehension '''
    return sum([dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2)])

def h_dist2(s1,s2):
    ''' Generator expression '''
    return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))

def h_dist3(s1,s2):
    ''' Generator expression with if '''
    return sum(1 for dgt1, dgt2 in zip(s1, s2) if dgt1 != dgt2)

def h_dist3a(s1,s2):
    ''' Generator expression with izip '''
    return sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)

def h_dist4(s1,s2):
    ''' imap '''
    return sum(imap(ne, s1, s2))

def h_dist5(s1,s2):
    ''' starmap '''
    return sum(starmap(ne, izip(s1, s2)))

funcs = [
    h_dist0,
    h_dist1,
    h_dist2,
    h_dist3,
    h_dist3a,
    h_dist4,
    h_dist5,
]

# ------------------------------------

def check_full():
    print 'Testing all functions with strings of length', len(s1)
    for func in funcs:
        print '%s:%s\n%d\n' % (func.func_name, func.__doc__, func(s1, s2))

def check():
    print 'Testing all functions with strings of length', len(s1)
    print [func(s1, s2) for func in funcs], '\n'

def time_test(loops=10000, reps=3):
    ''' Print timing stats for all the functions '''
    slen = len(s1)
    print 'Length = %d, Loops = %d, Repetitions = %d' % (slen, loops, reps)

    for func in funcs:
        #Get function name and docstring
        fname = func.func_name
        fdoc = func.__doc__

        print '\n%s:%s' % (fname, fdoc)
        t = Timer('%s(s1, s2)' % fname, 'from __main__ import s1, s2, %s' % fname)
        results = t.repeat(reps, loops)
        results.sort()
        print results
    print '\n' + '- '*30 + '\n'

def make_strings(n, r=0.5):
    print 'r:', r
    s1 = 'a' * n
    s2 = ''.join(['b' if random() < r else 'a' for _ in xrange(n)])
    return s1, s2

# ------------------------------------

seed(37)

s1, s2 = make_strings(100)
#print '%s\n%s\n' % (s1, s2)
check()
time_test(10000)

s1, s2 = make_strings(100, 0.1)
check()
time_test(10000)

s1, s2 = make_strings(100, 0.9)
check()
time_test(10000)

s1, s2 = make_strings(10)
check()
time_test(50000)

s1, s2 = make_strings(1000)
check()
time_test(1000)
The following results are from a 32-bit 2 GHz Pentium 4 running Python 2.6.6 on Linux.
Output
r: 0.5
Testing all functions with strings of length 100
[45, 45, 45, 45, 45, 45, 45]
Length = 100, Loops = 10000, Repetitions = 3
h_dist0: For loop
[0.62271595001220703, 0.63597297668457031, 0.65991997718811035]
h_dist1: List comprehension
[0.80136799812316895, 1.0849411487579346, 1.1687240600585938]
h_dist2: Generator expression
[0.81829214096069336, 0.82315492630004883, 0.85774612426757812]
h_dist3: Generator expression with if
[0.67409086227416992, 0.67418098449707031, 0.68189001083374023]
h_dist3a: Generator expression with izip
[0.54596519470214844, 0.54696321487426758, 0.54910516738891602]
h_dist4: imap
[0.4574120044708252, 0.45927596092224121, 0.46362900733947754]
h_dist5: starmap
[0.38610100746154785, 0.38653087615966797, 0.39858913421630859]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
r: 0.1
Testing all functions with strings of length 100
[13, 13, 13, 13, 13, 13, 13]
Length = 100, Loops = 10000, Repetitions = 3
h_dist0: For loop
[0.59487199783325195, 0.61918497085571289, 0.62035894393920898]
h_dist1: List comprehension
[0.77733206748962402, 0.77883815765380859, 0.78676295280456543]
h_dist2: Generator expression
[0.8313758373260498, 0.83669614791870117, 0.8419950008392334]
h_dist3: Generator expression with if
[0.60900688171386719, 0.61443901062011719, 0.6202390193939209]
h_dist3a: Generator expression with izip
[0.48425912857055664, 0.48703289031982422, 0.49215483665466309]
h_dist4: imap
[0.45452284812927246, 0.46001195907592773, 0.4652099609375]
h_dist5: starmap
[0.37329483032226562, 0.37666082382202148, 0.40111804008483887]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
r: 0.9
Testing all functions with strings of length 100
[94, 94, 94, 94, 94, 94, 94]
Length = 100, Loops = 10000, Repetitions = 3
h_dist0: For loop
[0.69256496429443359, 0.69339799880981445, 0.70190787315368652]
h_dist1: List comprehension
[0.80547499656677246, 0.81107187271118164, 0.81337189674377441]
h_dist2: Generator expression
[0.82524299621582031, 0.82638883590698242, 0.82899308204650879]
h_dist3: Generator expression with if
[0.80344915390014648, 0.8050081729888916, 0.80581092834472656]
h_dist3a: Generator expression with izip
[0.63276004791259766, 0.63585305213928223, 0.64699077606201172]
h_dist4: imap
[0.46122288703918457, 0.46677708625793457, 0.46921491622924805]
h_dist5: starmap
[0.38288688659667969, 0.38731098175048828, 0.38867902755737305]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
r: 0.5
Testing all functions with strings of length 10
[5, 5, 5, 5, 5, 5, 5]
Length = 10, Loops = 50000, Repetitions = 3
h_dist0: For loop
[0.55377697944641113, 0.55385804176330566, 0.56589198112487793]
h_dist1: List comprehension
[0.69614696502685547, 0.71386599540710449, 0.71778011322021484]
h_dist2: Generator expression
[0.74240994453430176, 0.77340388298034668, 0.77429509162902832]
h_dist3: Generator expression with if
[0.66713404655456543, 0.66874384880065918, 0.67353487014770508]
h_dist3a: Generator expression with izip
[0.59427285194396973, 0.59525203704833984, 0.60147690773010254]
h_dist4: imap
[0.46971893310546875, 0.4749150276184082, 0.4831998348236084]
h_dist5: starmap
[0.46615099906921387, 0.47054886817932129, 0.47225403785705566]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
r: 0.5
Testing all functions with strings of length 1000
[506, 506, 506, 506, 506, 506, 506]
Length = 1000, Loops = 1000, Repetitions = 3
h_dist0: For loop
[0.59869503974914551, 0.60042905807495117, 0.60753512382507324]
h_dist1: List comprehension
[0.68359518051147461, 0.70072579383850098, 0.7146599292755127]
h_dist2: Generator expression
[0.7492527961730957, 0.75325894355773926, 0.75805497169494629]
h_dist3: Generator expression with if
[0.59286904335021973, 0.59505105018615723, 0.59793591499328613]
h_dist3a: Generator expression with izip
[0.49536395072937012, 0.49821090698242188, 0.54327893257141113]
h_dist4: imap
[0.42384982109069824, 0.43060398101806641, 0.43535709381103516]
h_dist5: starmap
[0.34122705459594727, 0.35040402412414551, 0.35851287841796875]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
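The program above is Python 2 only (print statements, izip, imap, xrange, func_name). For readers on Python 3, here is a compact sketch of the same idea. The function names, the FUNCS list, and the dict-based return value are illustrative choices rather than the original author's, and the numbers it prints will not match the Pentium 4 results above:

```python
from itertools import starmap
from operator import ne
from random import random, seed
from timeit import Timer


def h_for(s1, s2):
    """For loop"""
    tot = 0
    for d1, d2 in zip(s1, s2):
        if d1 != d2:
            tot += 1
    return tot


def h_genexp(s1, s2):
    """Generator expression with if"""
    return sum(1 for d1, d2 in zip(s1, s2) if d1 != d2)


def h_map(s1, s2):
    """map (Python 2's imap)"""
    return sum(map(ne, s1, s2))


def h_starmap(s1, s2):
    """starmap over zip (Python 2's izip)"""
    return sum(starmap(ne, zip(s1, s2)))


FUNCS = [h_for, h_genexp, h_map, h_starmap]


def make_strings(n, r=0.5):
    """Return 'a' * n and a copy in which each char is 'b' with probability r."""
    s1 = 'a' * n
    s2 = ''.join('b' if random() < r else 'a' for _ in range(n))
    return s1, s2


def time_test(s1, s2, loops=1000, reps=3):
    """Return {function name: sorted total times, each for `loops` calls}."""
    results = {}
    for func in FUNCS:
        timer = Timer(lambda: func(s1, s2))
        results[func.__name__] = sorted(timer.repeat(repeat=reps, number=loops))
    return results


seed(37)
s1, s2 = make_strings(100)
for name, times in time_test(s1, s2).items():
    print(name, times)
```

Passing a callable to Timer avoids the 'from __main__ import ...' setup string used in the Python 2 version, at the cost of one extra function call per timed iteration.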