循环与列表理解与其他方法的速度/效率比较

时间:2015-02-18 10:25:49

标签: python performance for-loop list-comprehension

使用下面给出的示例代码,我想更好地理解python速度如何根据我如何构造给定函数而变化。

我定义的示例函数的工作方式如下:给定两个字符串,它们返回它们不同的位数。我们假设assert len(s1) == len(s2)始终为真。

第一个函数使用列表推导。

def h_dist1(s1,s2):
    return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))

第二个函数使用经典for循环。

def h_dist2(s1,s2):
    tot = 0
    for d1, d2 in zip(s1, s2):
        if d1 != d2:
            tot += 1
    return tot

第二个代码的复杂性显然是O(N) len(s1)=len(s2)=N

示例相关问题:有没有更好的方法来定义此特定功能? h_dist1的复杂性是什么?

一般问题:一般来说,最好的(在可读性,速度,效率,更多pythonic方面)定义一个类似于上面例子中给出的函数的方法(即需要循环遍历字符串/数组/等)?并且,最重要的是,为什么特定方式是最快/最有效的

注意我查找了类似的问题,但我没有发现任何具体问题,例如:在here中,HYRY说要加速代码,应该在for循环中使用1.局部变量,并使用list comprehension。 但我仍然不明白为什么。当然,欢迎提及其他Q / A.

2 个答案:

答案 0 :(得分:2)

尝试尽可能地删除Python循环,不要在内存中创建不必要的列表,遵循这些可以获得非常有效的解决方案。例如zip在内存中创建一个列表,因此我们可以使用itertools.izip来获取迭代器。因此,根据我的快速测试,sum(starmap(ne, izip(s1, s2)))是最快的一个:

>>> from itertools import imap, izip, starmap
>>> from operator import ne
>>> s1 = 'a'*10**5
>>> s2 = 'b'*10**5
>>> %timeit sum(starmap(ne, izip(s1, s2)))
100 loops, best of 3: 4.25 ms per loop

很少有其他解决方案:

>>> %timeit sum(imap(ne, s1, s2))
100 loops, best of 3: 5.08 ms per loop
>>> %timeit sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))
100 loops, best of 3: 11.3 ms per loop
>>> %timeit sum(1 for dgt1, dgt2 in zip(s1, s2) if dgt1 != dgt2)
100 loops, best of 3: 10.7 ms per loop
>>> %timeit sum(dgt1 != dgt2 for dgt1, dgt2 in izip(s1, s2))
100 loops, best of 3: 7.02 ms per loop
>>> %timeit sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)
100 loops, best of 3: 6.17 ms per loop

但差异并不大,所以我个人会将izip与生成器表达式一起使用,而不会在Python中滥用True == 1和False == 0这一事实:

sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)

答案 1 :(得分:1)

不要太快写下简陋的for循环。如果您实际上不需要列表,例如在这种情况下,标准for循环可能比使用列表理解更快。当然它的内存开销更少。

这是一个执行计时测试的程序;它可以很容易地修改,以添加更多的测试。

#!/usr/bin/env python

''' Time various implementations of string diff function

    From http://stackoverflow.com/q/28581218/4014959

    Written by PM 2Ring 2015.02.18
'''

from itertools import imap, izip, starmap
from operator import ne

from timeit import Timer
from random import random, seed

def h_dist0(s1,s2):
    ''' For loop '''
    tot = 0
    for d1, d2 in zip(s1, s2):
        if d1 != d2:
            tot += 1
    return tot

def h_dist1(s1,s2):
    ''' List comprehension '''
    return sum([dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2)])

def h_dist2(s1,s2):
    ''' Generator expression '''
    return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))

def h_dist3(s1,s2):
    ''' Generator expression with if '''
    return sum(1 for dgt1, dgt2 in zip(s1, s2) if dgt1 != dgt2)

def h_dist3a(s1,s2):
    ''' Generator expression with izip '''
    return sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)

def h_dist4(s1,s2):
    ''' imap '''
    return sum(imap(ne, s1, s2))

def h_dist5(s1,s2):
    ''' starmap '''
    return sum(starmap(ne, izip(s1, s2)))

funcs = [
    h_dist0,
    h_dist1,
    h_dist2,
    h_dist3,
    h_dist3a,
    h_dist4,
    h_dist5,
]

# ------------------------------------

def check_full():
    print 'Testing all functions with strings of length', len(s1)
    for func in funcs:
        print '%s:%s\n%d\n' % (func.func_name, func.__doc__, func(s1, s2))

def check():
    print 'Testing all functions with strings of length', len(s1)
    print [func(s1, s2) for func in funcs], '\n'

def time_test(loops=10000, reps=3):
    ''' Print timing stats for all the functions '''
    slen = len(s1)
    print 'Length = %d, Loops = %d, Repetitions = %d' % (slen, loops, reps)

    for func in funcs:
        #Get function name and docstring
        fname = func.func_name
        fdoc = func.__doc__

        print '\n%s:%s' % (fname, fdoc)
        t = Timer('%s(s1, s2)' % fname, 'from __main__ import s1, s2, %s' % fname)
        results = t.repeat(reps, loops)
        results.sort()
        print results
    print '\n' + '- '*30 + '\n'

def make_strings(n, r=0.5):
    print 'r:', r
    s1 = 'a' * n
    s2 = ''.join(['b' if random() < r else 'a' for _ in xrange(n)])
    return s1, s2

# ------------------------------------

seed(37)

s1, s2 = make_strings(100)
#print '%s\n%s\n' % (s1, s2)
check()
time_test(10000)

s1, s2 = make_strings(100, 0.1)
check()
time_test(10000)

s1, s2 = make_strings(100, 0.9)
check()
time_test(10000)

s1, s2 = make_strings(10)
check()
time_test(50000)

s1, s2 = make_strings(1000)
check()
time_test(1000)

以下结果来自Linux上运行Python 2.6.6的32位2GHz Pentium 4。

<强>输出

r: 0.5
Testing all functions with strings of length 100
[45, 45, 45, 45, 45, 45, 45] 

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop 
[0.62271595001220703, 0.63597297668457031, 0.65991997718811035]

h_dist1: List comprehension 
[0.80136799812316895, 1.0849411487579346, 1.1687240600585938]

h_dist2: Generator expression 
[0.81829214096069336, 0.82315492630004883, 0.85774612426757812]

h_dist3: Generator expression with if 
[0.67409086227416992, 0.67418098449707031, 0.68189001083374023]

h_dist3a: Generator expression with izip 
[0.54596519470214844, 0.54696321487426758, 0.54910516738891602]

h_dist4: imap 
[0.4574120044708252, 0.45927596092224121, 0.46362900733947754]

h_dist5: starmap 
[0.38610100746154785, 0.38653087615966797, 0.39858913421630859]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

r: 0.1
Testing all functions with strings of length 100
[13, 13, 13, 13, 13, 13, 13] 

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop 
[0.59487199783325195, 0.61918497085571289, 0.62035894393920898]

h_dist1: List comprehension 
[0.77733206748962402, 0.77883815765380859, 0.78676295280456543]

h_dist2: Generator expression 
[0.8313758373260498, 0.83669614791870117, 0.8419950008392334]

h_dist3: Generator expression with if 
[0.60900688171386719, 0.61443901062011719, 0.6202390193939209]

h_dist3a: Generator expression with izip 
[0.48425912857055664, 0.48703289031982422, 0.49215483665466309]

h_dist4: imap 
[0.45452284812927246, 0.46001195907592773, 0.4652099609375]

h_dist5: starmap 
[0.37329483032226562, 0.37666082382202148, 0.40111804008483887]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

r: 0.9
Testing all functions with strings of length 100
[94, 94, 94, 94, 94, 94, 94] 

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop 
[0.69256496429443359, 0.69339799880981445, 0.70190787315368652]

h_dist1: List comprehension 
[0.80547499656677246, 0.81107187271118164, 0.81337189674377441]

h_dist2: Generator expression 
[0.82524299621582031, 0.82638883590698242, 0.82899308204650879]

h_dist3: Generator expression with if 
[0.80344915390014648, 0.8050081729888916, 0.80581092834472656]

h_dist3a: Generator expression with izip 
[0.63276004791259766, 0.63585305213928223, 0.64699077606201172]

h_dist4: imap 
[0.46122288703918457, 0.46677708625793457, 0.46921491622924805]

h_dist5: starmap 
[0.38288688659667969, 0.38731098175048828, 0.38867902755737305]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

r: 0.5
Testing all functions with strings of length 10
[5, 5, 5, 5, 5, 5, 5] 

Length = 10, Loops = 50000, Repetitions = 3

h_dist0: For loop 
[0.55377697944641113, 0.55385804176330566, 0.56589198112487793]

h_dist1: List comprehension 
[0.69614696502685547, 0.71386599540710449, 0.71778011322021484]

h_dist2: Generator expression 
[0.74240994453430176, 0.77340388298034668, 0.77429509162902832]

h_dist3: Generator expression with if 
[0.66713404655456543, 0.66874384880065918, 0.67353487014770508]

h_dist3a: Generator expression with izip 
[0.59427285194396973, 0.59525203704833984, 0.60147690773010254]

h_dist4: imap 
[0.46971893310546875, 0.4749150276184082, 0.4831998348236084]

h_dist5: starmap 
[0.46615099906921387, 0.47054886817932129, 0.47225403785705566]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

r: 0.5
Testing all functions with strings of length 1000
[506, 506, 506, 506, 506, 506, 506] 

Length = 1000, Loops = 1000, Repetitions = 3

h_dist0: For loop 
[0.59869503974914551, 0.60042905807495117, 0.60753512382507324]

h_dist1: List comprehension 
[0.68359518051147461, 0.70072579383850098, 0.7146599292755127]

h_dist2: Generator expression 
[0.7492527961730957, 0.75325894355773926, 0.75805497169494629]

h_dist3: Generator expression with if 
[0.59286904335021973, 0.59505105018615723, 0.59793591499328613]

h_dist3a: Generator expression with izip 
[0.49536395072937012, 0.49821090698242188, 0.54327893257141113]

h_dist4: imap 
[0.42384982109069824, 0.43060398101806641, 0.43535709381103516]

h_dist5: starmap 
[0.34122705459594727, 0.35040402412414551, 0.35851287841796875]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -