Summing with a for loop faster than with reduce?

Posted: 2011-03-25 18:20:22

Tags: python performance

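I wanted to see how much faster reduce is than a for loop for simple numerical operations. Here's what I found (using the standard timeit library):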

In [54]: print(setup)
from operator import add, iadd
r = range(100)

In [55]: print(stmt1)    
c = 0
for i in r:
    c+=i        

In [56]: timeit(stmt1, setup)
Out[56]: 8.948904991149902
In [58]: print(stmt3)    
reduce(add, r)    

In [59]: timeit(stmt3, setup)
Out[59]: 13.316915035247803

Digging a little further:

In [68]: timeit("1+2", setup)
Out[68]: 0.04145693778991699

In [69]: timeit("add(1,2)", setup)
Out[69]: 0.22807812690734863

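What is going on here? I would have expected reduce to be faster than the for loop, but the function call seems to dominate. Shouldn't the reduce version run almost entirely in C? Using iadd(c,i) in the for loop version makes it run in ~24 seconds. Why is operator.add so much slower than +? I was under the impression that + and operator.add run the same C code (I checked to make sure operator.add isn't just calling + in Python or anything like that).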

BTW, just using sum runs in ~2.3 seconds.

In [70]: print(sys.version)
2.7.1 (r271:86882M, Nov 30 2010, 09:39:13) 
[GCC 4.0.1 (Apple Inc. build 5494)]
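The session above is Python 2.7. On Python 3, reduce lives in functools and range is lazy, so a reproduction needs a slightly different setup. The sketch below is a reconstruction under those assumptions, not the original benchmark: the statement strings and the number=10_000 repeat count are chosen here for illustration, and absolute timings will vary by machine and interpreter version. Because ints are immutable, iadd returns the new value, so the loop must rebind c.

```python
from timeit import timeit

# Shared setup: mirrors the question's setup, but imports reduce from functools (Python 3)
SETUP = """
from functools import reduce
from operator import add, iadd
r = range(100)
"""

# Each statement sums r in a different way
STATEMENTS = {
    "for loop with +=": "c = 0\nfor i in r:\n    c += i",
    "for loop with iadd": "c = 0\nfor i in r:\n    c = iadd(c, i)",
    "reduce(add, r)": "reduce(add, r)",
    "sum(r)": "sum(r)",
}


def run_benchmark(number=10_000):
    """Return {label: seconds} for running each statement `number` times."""
    return {label: timeit(stmt, SETUP, number=number) for label, stmt in STATEMENTS.items()}


if __name__ == "__main__":
    for label, seconds in run_benchmark().items():
        print(f"{label:20s} {seconds:.4f} s")
```

Only the relative ordering of the four timings is meaningful; sum is expected to win because its whole loop runs in C without a Python-level call per element.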

3 answers:

Answer 0 (score: 6)

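reduce(add, r) has to call the add() function 99 times (once for every element of range(100) after the first), so the function-call overhead adds up. reduce uses PyEval_CallObject to call add on every iteration: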

for (;;) {
    /* ... */
    if (result == NULL)
        result = op2;
    else {
        /* Here it fills a tuple with the previous result and the next
           value from range(100), then passes it to func (add): */
        PyTuple_SetItem(args, 0, result);
        PyTuple_SetItem(args, 1, op2);
        if ((result = PyEval_CallObject(func, args)) == NULL)
            goto Fail;
    }
}
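To make the per-element cost concrete, here is a rough pure-Python model of the C loop above. py_reduce and CountingAdd are hypothetical names introduced only for this illustration. The real reduce is written in C, but it follows the same steps, including one full Python-level call of func for every element after the first.

```python
from operator import add


def py_reduce(func, iterable):
    """Rough model of the C loop: no initializer, so the first element seeds the result."""
    result = None
    seeded = False
    for op2 in iterable:
        if not seeded:
            result = op2                # mirrors `result = op2` on the first item
            seeded = True
        else:
            result = func(result, op2)  # mirrors PyEval_CallObject(func, args)
    if not seeded:
        raise TypeError("py_reduce() of empty sequence with no initial value")
    return result


class CountingAdd:
    """Wraps operator.add and counts how many times it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, y):
        self.calls += 1
        return add(x, y)


counter = CountingAdd()
total = py_reduce(counter, range(100))
print(total, counter.calls)  # 4950 99
```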

Update: a reply to the question in the comments.

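When you type 1 + 2 in Python source code, the bytecode compiler performs the addition at compile time (constant folding) and replaces the expression with 3: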

import byteplay  # third-party bytecode library; this snippet is Python 2 code

f1 = lambda: 1 + 2
c1 = byteplay.Code.from_code(f1.func_code)
print c1.code

1           1 LOAD_CONST           3
            2 RETURN_VALUE         
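byteplay is a third-party package; on Python 3 the standard-library dis module shows the same constant folding. A minimal sketch, assuming a current CPython. Opcode names differ between versions (BINARY_ADD became BINARY_OP in 3.11), so the check matches on the BINARY_ prefix rather than one exact name.

```python
import dis

f1 = lambda: 1 + 2

# Print the bytecode: the addition already happened at compile time,
# so the function just loads the constant 3 and returns it
dis.dis(f1)

# No binary-operation instruction survives in the compiled code
folded = not any(ins.opname.startswith("BINARY_") for ins in dis.get_instructions(f1))
print(folded)  # True
```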

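If you add two variables, a + b, the compiler generates bytecode that loads both variables and executes BINARY_ADD, which is much faster than calling a function to do the addition: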

f2 = lambda a, b: a + b
c2 = byteplay.Code.from_code(f2.func_code)
print c2.code

1           1 LOAD_FAST            a
            2 LOAD_FAST            b
            3 BINARY_ADD           
            4 RETURN_VALUE         
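The same dis-based approach (again a Python 3 sketch, not the answer's original byteplay tooling) shows the contrast the question asks about. a + b compiles to a single binary-add instruction that the interpreter loop executes inline. add(a, b) has to load the global name add and go through a call instruction, and therefore the general function-call machinery, on every use. The opnames helper is introduced here only for illustration.

```python
import dis
from operator import add

f2 = lambda a, b: a + b
f3 = lambda a, b: add(a, b)


def opnames(func):
    """Return the list of opcode names in func's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)]


# BINARY_ADD on Python <= 3.10, BINARY_OP on 3.11+
uses_binary_add = any(name in ("BINARY_ADD", "BINARY_OP") for name in opnames(f2))
# CALL_FUNCTION on Python <= 3.10, CALL on 3.11+
uses_call = any(name.startswith("CALL") for name in opnames(f3))
print(uses_binary_add, uses_call)  # True True
```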

Answer 1 (score: 0)

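It's probably the overhead of copying the arguments and the return value (i.e. in "add(1,2)"), as opposed to simply operating on numeric literals.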
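One way to separate the two effects discussed in this thread (constant folding of literals versus call overhead) is to time a + b on variables alongside add(a, b), instead of comparing add(1,2) against the literal 1+2. This is a hypothetical measurement sketch: the variable names and the repeat count are arbitrary, only the relative ordering is meaningful, and no particular timings are claimed.

```python
from timeit import timeit

# a and b are plain variables, so a + b cannot be folded at compile time
SETUP = "from operator import add; a, b = 1, 2"


def compare(number=1_000_000):
    """Time literal addition, variable addition and operator.add on the same values."""
    return {
        "1 + 2 (folded to a constant)": timeit("1 + 2", SETUP, number=number),
        "a + b (binary add opcode)": timeit("a + b", SETUP, number=number),
        "add(a, b) (function call)": timeit("add(a, b)", SETUP, number=number),
    }


if __name__ == "__main__":
    for label, seconds in compare().items():
        print(f"{label:30s} {seconds:.4f} s")
```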

Answer 2 (score: 0)

Edit: switching to zeros instead of building the starting array from a multiplied list narrows the gap.

from functools import reduce
from numpy import array, arange, zeros
from time import time

def add(x, y):
    return x + y

def sum_columns(x):
    # Running total with one slot per column, same dtype as the input rows
    total = zeros(x.shape[1], dtype=x.dtype)
    for row in x:
        total += array(row)
    return total

l = arange(3000000)
l = array([l, l, l])   # 3 rows x 3,000,000 columns

start = time()
print(reduce(add, l))
print('Reduce took {}'.format(time() - start))

start = time()
print(sum_columns(l))
print('For loop took {}'.format(time() - start))

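This narrows the gap considerably: reduce is now less than twice as fast as the loop, versus roughly 15 times as fast in the old version below.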

Reduce took 0.03230619430541992
For loop took 0.058577775955200195
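For comparison, NumPy can compute this column-wise sum without a Python-level loop or reduce at all: ndarray.sum(axis=0) and the ufunc method numpy.add.reduce both sum down the rows in C. A minimal sketch on a much smaller array (the size is arbitrary) that checks all three approaches agree:

```python
from functools import reduce
from operator import add

import numpy as np

l = np.arange(10)
l = np.array([l, l, l])  # shape (3, 10), like the benchmark above but smaller

via_reduce = reduce(add, l)           # two Python-level calls, each adding whole rows
via_ufunc = np.add.reduce(l, axis=0)  # ufunc reduction along the first axis, done in C
via_sum = l.sum(axis=0)               # same column sums via ndarray.sum

print(via_sum.tolist())  # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```

This also suggests why reduce does well here: with only 3 rows it makes just 2 calls to add, each of which does 3,000,000 additions in C, so the per-call overhead from Answer 0 barely matters.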

Old: if you use reduce to add NumPy arrays together element-wise, it can be faster than a for loop.

from functools import reduce
from numpy import array, arange
from time import time

def add(x, y):
    return x + y

def sum_columns(x):
    # Running total with one slot per column, built from a Python list of zeros
    total = array([0] * len(x[0]))
    for row in x:
        total += array(row)
    return total

l = arange(3000000)
l = array([l, l, l])   # 3 rows x 3,000,000 columns

start = time()
print(reduce(add, l))
print('Reduce took {}'.format(time() - start))

start = time()
print(sum_columns(l))
print('For loop took {}'.format(time() - start))

Results:

[      0       3       6 ..., 8999991 8999994 8999997]
Reduce took 0.024930953979492188
[      0       3       6 ..., 8999991 8999994 8999997]
For loop took 0.3731539249420166
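The Edit note credits most of the improvement to replacing array([0] * width) with zeros(width). A plausible reason, offered here as an assumption rather than something the answer measured separately, is that [0] * width first builds a 3,000,000-element Python list that NumPy must then convert element by element, while zeros allocates the buffer directly. A sketch that isolates just that allocation step (time_allocations is a hypothetical helper, and int64 is assumed to match the benchmark's integer rows):

```python
from timeit import timeit

import numpy as np

N = 3_000_000  # same width as the benchmark rows


def time_allocations(n=N, number=5):
    """Compare building an all-zero int array from a Python list vs. allocating it directly."""
    from_list = timeit(lambda: np.array([0] * n), number=number)
    direct = timeit(lambda: np.zeros(n, dtype=np.int64), number=number)
    return from_list, direct


if __name__ == "__main__":
    from_list, direct = time_allocations()
    print(f"array([0] * n): {from_list:.4f} s")
    print(f"zeros(n):       {direct:.4f} s")
```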