我注意到,当汇总1 000 000个整数列表时,Python的内置sum
函数比for循环快大约3倍:
import timeit
def sum1():
s = 0
for i in range(1000000):
s += i
return s
def sum2():
return sum(range(1000000))
print 'For Loop Sum:', timeit.timeit(sum1, number=10)
print 'Built-in Sum:', timeit.timeit(sum2, number=10)
# Prints:
# For Loop Sum: 0.751425027847
# Built-in Sum: 0.266746997833
为什么? sum
如何实施?
答案 0 :(得分:13)
速度差实际上大于3倍,但是你通过首先创建一个包含100万个整数的巨大内存列表来减慢任一版本的速度。将那些时间试验分开:
>>> import timeit
>>> def sum1(lst):
... s = 0
... for i in lst:
... s += i
... return s
...
>>> def sum2(lst):
... return sum(lst)
...
>>> values = range(1000000)
>>> timeit.timeit('f(lst)', 'from __main__ import sum1 as f, values as lst', number=100)
3.457869052886963
>>> timeit.timeit('f(lst)', 'from __main__ import sum2 as f, values as lst', number=100)
0.6696369647979736
现在速度差异已经超过5倍。
for
循环作为解释的Python字节码执行。 sum()
完全以C代码循环。解释的字节码和C代码之间的速度差异很大。
此外,C代码确保不创建新的Python对象,如果它可以保持C类型的总和;这适用于int
和float
结果。
反汇编的Python版本执行此操作:
>>> import dis
>>> def sum1():
... s = 0
... for i in range(1000000):
... s += i
... return s
...
>>> dis.dis(sum1)
2 0 LOAD_CONST 1 (0)
3 STORE_FAST 0 (s)
3 6 SETUP_LOOP 30 (to 39)
9 LOAD_GLOBAL 0 (range)
12 LOAD_CONST 2 (1000000)
15 CALL_FUNCTION 1
18 GET_ITER
>> 19 FOR_ITER 16 (to 38)
22 STORE_FAST 1 (i)
4 25 LOAD_FAST 0 (s)
28 LOAD_FAST 1 (i)
31 INPLACE_ADD
32 STORE_FAST 0 (s)
35 JUMP_ABSOLUTE 19
>> 38 POP_BLOCK
5 >> 39 LOAD_FAST 0 (s)
42 RETURN_VALUE
除了解释器循环比C慢,INPLACE_ADD
将创建一个新的整数对象(过去255,CPython将小int
个对象缓存为单例。)
您可以在Python mercurial代码存储库中看到C implementation,但它在评论中明确说明:
/* Fast addition by keeping temporary sums in C instead of new Python objects.
Assumes all inputs are the same type. If the assumption fails, default
to the more general routine.
*/
答案 1 :(得分:1)
您可以在Python/bltinmodule.c
中看到源代码。它有int
和float
s的特殊情况,但由于总和非常快地溢出到long
,因此这可能不会对性能产生重大影响。一般情况逻辑非常类似于您在Python中编写的内容,只是在C中。加速很可能是由于它不必经历所有字节码解释和错误处理开销:
static PyObject*
builtin_sum(PyObject *self, PyObject *args)
{
PyObject *seq;
PyObject *result = NULL;
PyObject *temp, *item, *iter;
if (!PyArg_UnpackTuple(args, "sum", 1, 2, &seq, &result))
return NULL;
iter = PyObject_GetIter(seq);
if (iter == NULL)
return NULL;
if (result == NULL) {
result = PyInt_FromLong(0);
if (result == NULL) {
Py_DECREF(iter);
return NULL;
}
} else {
/* reject string values for 'start' parameter */
if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
PyErr_SetString(PyExc_TypeError,
"sum() can't sum strings [use ''.join(seq) instead]");
Py_DECREF(iter);
return NULL;
}
Py_INCREF(result);
}
#ifndef SLOW_SUM
/* Fast addition by keeping temporary sums in C instead of new Python objects.
Assumes all inputs are the same type. If the assumption fails, default
to the more general routine.
*/
if (PyInt_CheckExact(result)) {
long i_result = PyInt_AS_LONG(result);
Py_DECREF(result);
result = NULL;
while(result == NULL) {
item = PyIter_Next(iter);
if (item == NULL) {
Py_DECREF(iter);
if (PyErr_Occurred())
return NULL;
return PyInt_FromLong(i_result);
}
if (PyInt_CheckExact(item)) {
long b = PyInt_AS_LONG(item);
long x = i_result + b;
if ((x^i_result) >= 0 || (x^b) >= 0) {
i_result = x;
Py_DECREF(item);
continue;
}
}
/* Either overflowed or is not an int. Restore real objects and process normally */
result = PyInt_FromLong(i_result);
temp = PyNumber_Add(result, item);
Py_DECREF(result);
Py_DECREF(item);
result = temp;
if (result == NULL) {
Py_DECREF(iter);
return NULL;
}
}
}
if (PyFloat_CheckExact(result)) {
double f_result = PyFloat_AS_DOUBLE(result);
Py_DECREF(result);
result = NULL;
while(result == NULL) {
item = PyIter_Next(iter);
if (item == NULL) {
Py_DECREF(iter);
if (PyErr_Occurred())
return NULL;
return PyFloat_FromDouble(f_result);
}
if (PyFloat_CheckExact(item)) {
PyFPE_START_PROTECT("add", Py_DECREF(item); Py_DECREF(iter); return 0)
f_result += PyFloat_AS_DOUBLE(item);
PyFPE_END_PROTECT(f_result)
Py_DECREF(item);
continue;
}
if (PyInt_CheckExact(item)) {
PyFPE_START_PROTECT("add", Py_DECREF(item); Py_DECREF(iter); return 0)
f_result += (double)PyInt_AS_LONG(item);
PyFPE_END_PROTECT(f_result)
Py_DECREF(item);
continue;
}
result = PyFloat_FromDouble(f_result);
temp = PyNumber_Add(result, item);
Py_DECREF(result);
Py_DECREF(item);
result = temp;
if (result == NULL) {
Py_DECREF(iter);
return NULL;
}
}
}
#endif
for(;;) {
item = PyIter_Next(iter);
if (item == NULL) {
/* error, or end-of-sequence */
if (PyErr_Occurred()) {
Py_DECREF(result);
result = NULL;
}
break;
}
/* It's tempting to use PyNumber_InPlaceAdd instead of
PyNumber_Add here, to avoid quadratic running time
when doing 'sum(list_of_lists, [])'. However, this
would produce a change in behaviour: a snippet like
empty = []
sum([[x] for x in range(10)], empty)
would change the value of empty. */
temp = PyNumber_Add(result, item);
Py_DECREF(result);
Py_DECREF(item);
result = temp;
if (result == NULL)
break;
}
Py_DECREF(iter);
return result;
}
答案 2 :(得分:1)
如dwanderson
所述,Numpy是另一种选择。确实,如果你想做一些数学。见这个基准:
import numpy as np
r = range(1000000) # 12.5 ms
s = sum(r) # 7.9 ms
ar = np.arange(1000000) # 0.5 ms
as = np.sum(ar) # 0.6 ms
因此,使用numpy
创建列表并对其进行求和要快得多。这主要是因为numpy.array
是为此设计的,并且比列表更有效。
但是,如果我们有一个python列表,那么numpy
非常慢,因为它从列表转换为numpy.array
是缓慢的:
r = range(1000000)
ar = np.array(r) # 102 ms
答案 3 :(得分:0)
但是,如果循环仅从0开始每次迭代加1,则可以使用快速技巧加法。范围(1000000)的总输出应为499999500000
import timeit
def sum1():
s = 0
for i in range(1000000):
s += i
#print s
return s
def sum2():
return sum(range(1000000))
def sum3():
s = range(1000000)
s = ((s[1]+s[-1])/2) * (len(s)-1)
#print(s)
return s
print 'For Loop Sum:', timeit.timeit(sum1, number=10)
print 'Built-in Sum:', timeit.timeit(sum2, number=10)
print 'Fast Sum:', timeit.timeit(sum3, number=10)
#prints
#For Loop Sum: 1.8420711
#Built-in Sum: 1.1081646
#Fast Sum: 0.3191561