Why are Python's arrays slow?

Date: 2016-04-21 19:16:49

Tags: python arrays performance boxing python-internals

I expected array.array to be faster than lists, as arrays seem to be unboxed.

However, I get the following result:

In [1]: import array

In [2]: L = list(range(100000000))

In [3]: A = array.array('l', range(100000000))

In [4]: %timeit sum(L)
1 loop, best of 3: 667 ms per loop

In [5]: %timeit sum(A)
1 loop, best of 3: 1.41 s per loop

In [6]: %timeit sum(L)
1 loop, best of 3: 627 ms per loop

In [7]: %timeit sum(A)
1 loop, best of 3: 1.39 s per loop

What could be the cause of such a difference?
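%timeit is an IPython magic. Outside IPython, a rough equivalent of the measurement above can be sketched with the standard-library timeit module. The default size here (10**6) is an assumption chosen to keep the run short; absolute numbers will vary by machine and Python version.

```python
import array
import timeit

def time_sums(n=10**6, repeat=3):
    """Best-of-`repeat` time in seconds for sum() over a list and an array.array('l')."""
    lst = list(range(n))
    arr = array.array('l', range(n))
    # timeit.repeat returns one total per repetition; keep the best run,
    # which is what the %timeit output above reports.
    best_list = min(timeit.repeat(lambda: sum(lst), number=1, repeat=repeat))
    best_array = min(timeit.repeat(lambda: sum(arr), number=1, repeat=repeat))
    return best_list, best_array

if __name__ == "__main__":
    t_list, t_array = time_sums()
    print(f"sum(list):  {t_list * 1e3:.1f} ms")
    print(f"sum(array): {t_array * 1e3:.1f} ms")
```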

4 answers:

Answer 0 (score: 206)

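The storage is "unboxed", but every time you access an element Python has to "box" it (embed it in a regular Python object) in order to do anything with it. For example, your sum(A) iterates over the array and boxes each integer, one at a time, in a regular Python int object. That costs time. In your sum(L), all the boxing was done at the time the list was created.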
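One way to observe this boxing from Python itself is object identity. This sketch relies on CPython details (the small-int cache covers roughly -5 to 256, and `is` on ints is an implementation detail), so treat it as illustrative only; PyPy, for example, behaves differently.

```python
import array
import platform

def boxing_demo():
    """Return (list_same, array_same): does indexing twice give the same object?"""
    values = list(range(1000, 1010))   # ints outside CPython's small-int cache
    arr = array.array('l', values)
    # A list slot holds a pointer to an existing int object,
    # so both lookups return that very object.
    list_same = values[5] is values[5]
    # An array slot holds a raw C long; each lookup boxes it into a new int.
    array_same = arr[5] is arr[5]
    return list_same, array_same

if __name__ == "__main__":
    print(platform.python_implementation(), boxing_demo())
```

On CPython the list lookup is the same object both times, while the two array lookups produce two distinct int objects.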

So, in the end, an array is generally slower, but requires substantially less memory.
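To see the memory side of the trade-off, the int objects a list points to have to be counted as well as the list itself, because sys.getsizeof on a list reports only the list object and its slots. A rough stdlib-only sketch; the default size is an arbitrary choice:

```python
import array
import sys

def footprint(n=100_000):
    """Approximate bytes used by a list of ints vs. an array('l') holding the same values."""
    lst = list(range(n))
    arr = array.array('l', lst)
    # getsizeof(list) covers the list header plus one pointer per slot;
    # the int objects are added separately. Small cached ints are shared,
    # so this slightly over-counts.
    list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)
    # The array's size includes its buffer of raw C longs (arr.itemsize bytes each).
    array_bytes = sys.getsizeof(arr)
    return list_bytes, array_bytes

if __name__ == "__main__":
    lb, ab = footprint()
    print(f"list + ints: {lb:,} bytes, array: {ab:,} bytes")
```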

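Here's the relevant code from a recent version of Python 3, but the same basic ideas apply to all CPython implementations since Python was first released.

Here's the code to access a list item: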

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    /* error checking omitted */
    return ((PyListObject *)op) -> ob_item[i];
}

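There's very little to it: somelist[i] just returns the i'th object in the list (and all Python objects in CPython are pointers to a struct whose initial segment conforms to the layout of a struct PyObject).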

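And here's the __getitem__ implementation for an array with type code l:

static PyObject *
l_getitem(arrayobject *ap, Py_ssize_t i)
{
    return PyLong_FromLong(((long *)ap->ob_item)[i]);
}

The raw memory is treated as a vector of platform-native C long integers; the i'th C long is read; and then PyLong_FromLong() is called to wrap ("box") the native C long in a Python long object (which, in Python 3, where Python 2's distinction between int and long is gone, actually shows up as type int).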
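The l_getitem path can be imitated in pure Python with the struct module: treat the array's memory as native C longs, read the i'th one, and get back a freshly created Python int. This is only an illustrative analogue, far slower than the C code, and the helper name getitem_like_l is hypothetical.

```python
import array
import struct

def getitem_like_l(arr, i):
    """Hypothetical pure-Python analogue of CPython's l_getitem for typecode 'l'."""
    size = struct.calcsize('l')   # native sizeof(long), same as arr.itemsize
    # The array exposes its raw storage through the buffer protocol, so
    # unpack_from can read the i'th C long in place; the value it returns
    # is a newly created ("boxed") Python int.
    (value,) = struct.unpack_from('l', arr, i * size)
    return value

if __name__ == "__main__":
    a = array.array('l', [10, -20, 30])
    print([getitem_like_l(a, i) for i in range(len(a))])  # [10, -20, 30]
```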

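This boxing has to allocate new memory for a Python int object and spray the native C long's bits into it. In the context of the original example, this object's lifetime is very brief (just long enough for sum() to add the contents into a running total), and then more time is required to deallocate the new int object.

This is where the speed difference comes from, always has come from, and always will come from in the CPython implementation.

Answer 1 (score: 82)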

To add to Tim Peters' excellent answer, arrays implement the buffer protocol, while lists do not. This means that, if you are writing a C extension (or the moral equivalent, such as writing a Cython module), then you can access and work with the elements of an array much faster than anything Python can do. This will give you considerable speed improvements, possibly well over an order of magnitude. However, it has a number of downsides:

  1. You are now in the business of writing C instead of Python. Cython is one way to ameliorate this, but it does not eliminate many fundamental differences between the languages; you need to be familiar with C semantics and understand what it is doing.
  2. PyPy's C API works to some extent, but isn't very fast. If you are targeting PyPy, you should probably just write simple code with regular lists, and then let the JITter optimize it for you.
  3. C extensions are harder to distribute than pure Python code because they need to be compiled. Compilation tends to be architecture and operating-system dependent, so you will need to ensure you are compiling for your target platform.

Going straight to C extensions may be using a sledgehammer to swat a fly, depending on your use case. You should first investigate NumPy and see if it is powerful enough to do whatever math you're trying to do. It will also be much faster than native Python, if used correctly.
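The buffer-protocol difference can be seen without writing any C: memoryview is built on that protocol, so it accepts an array but rejects a list. A C or Cython extension gets the same kind of direct view of the array's raw memory. A minimal stdlib-only sketch:

```python
import array

def buffer_demo():
    arr = array.array('l', range(5))
    view = memoryview(arr)        # zero-copy view of the array's raw C longs
    info = (view.format, view.itemsize, view.nbytes)
    view[0] = 99                  # writing through the view modifies the array itself
    try:
        memoryview([0, 1, 2])     # lists do not export a buffer
        list_ok = True
    except TypeError:
        list_ok = False
    return arr, info, list_ok

if __name__ == "__main__":
    print(buffer_demo())
```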

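Answer 2 (score: 7)

Tim Peters answered why this is slow, but let's see how to improve it.

Sticking to your example of sum(range(...)) (a factor of 10 smaller than your example, to fit into memory here):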

import numpy
import array
import sys
L = list(range(10**7))
A = array.array('l', L)
N = numpy.array(L)

%timeit sum(L)
10 loops, best of 3: 101 ms per loop

%timeit sum(A)
1 loop, best of 3: 237 ms per loop

%timeit sum(N)
1 loop, best of 3: 743 ms per loop

NumPy also needs to box/unbox this way, which has additional overhead. To make it fast, one has to stay within NumPy's C code:

%timeit N.sum()
100 loops, best of 3: 6.27 ms per loop

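So going from the list solution to the NumPy version, this is a factor of 16 in runtime.

Let's also check how long creating those data structures takes: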
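Why is sum(N) the slowest of the three? Iterating a NumPy array from Python boxes every element into a NumPy scalar object, which is more expensive to create than a plain int, while N.sum() loops inside C. A small check, assuming NumPy is installed:

```python
import numpy

def numpy_boxing_demo(n=1000):
    N = numpy.arange(n)
    first = next(iter(N))     # iteration yields NumPy scalar objects, not Python ints
    builtin_total = sum(N)    # Python-level loop over boxed scalars
    numpy_total = N.sum()     # reduction performed inside NumPy's C code
    return type(first), builtin_total, numpy_total

if __name__ == "__main__":
    print(numpy_boxing_demo())
```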

%timeit list(range(10**7))
1 loop, best of 3: 283 ms per loop

%timeit array.array('l', range(10**7))
1 loop, best of 3: 884 ms per loop

%timeit numpy.array(range(10**7))
1 loop, best of 3: 1.49 s per loop

%timeit numpy.arange(10**7)
10 loops, best of 3: 21.7 ms per loop

Clear winner: NumPy

Also note that creating the data structure takes about as much time as summing, if not more. Allocating memory is slow.

Memory usage:
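The creation timings above follow the same pattern: numpy.array(range(...)) has to go through Python-level int objects for every value, while numpy.arange fills its buffer directly in C. A sketch, assuming NumPy is installed, that checks both produce the same array and times them with the standard timeit module (sizes are arbitrary):

```python
import timeit
import numpy

def creation_times(n=10**5, number=5):
    via_range = numpy.array(range(n))   # built from Python ints, one at a time
    via_arange = numpy.arange(n)        # generated directly in C
    assert numpy.array_equal(via_range, via_arange)
    t_range = timeit.timeit(lambda: numpy.array(range(n)), number=number)
    t_arange = timeit.timeit(lambda: numpy.arange(n), number=number)
    return t_range, t_arange

if __name__ == "__main__":
    print(creation_times())
```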

import sys
sys.getsizeof(L)
90000112
sys.getsizeof(A)
81940352
sys.getsizeof(N)
80000096

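So these take 8 bytes per number, with varying overhead (for the list, getsizeof counts only the 8-byte pointer per element, not the int objects themselves). For the range we use, 32-bit ints are sufficient, so we can save some memory.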

N=numpy.arange(10**7, dtype=numpy.int32)

sys.getsizeof(N)
40000096

%timeit N.sum()
100 loops, best of 3: 8.35 ms per loop

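But it turns out that adding 64-bit ints is faster than adding 32-bit ints on my machine, so this is only worth it if you are limited by memory/bandwidth.

Answer 3 (score: -2)

Note that 100000000 equals 10^8, not 10^7. My results are as follows: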
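The per-element sizes can be read straight from the arrays via itemsize and nbytes. One caution with the int32 variant: every value fits, but the total (about 5 * 10^13 for 10**7 elements) does not fit in 32 bits. This sketch, assuming NumPy is installed, therefore requests a 64-bit accumulator explicitly instead of relying on NumPy's platform-dependent default for integer sums.

```python
import numpy

def int_widths(n=10**6):
    N64 = numpy.arange(n, dtype=numpy.int64)
    N32 = numpy.arange(n, dtype=numpy.int32)
    # 32-bit storage is enough because every value is at most 2**31 - 1 ...
    assert n - 1 <= numpy.iinfo(numpy.int32).max
    # ... but the sum needs a 64-bit accumulator so it cannot overflow.
    total = N32.sum(dtype=numpy.int64)
    return N64.itemsize, N32.itemsize, N64.nbytes, N32.nbytes, int(total)

if __name__ == "__main__":
    print(int_widths())
```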

100000000 == 10**8

# my test results on a Linux virtual machine:
#<L = list(range(100000000))> Time: 0:00:03.263585
#<A = array.array('l', range(100000000))> Time: 0:00:16.728709
#<L = list(range(10**8))> Time: 0:00:03.119379
#<A = array.array('l', range(10**8))> Time: 0:00:18.042187
#<A = array.array('l', L)> Time: 0:00:07.524478
#<sum(L)> Time: 0:00:01.640671
#<np.sum(L)> Time: 0:00:20.762153
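The answer doesn't show the harness behind these "Time:" lines. Their format matches str() of a datetime.timedelta, so a hypothetical helper like the one below (the name timed is an assumption, not the answerer's code) would produce lines of the same shape:

```python
import time
from contextlib import contextmanager
from datetime import timedelta

@contextmanager
def timed(label, results=None):
    """Hypothetical timer that prints '#<label> Time: H:MM:SS.ffffff'."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = timedelta(seconds=time.perf_counter() - start)
        line = f"#<{label}> Time: {elapsed}"
        if results is not None:
            results.append(line)   # optional collection point for the log lines
        print(line)

if __name__ == "__main__":
    with timed("L = list(range(10**6))"):
        L = list(range(10**6))
```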