Question

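I'm trying to get a pointer to a Numpy array so that I can manipulate it quickly in my Cython code. I've found two ways to get the buffer pointer, one using array.__array_interface__['data'][0] and the other using array.ctypes.data. Both are painfully slow.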

I've created a small Cython class that just creates a numpy array and stores a pointer to its buffer:

cdef class ArrayHolder:
    cdef array
    cdef long *ptr

    def __init__(ArrayHolder self, allocate=True):
        self.array = np.zeros((4, 12,), dtype=np.int)
        cdef long ptr = self.array.__array_interface__['data'][0]
        self.ptr = <long *>ptr

Then, back in Python, I create multiple instances of this class, like so:

for i in range(1000000):
    holder = ArrayHolder()

This takes about 3.6 seconds. Using array.ctypes.data is about half a second slower.

When I comment out the call to __array_interface__['data'] and run the code again, it finishes in about 1 second.

Why is getting the address of a Numpy array's buffer so slow?

Answer 1

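Using Cython's static typing mechanism can help a lot. That way Cython knows you're dealing with an array of the appropriate type, and can generate optimized C code.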

import numpy as np  # runtime module, needed for np.zeros
cimport numpy as np # just so it knows np.int_t

cdef class ArrayHolder:
    cdef np.int_t[:,:] array # now specified as a specific array type
    cdef np.int_t *ptr # note I've changed this to match the array type

    def __init__(ArrayHolder self, allocate=True):
        self.array = np.zeros((4, 12,), dtype=np.int)
        self.ptr = &self.array[0,0] # location of the first element

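In this version there is a small cost when assigning self.array, to check that the object really is an array. However, element lookup and taking the address are now as fast as using C pointers.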

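In your old version it was an arbitrary Python object, so there was a dictionary lookup for __array_interface__, then a dictionary lookup for __getitem__ so that 'data' could be found, then a further dictionary lookup for __getitem__ so that index 0 could be found.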
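The lookup chain described above can be reproduced from plain Python. This is a minimal sketch rather than the asker's code: the variable names are made up for illustration, and np.int_ stands in for np.int on the assumption of a recent NumPy where the np.int alias no longer exists. It only shows that both routes go through ordinary Python attribute and item access and report the same address; the timings it prints will vary by machine.

```python
import timeit

import numpy as np

# Same shape as in the question; np.int_ is an assumption (see note above).
arr = np.zeros((4, 12), dtype=np.int_)

# Route 1: attribute lookup, then dict lookup by key, then tuple index.
addr_interface = arr.__array_interface__['data'][0]

# Route 2: attribute lookup that returns a ctypes helper, then its .data.
addr_ctypes = arr.ctypes.data

# Both routes should report the same buffer address as a Python int.
same_address = addr_interface == addr_ctypes

# Rough cost of each route when driven from Python (machine dependent).
n_calls = 100_000
t_interface = timeit.timeit(
    lambda: arr.__array_interface__['data'][0], number=n_calls
)
t_ctypes = timeit.timeit(lambda: arr.ctypes.data, number=n_calls)

print("same address:", same_address)
print(f"__array_interface__: {t_interface:.3f}s for {n_calls} calls")
print(f"ctypes.data:         {t_ctypes:.3f}s for {n_calls} calls")
```

Untyped Cython code performs these same object-level lookups through the Python C API, which is why typing the attribute as a memoryview removes them.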

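One thing to note: if you tell Cython the array type with cdef, you can do all the indexing directly on the array and it will be as fast as using a pointer, so you can skip creating the pointer altogether (unless you need it to pass to external C code). For the last bit of speed, turn off boundscheck and wraparound.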

Answer 2

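My guess is that it is some kind of lazy loading: Numpy only performs the memset() on the table when you first access it. I would try creating the array without filling it with zeros and compare the timings.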

Here is my test:

import numpy as np

cdef class ArrayHolder:
    cdef array
    cdef long *ptr

    def __init__(ArrayHolder self, allocate=True):
        self.array = np.zeros((4, 12,), dtype=np.int)

    def ptr(ArrayHolder self):
        cdef long ptr = self.array.__array_interface__['data'][0]


from timeit import timeit
from cyth import ArrayHolder


print(timeit("ArrayHolder()", number=1000000, setup="from cyth import ArrayHolder")) 
print(timeit("ArrayHolder().ptr()", number=1000000, setup="from cyth import ArrayHolder"))



$ python test.py                     
1.0442328620702028
3.4246508290525526
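The suggestion above to create the array without zero-filling could be tried along these lines. This is a rough sketch under stated assumptions, not a measurement from the thread: it runs in plain Python instead of Cython, the names are illustrative, and np.int_ replaces np.int for compatibility with recent NumPy releases.

```python
import timeit

import numpy as np

shape = (4, 12)
n_calls = 100_000


def make_zeros():
    # Zero-filled allocation, as in the question.
    return np.zeros(shape, dtype=np.int_)


def make_empty():
    # Uninitialised allocation, so no zero-filling is involved.
    return np.empty(shape, dtype=np.int_)


def address_of(arr):
    # Same pointer lookup the question uses.
    return arr.__array_interface__['data'][0]


# Allocation only.
t_zeros = timeit.timeit(make_zeros, number=n_calls)
t_empty = timeit.timeit(make_empty, number=n_calls)

# Allocation followed by the pointer lookup.
t_zeros_ptr = timeit.timeit(lambda: address_of(make_zeros()), number=n_calls)
t_empty_ptr = timeit.timeit(lambda: address_of(make_empty()), number=n_calls)

print(f"zeros: {t_zeros:.3f}s, zeros + pointer: {t_zeros_ptr:.3f}s")
print(f"empty: {t_empty:.3f}s, empty + pointer: {t_empty_ptr:.3f}s")
```

If the extra cost of the pointer lookup is about the same for np.zeros and np.empty, zero-filling is unlikely to be the cause, and the Python-level lookups are the more plausible explanation.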

Very slow Numpy buffer pointer access
