Question

I've seen something like this a few times in the Pandas source:

def nancorr(ndarray[float64_t, ndim=2] mat, bint cov=0, minp=None):
    # ...
    N, K = (<object> mat).shape

This means that a NumPy ndarray called mat is type-cast to a Python object.*

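On further inspection, it seems the cast is used because a compile error occurs without it. My question is: why is this type-cast required in the first place?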

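Below are some examples. This answer suggests only that tuple packing doesn't work in Cython the way it does in Python, but this doesn't seem to be a tuple-unpacking issue. (It is a fine answer regardless, and I don't mean to pick on it.)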

Take the following script, shape.pyx. It fails at compile time with "Cannot convert 'npy_intp *' to Python object."

from cython cimport Py_ssize_t
import numpy as np
from numpy cimport ndarray, float64_t
cimport numpy as cnp
cnp.import_array()

def test_castobj(ndarray[float64_t, ndim=2] arr):

    cdef:
        Py_ssize_t b1, b2

    # Tuple unpacking - this will fail at compile
    b1, b2 = arr.shape
    return b1, b2

But again, the issue does not seem to be tuple unpacking per se. This fails with the same error.

def test_castobj(ndarray[float64_t, ndim=2] arr):

    cdef:
        # Py_ssize_t b1, b2
        ndarray[float64_t, ndim=2] zeros

    zeros = np.zeros(arr.shape, dtype=np.float64)
    return zeros

Seemingly, no tuple unpacking is happening here. A tuple is simply the first argument to np.zeros.

def test_castobj(ndarray[float64_t, ndim=2] arr):
    """This works"""
    cdef:
        Py_ssize_t b1, b2
        ndarray[float64_t, ndim=2] zeros

    b1, b2 = (<object> arr).shape
    zeros = np.zeros((<object> arr).shape, dtype=np.float64)
    return b1, b2, zeros

This also works (and is perhaps the most confusing of all):

def test_castobj(object[float64_t, ndim=2] arr):
    cdef:
        tuple shape = arr.shape
        ndarray[float64_t, ndim=2] zeros
    zeros = np.zeros(shape, dtype=np.float64)
    return zeros

Example:

>>> from shape import test_castobj
>>> arr = np.arange(6, dtype=np.float64).reshape(2, 3)

>>> test_castobj(arr)
(2, 3, array([[0., 0., 0.],
        [0., 0., 0.]]))

*Perhaps it has to do with arr being a memoryview? But that's a shot in the dark.

Another example is in the Cython docs:

cpdef int sum3d(int[:, :, :] arr) nogil:
    cdef size_t i, j, k, I, J, K
    cdef int total = 0
    I = arr.shape[0]
    J = arr.shape[1]
    K = arr.shape[2]

In this case, simply indexing arr.shape[i] prevents the error, which I find strange.

This also works:

def test_castobj(object[float64_t, ndim=2] arr):
    cdef ndarray[float64_t, ndim=2] zeros
    zeros = np.zeros(arr.shape, dtype=np.float64)
    return zeros

Answer 1

You are right, it has nothing to do with tuple unpacking under Cython.

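The reason is that cnp.ndarray is not a usual numpy array (that is, a numpy array with the interface known from Python), but a Cython wrapper for PyArrayObject, numpy's C implementation of the type exposed in Python as np.ndarray: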

ctypedef class numpy.ndarray [object PyArrayObject]:
    cdef __cythonbufferdefaults__ = {"mode": "strided"}

    cdef:
        # Only taking a few of the most commonly used and stable fields.
        # One should use PyArray_* macros instead to access the C fields.
        char *data
        int ndim "nd"
        npy_intp *shape "dimensions"
        npy_intp *strides
        dtype descr
        PyObject* base

shape actually maps to the dimensions field of the underlying C struct PyArrayObject (note npy_intp *shape "dimensions" instead of simply npy_intp *dimensions). This is a trick, so that one can write

mat.shape[0]

and have it look (and to some degree feel) as if numpy's Python attribute shape were being called. In reality, it is a shortcut straight to the underlying C struct.

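Calling the Python-level shape is quite costly: a tuple must be created and filled with the values from dimensions, and only then can the 0th element be accessed. The Cython way is much cheaper: it just reads the right element.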
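The difference can be imitated in plain Python with ctypes. This is only an analogy, not NumPy or Cython API: dims_buffer, shape_ptr and ndim are hypothetical stand-ins for the C-level dimensions field and nd. It shows why a bare pointer can be indexed cheaply but cannot be turned into a tuple without an explicit step that knows the number of dimensions.

```python
import ctypes

# Hypothetical stand-in for the C field `npy_intp *dimensions` of a 2x3 array.
dims_buffer = (ctypes.c_ssize_t * 2)(2, 3)
shape_ptr = ctypes.cast(dims_buffer, ctypes.POINTER(ctypes.c_ssize_t))
ndim = 2  # stand-in for the `nd` field

# Indexing the raw pointer works: it reads one element directly.
n_rows = shape_ptr[0]
n_cols = shape_ptr[1]

# A raw pointer carries no length, so there is no automatic way
# to convert it into a Python tuple.
try:
    len(shape_ptr)
    pointer_has_length = True
except TypeError:
    pointer_has_length = False

# Roughly what the Python-level `.shape` attribute has to do:
# allocate a tuple and copy `ndim` values into it.
shape_tuple = tuple(shape_ptr[i] for i in range(ndim))
```

In this sketch, shape_ptr[0] plays the role of mat.shape[0] on a typed ndarray, while shape_tuple plays the role of (<object> mat).shape.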

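However, if you still want to access the Python attribute of the array, you have to cast it to a normal Python object (i.e. forget that it is an ndarray). shape is then resolved to the tuple-valued attribute through the usual Python mechanism.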

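So basically, convenient as it is, you don't want to access the dimensions of a numpy array in a tight loop the way the pandas code does. For performance, you would use the more verbose variant instead: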

...
N=mat.shape[0]
K=mat.shape[1]
...

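Why you can write object[cnp.float64_t] or something similar in a function signature strikes me as strange. The argument is then apparently interpreted as a plain object. Maybe it is just a bug.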

Cython: why do NumPy arrays need to be type-cast to object?
