Question

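Thanks in advance, everyone.

I would like to know the proper way to #include all the numpy headers, and the proper way to parse numpy arrays using Cython and C++. Here is my attempt: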

// cpp_parser.h 
#ifndef _FUNC_H_
#define _FUNC_H_

#include <Python.h>
#include <numpy/arrayobject.h>

void parse_ndarray(PyObject *);

#endif

I know this might be wrong; I also tried other options, but none of them worked.

// cpp_parser.cpp
#include "cpp_parser.h"
#include <iostream>

using namespace std;

void parse_ndarray(PyObject *obj) {
    if (PyArray_Check(obj)) { // this throws seg fault
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}

The PyArray_Check routine causes a segmentation fault. PyArray_CheckExact does not, but it is not exactly what I want.

# parser.pxd
cdef extern from "cpp_parser.h": 
    cdef void parse_ndarray(object)

The implementation file is:

# parser.pyx
import numpy as np
cimport numpy as np

def py_parse_array(object x):
    assert isinstance(x, np.ndarray)
    parse_ndarray(x)

The setup.py script is:

# setup.py
from distutils.core import setup, Extension
from Cython.Build import cythonize

import numpy as np

ext = Extension(
    name='parser',
    sources=['parser.pyx', 'cpp_parser.cpp'],
    language='c++',
    include_dirs=[np.get_include()],
    extra_compile_args=['-fPIC'],
)

setup(
    name='parser',
    ext_modules=cythonize([ext])
    )

Finally, the test script:

# run_test.py
import numpy as np
from parser import py_parse_array

x = np.arange(10)
py_parse_array(x)

I have created a git repo containing all of the scripts above: https://github.com/giantwhale/study_cython_numpy/

Answer 1

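Quick fix (read on for more details and a more sophisticated approach):

You need to initialize the variable PyArray_API in every cpp file that uses numpy functionality, by calling import_array():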

// A trick to ensure import_array() is called when the *.so is loaded;
// the initializer below runs exactly once.
int init_numpy(){
     import_array(); // sets a Python error if not successful
     return 0;
}

const static int numpy_initialized =  init_numpy();

void parse_ndarray(PyObject *obj) { // would be called every time
    if (PyArray_Check(obj)) {
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}

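You can also use _import_array, which returns a negative number on failure, to implement custom error handling. See the NumPy headers for the definition of import_array.

Warning: as @isra60 pointed out, _import_array()/import_array() can only be called once Python has been initialized, i.e. after Py_Initialize() has been called. This is always the case for an extension, but not always when the Python interpreter is embedded, because numpy_initialized is initialized before main starts. In that case the "initialization trick" should not be used; instead, call init_numpy() after Py_Initialize().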
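The timing problem described in the warning can be seen in a small standard-library-only sketch. Everything in it is a hypothetical stand-in: `interpreter_ready` models the state set up by Py_Initialize(), and `fake_import_array` models _import_array(); neither is a NumPy or CPython API.

```cpp
#include <cassert>

// Hypothetical stand-in for the interpreter state set by Py_Initialize().
static bool interpreter_ready = false;

// Hypothetical stand-in for _import_array(): negative on failure.
static int fake_import_array() {
    return interpreter_ready ? 0 : -1;
}

// Same shape as the initialization trick: this runs during static
// initialization, before main() and before anything could have set
// interpreter_ready, so in an embedding scenario it fails.
static const int initialized_too_early = fake_import_array();

// What an embedding application would do instead: initialize the
// interpreter first, then import the API explicitly.
int init_after_interpreter() {
    interpreter_ready = true;      // models Py_Initialize()
    return fake_import_array();    // models init_numpy()
}
```

In an extension module the interpreter is already running when the shared library is loaded, which is why the static initializer is safe there.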

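The sophisticated solution:

The proposed solution is quick, but if more than one cpp file uses numpy, there will be many initialized instances of PyArray_API.

This can be avoided if PyArray_API is defined not as static but as extern in all but one translation unit. For those translation units the macro NO_IMPORT_ARRAY must be defined before numpy/arrayobject.h is included.

We still need one translation unit in which this symbol is defined. For that translation unit the macro NO_IMPORT_ARRAY must not be defined.

However, without defining the macro PY_ARRAY_UNIQUE_SYMBOL we would get only a static symbol, i.e. one not visible to other translation units, so the linker would fail. The reason for this design: if there are two libraries and each defines a PyArray_API, we would have multiple definitions of one symbol and the linker would fail, i.e. we could not use the two libraries together.

Thus, by defining PY_ARRAY_UNIQUE_SYMBOL as MY_FANCY_LIB_PyArray_API prior to every include of numpy/arrayobject.h, we get our own PyArray_API name, which does not clash with other libraries.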
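How the two macros interact is easier to follow as code. The following is a simplified paraphrase of the selection logic in NumPy's generated API header, written from memory rather than copied; the exact text differs between NumPy versions, and `api_table_is_empty` is a helper added here only for illustration.

```cpp
#include <cassert>

// What a wrapper header such as use_numpy.h would define before the include:
#define PY_ARRAY_UNIQUE_SYMBOL MY_PyArray_API
// #define NO_IMPORT_ARRAY   // defined in every unit except the initializing one

#if defined(PY_ARRAY_UNIQUE_SYMBOL)
#define PyArray_API PY_ARRAY_UNIQUE_SYMBOL   // rename the table
#endif

#if defined(NO_IMPORT) || defined(NO_IMPORT_ARRAY)
extern void **PyArray_API;            // ordinary units: declaration only
#elif defined(PY_ARRAY_UNIQUE_SYMBOL)
void **PyArray_API;                   // initializing unit: the one real definition
#else
static void **PyArray_API = nullptr;  // default: a private copy per unit
#endif

// Illustration only: true while nothing has filled in the table yet.
bool api_table_is_empty() { return PyArray_API == nullptr; }
```

With the macros set as above, this file is the initializing unit: it defines the global `MY_PyArray_API`, which stays null until import_array() runs. Uncommenting the NO_IMPORT_ARRAY line turns the definition into an extern declaration that the linker resolves against the initializing unit.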

Putting it all together:

A: use_numpy.h - your header for including numpy functionality, i.e. numpy/arrayobject.h:

//use_numpy.h

//your fancy name for the dedicated PyArray_API symbol
#define PY_ARRAY_UNIQUE_SYMBOL MY_PyArray_API

//NO_IMPORT_ARRAY must be defined in every translation unit
//except the one that initializes the API
#ifndef INIT_NUMPY_ARRAY_CPP
    #define NO_IMPORT_ARRAY //for usual translation units
#endif

//now everything is set up, just include the numpy arrays:
#include <numpy/arrayobject.h>

B: init_numpy_api.cpp - a translation unit for initializing the global MY_PyArray_API:

//init_numpy_api.cpp

//first make clear that here we initialize MY_PyArray_API
#define INIT_NUMPY_ARRAY_CPP

//now include arrayobject.h, which defines
//void **MY_PyArray_API
#include "use_numpy.h"

//now the old trick with initialization:
int init_numpy(){
     import_array(); // sets a Python error if not successful
     return 0;
}
const static int numpy_initialized = init_numpy();

C: just include use_numpy.h whenever you need numpy; it will declare extern void **MY_PyArray_API:

//example
#include "use_numpy.h"

...
PyArray_Check(obj); // works, no segmentation fault

Warning: do not forget that for the initialization trick to work, Py_Initialize() must already have been called.

Why it is needed (kept for historical reasons):

When I build the extension with debug symbols:

extra_compile_args=['-fPIC', '-O0', '-g'],
extra_link_args=['-O0', '-g'],

and run it with gdb:

 gdb --args python run_test.py
 (gdb) run
  --- Segmentation fault
 (gdb) disass

I can see the following:

   0x00007ffff1d2a6d9 <+20>:    mov    0x203260(%rip),%rax       
       # 0x7ffff1f2d940 <_ZL11PyArray_API>
   0x00007ffff1d2a6e0 <+27>:    add    $0x10,%rax
=> 0x00007ffff1d2a6e4 <+31>:    mov    (%rax),%rax
   ...
   (gdb) print $rax
   $1 = 16

We should keep in mind that PyArray_Check is only a define for:

#define PyArray_Check(op) PyObject_TypeCheck(op, &PyArray_Type)

It seems that &PyArray_Type somehow uses a part of PyArray_API that is not initialized (it has the value 0).
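The value 16 printed by gdb fits this reading. A minimal sketch of the arithmetic, assuming PyArray_Type lives in slot 2 of the API table (which is what the `add $0x10` in the disassembly suggests on a 64-bit build); `kAssumedTypeSlot` and `slot_address` are names made up for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: PyArray_Type is entry 2 of the PyArray_API table.
constexpr std::size_t kAssumedTypeSlot = 2;

// Address that a lookup of `slot` reads when the table starts at `table_base`.
// Plain integer arithmetic is used so that no null pointer is ever offset.
std::uintptr_t slot_address(std::uintptr_t table_base, std::size_t slot) {
    return table_base + slot * sizeof(void *);
}
```

With an uninitialized (null) table the base is 0, so on a platform with 8-byte pointers `slot_address(0, kAssumedTypeSlot)` is 16: the process tries to read from address 16 and crashes.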

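Let's look at cpp_parser.cpp after the preprocessor (compiled with the flag -E):

 static void **PyArray_API= __null
 ...
 static int
_import_array(void)
{
  PyArray_API = (void **)PyCapsule_GetPointer(c_api,...

So PyArray_API is static and is initialized via _import_array(void). That would also explain the warning I get during the build, that _import_array() was defined but not used: we never initialized PyArray_API.

Because PyArray_API is a static variable, it must be initialized in every compilation unit, i.e. in every cpp file.

So we just need to do that; import_array() seems to be the official way.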

Answer 2

Since you are using Cython, the numpy API is already available through the Cython includes. It is straightforward in a Jupyter notebook.

cimport numpy as np
from numpy cimport PyArray_Check

np.import_array()  # Attention!

def parse_ndarray(object ndarr):
    if PyArray_Check(ndarr):
        print("PyArray_Check Passed")
    else:
        print("PyArray_Check Failed")

I believe np.import_array() is the key here, because you are calling into the numpy API. Comment it out and try again, and the crash appears as well.

import numpy as np
from array import array
ndarr = np.arange(3)
pyarr = array('i', range(3))
parse_ndarray(ndarr)
parse_ndarray(pyarr)
parse_ndarray("Trick or treat!")

Output:

PyArray_Check Passed
PyArray_Check Failed
PyArray_Check Failed

PyArray_Check gives Segmentation Fault with Cython/C++