Question

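I have written a function in Cython that contains the following loop. Each row of array A1 is binary-searched for all values in array A2, so each loop iteration returns a 2D array of index values. Arrays A1 and A2 enter as function arguments and are properly typed.

Array C is preallocated at the highest indent level, as required in Cython.

I simplified a few things for this question.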

...
cdef np.ndarray[DTYPEint_t, ndim=3] C = np.zeros([N,M,M], dtype=DTYPEint)

for j in range(0,N):
    C[j,:,:]  = np.searchsorted(A1[j,:], A2, side='left' )

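So far so good: things compile and run as expected. However, to gain even more speed I want to parallelize the j-loop. My first attempt was simply to write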

for j in prange(0,N, nogil=True):
    C[j,:,:]  = np.searchsorted(A1[j,:], A2, side='left' )

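I tried many coding variations, such as putting things in a separate nogil function, assigning the result to an intermediate array, and writing a nested loop to avoid assigning to a sliced part of C.

The error is usually "Accessing Python attribute not allowed without gil".

I can't get it to work. Any suggestions on how I can do this?

Edit:

Here is my setup.py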

try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension


from Cython.Build import cythonize

import numpy

extensions = [Extension("matchOnDistanceVectors",
                    sources=["matchOnDistanceVectors.pyx"],
                    extra_compile_args=["/openmp", "/O2"],
                    extra_link_args=[]
                   )]


setup(
    ext_modules=cythonize(extensions),
    include_dirs=[numpy.get_include()],
)

I am on Windows 7, compiling with msvc. I did specify the /openmp flag, and my arrays are of size 200*200. So everything seems in order...

Answer 1

I believe that searchsorted releases the GIL itself (see https://github.com/numpy/numpy/blob/e2805398f9a63b825f4a2aab22e9f169ff65aae9/numpy/core/src/multiarray/item_selection.c, line 1664, "NPY_BEGIN_THREADS_DEF").

Therefore, you can do

for j in prange(0,N, nogil=True):
    with gil:
        C[j,:,:] = np.searchsorted(A1[j,:], A2, side='left')

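That temporarily claims the GIL to do the necessary work on Python objects (which is hopefully quick). The GIL should then be released again inside searchsorted, which lets the loop run in parallel most of the time.

To update: I ran a quick test of this (A1.shape==(105,100), A2.shape==(302,302); the numbers were chosen fairly arbitrarily). For 10 repeats, the serial version took 4.5 seconds and the parallel version took 1.4 seconds (the test was run on a 4-core CPU). You don't get the full 4x speed-up, but you do get close.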
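
For reference, the timings quoted above work out as follows. This is only arithmetic on the two numbers from the test; the variable names are made up for illustration.

```python
# Timings quoted above: 10 repeats on a 4-core CPU.
serial_seconds = 4.5
parallel_seconds = 1.4
cores = 4

speedup = serial_seconds / parallel_seconds   # how many times faster
efficiency = speedup / cores                  # fraction of the ideal 4x

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# prints "speedup: 3.21x, efficiency: 80%"
```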

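This was compiled as described in the documentation. I suspect that if you're not seeing a speed-up, it could be any of the following: 1) your arrays are small enough that the overhead of the function call and of numpy checking types and sizes dominates; 2) you aren't compiling with OpenMP enabled; or 3) your compiler doesn't support OpenMP.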
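
On point 2, the flag that enables OpenMP depends on the compiler. Below is a minimal sketch of picking the flags in a setup.py. The helper name openmp_args is hypothetical, and it assumes MSVC on Windows and a GCC-style compiler everywhere else, which will not hold for every toolchain.

```python
import sys

def openmp_args(platform=sys.platform):
    """Return (extra_compile_args, extra_link_args) for an OpenMP build.

    Hypothetical helper: assumes MSVC on Windows and a GCC-style
    compiler on every other platform.
    """
    if platform.startswith("win"):
        # MSVC enables OpenMP with a compile flag only.
        return ["/openmp"], []
    # GCC needs the flag both when compiling and when linking.
    return ["-fopenmp"], ["-fopenmp"]

compile_args, link_args = openmp_args()
print(compile_args, link_args)
```

The two lists would be passed to Extension(...) as extra_compile_args and extra_link_args.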

Answer 2

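You have a bit of a catch-22. You need the GIL to call numpy.searchsorted, but the GIL prevents any kind of parallel processing. Your best bet is to write your own nogil version of searchsorted:

cdef Py_ssize_t mySearchSorted(double[:] array, double target) nogil:
    # binary search implementation

for j in prange(0,N, nogil=True):
    for k in range(A2.shape[0]):
        for L in range(A2.shape[1]):
            C[j, k, L] = mySearchSorted(A1[j, :], A2[k, L])

numpy.searchsorted also has a non-trivial amount of overhead, so if N is large it makes sense to use your own searchsorted just to reduce the overhead.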
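
The binary search itself is short. Here is a plain-Python sketch of the logic such a function would need, matching side='left'. It is an illustration rather than the nogil Cython code: the name my_search_sorted and the sample data are made up, and a Cython version would declare lo, hi and mid as C integers.

```python
from bisect import bisect_left

def my_search_sorted(row, target):
    """Leftmost insertion index of target in the sorted sequence row."""
    lo, hi = 0, len(row)
    while lo < hi:
        mid = (lo + hi) // 2
        if row[mid] < target:
            lo = mid + 1      # target belongs to the right of mid
        else:
            hi = mid          # mid is still a candidate position
    return lo

# Made-up data: 2 sorted rows in A1, a 2x2 array of targets in A2.
A1 = [[1.0, 3.0, 5.0], [2.0, 2.0, 8.0]]
A2 = [[0.0, 3.0], [2.0, 9.0]]

# Same triple loop as above, written as nested comprehensions.
C = [[[my_search_sorted(row, t) for t in targets] for targets in A2]
     for row in A1]

# Cross-check against the standard library's leftmost bisection.
assert all(my_search_sorted(row, t) == bisect_left(row, t)
           for row in A1 for targets in A2 for t in targets)
print(C)  # prints [[[0, 1], [1, 3]], [[0, 2], [0, 3]]]
```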
