Question

I'm struggling to initialize thread-local ndarrays with cython.parallel.

Pseudocode:

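from cython.parallel cimport parallel, prange, threadid
from libc.stdint cimport uintptr_t
cimport numpy as cnp
import numpy as np

cdef:
    cnp.ndarray buffer
    Py_ssize_t i

with nogil, parallel():
    with gil:
        buffer = np.empty(...)  # creating a Python object requires the GIL

    for i in prange(n):
        with gil:
            print("Thread %d: data address: 0x%x" % (threadid(), <uintptr_t>buffer.data))

        some_func(buffer.data)  # intended to use the thread-local buffer

cdef void some_func(char * buffer_ptr) nogil:
    (... works on buffer contents ...)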

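My problem is that buffer.data points to the same address in every thread, namely the address from whichever thread allocated buffer last.

Even though buffer is assigned inside the parallel() (or prange) block, Cython does not make buffer a private or thread-local variable; it leaves it as a shared variable.

As a result, buffer.data points to the same memory region in every thread, which wreaks havoc on my algorithm.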
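A loose pure-Python analogy of this symptom, assuming plain `threading` in place of OpenMP and a dict slot in place of the shared `buffer` variable (this is not the code Cython generates): every thread assigns its own freshly allocated buffer to the single shared slot, waits at a barrier, and only then reads the slot back. Because there is only one slot, every thread sees the buffer that was assigned last.

```python
import threading

NUM_THREADS = 4
shared = {"buffer": None}     # one slot for all threads, like a shared variable
seen = [None] * NUM_THREADS   # id of the buffer each thread observes afterwards
barrier = threading.Barrier(NUM_THREADS)

def worker(tid):
    shared["buffer"] = bytearray(16)   # each thread "allocates" its own buffer...
    barrier.wait()                     # ...but all assignments finish before anyone reads
    seen[tid] = id(shared["buffer"])   # so every thread reads the last assignment

threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(set(seen)))  # 1: all threads ended up with the same buffer
```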

This is not a problem exclusive to ndarray objects; it seems to affect objects of every cdef class.

How can I get around this problem?

Answer 1

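I think I've finally found a solution to this problem that I like. The short version is that you create an array with the shape

(number_of_threads, ...<whatever shape you need in the thread>...)

and then call openmp.omp_get_thread_num and use its result to index the array, which gives each thread its own "thread-local" subarray. This avoids having a separate array for every loop index (which could be enormous) while still preventing the threads from overwriting each other.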
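The same idea as a minimal pure-Python sketch, assuming plain `threading` in place of OpenMP and a list of lists in place of the 2-D ndarray. The names `scratch` and `WIDTH` and the per-item computation are hypothetical. Each worker receives its thread number `tid` (playing the role of `omp_get_thread_num()`), processes the items a static schedule with chunk size 1 would give it, and reuses row `scratch[tid]` as its private buffer:

```python
import threading

NUM_THREADS = 4
NUM_ITEMS = 10
WIDTH = 3  # hypothetical per-thread buffer size

# One scratch row per thread: shape (NUM_THREADS, WIDTH), allocated once up front.
scratch = [[0] * WIDTH for _ in range(NUM_THREADS)]
results = [None] * NUM_ITEMS

def worker(tid):
    buf = scratch[tid]  # this thread's private row; no other thread touches it
    # Static schedule, chunk size 1: thread tid gets items tid, tid + NUM_THREADS, ...
    for i in range(tid, NUM_ITEMS, NUM_THREADS):
        for k in range(WIDTH):
            buf[k] = i * 10 + k   # reuse the same buffer for every item this thread owns
        results[i] = sum(buf)     # hypothetical stand-in for the real per-item work

threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [3, 33, 63, 93, 123, 153, 183, 213, 243, 273]
```

The scratch memory is `NUM_THREADS * WIDTH` elements no matter how large `NUM_ITEMS` gets, which is the saving over allocating one buffer per loop index.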

Here is a rough version of what I did:

import numpy as np
import multiprocessing

cimport numpy as c_numpy          # provides the c_numpy.ndarray type used below
from cython.parallel import prange
cimport openmp                    # compile and link with -fopenmp
from libc.stdlib cimport malloc, free

...  # (DTYPE, CDTYPE and a nogil some_function are defined in the omitted part)

cdef int num_items = ...
cdef int num_threads = multiprocessing.cpu_count()
cdef Py_ssize_t i                 # prange needs a C-typed loop index
cdef int thread_number
# One row per thread, so each thread writes to separate memory
result_array = np.zeros((num_threads, num_items), dtype=DTYPE)
cdef c_numpy.ndarray result_cn
# Table of raw row pointers that can be read without the GIL
cdef CDTYPE ** result_pointer_arr = <CDTYPE **> malloc(num_threads * sizeof(CDTYPE *))
if result_pointer_arr == NULL:
    raise MemoryError()
try:
    for i in range(num_threads):
        result_cn = result_array[i]   # C-contiguous row view, no copy
        result_pointer_arr[i] = <CDTYPE *> result_cn.data

    for i in prange(num_items, nogil=True, chunksize=1, num_threads=num_threads, schedule='static'):
        thread_number = openmp.omp_get_thread_num()
        some_function(result_pointer_arr[thread_number])  # touches only this thread's row
finally:
    free(result_pointer_arr)      # the row data itself is owned by result_array
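Why thread_number is always a valid row index: with schedule='static' and a chunk size, OpenMP splits the iterations into chunks of chunksize iterations and deals them out round-robin in thread-number order, and thread numbers run from 0 to the team size minus one, which never exceeds num_threads. The helper `static_owner` below is a hypothetical name, not part of Cython or OpenMP; it only evaluates that rule in plain Python, assuming the team really contains num_threads threads:

```python
def static_owner(i, num_threads, chunksize=1):
    """Thread number that runs iteration i under schedule(static, chunksize)."""
    return (i // chunksize) % num_threads

print([static_owner(i, num_threads=4) for i in range(10)])
# [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
print([static_owner(i, num_threads=4, chunksize=2) for i in range(10)])
# [0, 0, 1, 1, 2, 2, 3, 3, 0, 0]
```

With schedule='dynamic' or 'guided' the owner of each iteration is not predictable, but the pattern still holds, because every thread only ever indexes its own row.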

cython.parallel: how to initialize a thread-local ndarray buffer?
