I'm struggling to initialize thread-local ndarrays with cython.parallel.
Pseudocode:
from cython.parallel cimport parallel, prange, threadid
from libc.stdint cimport uintptr_t
from numpy cimport ndarray
import numpy as np

cdef:
    ndarray buffer
with nogil, parallel():
    buffer = np.empty(...)
    for i in prange(n):
        with gil:
            print("Thread %d: data address: 0x%x" % (threadid(), <uintptr_t>buffer.data))
        some_func(buffer.data)  # use thread-local buffer

cdef void some_func(char * buffer_ptr) nogil:
    (... works on buffer contents...)
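My problem is that buffer.data points to the same address in all threads, namely the address of the buffer allocated by whichever thread assigned buffer last.
Even though buffer is assigned inside the parallel() (or prange) block, Cython does not make buffer a private or thread-local variable; it leaves it as a shared variable.
As a result, buffer.data points to the same memory region in every thread, which wreaks havoc on my algorithm.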
This is not just a problem with ndarray objects; it affects all objects defined with cdef class.
How can I work around this?
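The symptom can be reproduced by analogy with plain Python threads. This is not Cython or OpenMP, only an illustration of a shared variable versus a per-thread local one, and `demo_shared_vs_private` is a hypothetical helper. Every thread allocates its own array, but because each one also writes it into a single shared slot, all threads read back the same (last-assigned) buffer once everyone has assigned. The per-thread locals keep distinct addresses.

```python
import threading
import numpy as np

def demo_shared_vs_private(num_threads=4):
    """Hypothetical illustration: one shared slot vs. a per-thread local variable."""
    shared = {"buffer": None}          # one slot visible to all threads (like a shared variable)
    barrier = threading.Barrier(num_threads)
    shared_addrs = [0] * num_threads
    private_addrs = [0] * num_threads

    def worker(tid):
        local = np.empty(8)            # each thread allocates its own array...
        shared["buffer"] = local       # ...but also stores it in the shared slot
        barrier.wait()                 # wait until every thread has assigned
        # The shared slot now holds whichever thread's array was assigned last.
        shared_addrs[tid] = shared["buffer"].ctypes.data
        private_addrs[tid] = local.ctypes.data

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_addrs, private_addrs

shared_addrs, private_addrs = demo_shared_vs_private()
print(len(set(shared_addrs)), len(set(private_addrs)))  # 1 4
```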
Answer 0 (score: 4)
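I think I've finally found a solution to this problem that I like. The short version is that you create an array with the shape:
(number_of_threads, ...<whatever shape you need in the thread>...)
Then call openmp.omp_get_thread_num and use the result to index that array, which gives each thread its own "thread-local" sub-array. This avoids allocating a separate array for every loop index (which could be huge) while still preventing threads from overwriting each other.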
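The same layout can be sketched with Python threads and NumPy. This is an analogy, not the answer's Cython code: `per_thread_rows` is a hypothetical helper, the worker index `tid` plays the role of `omp_get_thread_num()`, the stride loop mimics `schedule='static', chunksize=1`, and the final sum across rows is an added reduction step that the answer does not describe.

```python
import threading
import numpy as np

def per_thread_rows(values, num_threads):
    """Hypothetical sketch: worker tid only ever writes to scratch[tid]."""
    values = np.asarray(values, dtype=np.float64)
    # Shape (num_threads, 1): one private row per thread.
    scratch = np.zeros((num_threads, 1), dtype=np.float64)

    def worker(tid):
        row = scratch[tid]                 # view into this thread's row, no copy
        # Mimic schedule='static', chunksize=1: thread tid gets i = tid, tid+T, ...
        for i in range(tid, len(values), num_threads):
            row[0] += values[i]

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return scratch

scratch = per_thread_rows(range(10), num_threads=3)
print(scratch[:, 0])        # [18. 12. 15.]
print(scratch[:, 0].sum())  # 45.0
```

Because no two workers touch the same row, no locking is needed while they run; combining the rows happens only after all threads have joined.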
Here's a rough version of what I did:
import numpy as np
import multiprocessing
cimport numpy as c_numpy
cimport openmp
from cython.parallel cimport parallel
from cython.parallel import prange
from libc.stdlib cimport malloc, free
...
cdef int num_items = ...
cdef int num_threads = multiprocessing.cpu_count()
# One row per thread, so each thread uses separate memory
result_array = np.zeros((num_threads, num_items), dtype=DTYPE)
cdef c_numpy.ndarray result_cn
cdef CDTYPE ** result_pointer_arr
cdef Py_ssize_t i
cdef int thread_number
# Collect a raw pointer to each thread's row while we still hold the GIL
result_pointer_arr = <CDTYPE **> malloc(num_threads * sizeof(CDTYPE *))
if result_pointer_arr == NULL:
    raise MemoryError()
for i in range(num_threads):
    result_cn = result_array[i]
    result_pointer_arr[i] = <CDTYPE*> result_cn.data
try:
    for i in prange(num_items, nogil=True, chunksize=1, num_threads=num_threads, schedule='static'):
        # omp_get_thread_num() returns 0..num_threads-1, selecting this thread's row
        thread_number = openmp.omp_get_thread_num()
        some_function(result_pointer_arr[thread_number])  # some_function must be declared nogil
finally:
    free(result_pointer_arr)  # release the pointer table once the parallel loop is done
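To check what result_pointer_arr ends up holding, the same row addresses can be inspected directly from NumPy. This is a small sketch that assumes float64 in place of the elided DTYPE and uses a hypothetical helper, `row_addresses`. Each row view's data pointer equals the array's base pointer plus i * strides[0], so the per-thread rows are disjoint.

```python
import numpy as np

def row_addresses(arr):
    """Hypothetical helper: base address of each row of a 2-D array."""
    return [arr[i].ctypes.data for i in range(arr.shape[0])]

num_threads, num_items = 4, 5
result_array = np.zeros((num_threads, num_items), dtype=np.float64)
addrs = row_addresses(result_array)

# Rows of a C-contiguous array start exactly one row stride apart (5 * 8 bytes).
gaps = [b - a for a, b in zip(addrs, addrs[1:])]
print(result_array.strides[0], gaps)  # 40 [40, 40, 40]
```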