In the first cell I have this:
import numpy as np
from numba import cuda

@cuda.jit
def thread_counter_safe(global_counter):
    cuda.atomic.add(global_counter, 0, 1)  # Safely add 1 to offset 0 in global_counter array
In the next cell I have this:
global_counter = cuda.to_device(np.array([0], dtype=np.int32))
thread_counter_safe[64, 64](global_counter)
print('Should be %d:' % (64*64), global_counter.copy_to_host())
global_counter = cuda.to_device(np.array([0], dtype=np.int32))
%timeit thread_counter_safe[64, 64](global_counter)
print('Should be %d:' % (64*64), global_counter.copy_to_host())
The output of the second cell is:
Should be 4096: [4096]
10000 loops, best of 3: 118 µs per loop
Should be 4096: [168390656]
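Jupyter Notebook's %timeit carries the same global_counter through all of its test iterations, so the count keeps accumulating. How do I correctly reset global_counter for each run?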
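For what it's worth, 168390656 is exactly 41111 * 4096, which suggests every launch made during timing (calibration runs included) added to the same array. Below is a minimal sketch of that effect and of one possible way around it, using only the standard `timeit` module. It assumes a CPU stand-in: `fake_kernel`, `reset`, and `state` are hypothetical names, and the plain list stands in for the device array, so this is not the numba code itself.

```python
import timeit

LAUNCH_SIZE = 64 * 64  # threads per launch in the question


def fake_kernel(counter):
    # Hypothetical CPU stand-in for thread_counter_safe[64, 64](counter):
    # one "launch" adds 1 per thread to counter[0].
    counter[0] += LAUNCH_SIZE


# Case 1: one counter shared by every timed call, as with %timeit above.
shared = [0]
timeit.repeat(lambda: fake_kernel(shared), number=100, repeat=3)
accumulated = shared[0]  # 3 repeats * 100 calls, all added to the same counter

# Case 2: rebuild the counter in `setup` and time a single call per repeat.
# `setup` runs once before each repeat and is not included in the timing.
state = {}


def reset():
    state["counter"] = [0]


times = timeit.repeat(
    lambda: fake_kernel(state["counter"]),
    setup=reset,
    number=1,
    repeat=300,
)
fresh = state["counter"][0]  # only the last single call is reflected

print("shared counter:", accumulated)
print("reset counter: ", fresh)
```

In the notebook, the same idea would presumably be to recreate `global_counter` with `cuda.to_device` in the setup step and keep one launch per timed run. That changes what is being measured (single launches rather than a batch of 10000), so the numbers may not be directly comparable to the 118 µs figure.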