I need to generate random numbers inside the GPU. I am using the cuda.jit decorator from numba for this purpose. In my limited experience, the code runs faster if we specify the argument types of the CUDA kernel in cuda.jit. Following this example from the numba documentation, I created the RNG states with create_xoroshiro128p_states(). However, no argument types are passed to cuda.jit there. What could the argument types be?
A minimal working example is given below.
from __future__ import division
from __future__ import print_function
import time
import numpy as np
from numba import cuda, jit
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32
np.set_printoptions(precision=4, suppress=True)
# Number of random numbers to be generated
T = 2048
# ======= 1-D GRIDS =======
# Set the number of threads in a block
threadsperblock_1d = 512
blockspergrid_1d = np.int(np.ceil(T / threadsperblock_1d))
# ======= 1-D GRIDS =======
@cuda.jit #('void(float32[:], float32[:])')
def gen_rand_nos(rng_states, output_rand):
    thread_id = cuda.grid(1)
    output_rand[thread_id] = xoroshiro128p_uniform_float32(rng_states, thread_id)

if __name__ == '__main__':
    start_time = time.time()
    RandVals_numpy = np.random.normal(size=T).astype(np.float32)
    print("---numpy operation %s seconds ---" % (time.time() - start_time))

    rng_states = create_xoroshiro128p_states(threadsperblock_1d * blockspergrid_1d, seed=np.random.randint(100))  # Setting a random seed every time
    rand_gpu = cuda.device_array(T, dtype=np.float32)

    start_time = time.time()
    gen_rand_nos[blockspergrid_1d, threadsperblock_1d](rng_states, rand_gpu)
    print("---Just GPU operation %s seconds ---" % (time.time() - start_time))

    rand_cpu = rand_gpu.copy_to_host()
    # Print first 20 random values generated
    print(rand_cpu[:20])
    print(rand_cpu.shape)
If I use @cuda.jit('void(float32[:], float32[:])') on the gen_rand_nos() function, I get the error
Invalid usage of getitem with parameters (float32, const('s0'))
I am using Python 2.7 and numba 0.35.0.
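For what it's worth, the states returned by create_xoroshiro128p_states() appear to be a device array of two-field uint64 records (numba's source defines an xoroshiro128p_dtype; I am assuming this is still the layout in my version), which would explain the const('s0') in the error message. A plain-NumPy sketch of that record layout:

```python
import numpy as np

# Sketch of the per-thread xoroshiro128p state layout, mirroring what
# numba.cuda.random calls xoroshiro128p_dtype (assumption: two uint64
# fields named s0 and s1, one record per thread)
state_dtype = np.dtype([('s0', np.uint64), ('s1', np.uint64)], align=True)

# One record per launched thread, as create_xoroshiro128p_states allocates
states = np.zeros(2048, dtype=state_dtype)
print(states.dtype.names)  # -> ('s0', 's1')
```

So the first kernel argument is presumably an array of such records rather than float32[:], which would be why the float32 signature fails as soon as the generator reads the s0 field.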