I need to generate random numbers inside the GPU. I am using the cuda.jit decorator from numba for this purpose. In my limited experience, the code runs faster if we specify the argument types of the CUDA kernel in cuda.jit. Following this example from the numba documentation, I created the RNG states with create_xoroshiro128p_states(). However, no argument types are passed to cuda.jit there. What could the argument types be?
A minimal working example is given below.
from __future__ import division
from __future__ import print_function
import time
import numpy as np
from numba import cuda, jit
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32
np.set_printoptions(precision=4, suppress=True)
# Number of random numbers to be generated
T = 2048
# ======= 1-D GRIDS =======
# Set the number of threads in a block
threadsperblock_1d = 512
blockspergrid_1d = np.int(np.ceil(T / threadsperblock_1d))
# ======= 1-D GRIDS =======
@cuda.jit #('void(float32[:], float32[:])')
def gen_rand_nos(rng_states, output_rand):
    thread_id = cuda.grid(1)
    output_rand[thread_id] = xoroshiro128p_uniform_float32(rng_states, thread_id)

if __name__ == '__main__':
    start_time = time.time()
    RandVals_numpy = np.random.normal(size=T).astype(np.float32)
    print("---numpy operation %s seconds ---" % (time.time() - start_time))

    rng_states = create_xoroshiro128p_states(threadsperblock_1d * blockspergrid_1d, seed=np.random.randint(100))  # Setting a random seed every time
    rand_gpu = cuda.device_array(T, dtype=np.float32)

    start_time = time.time()
    gen_rand_nos[blockspergrid_1d, threadsperblock_1d](rng_states, rand_gpu)
    print("---Just GPU operation %s seconds ---" % (time.time() - start_time))

    rand_cpu = rand_gpu.copy_to_host()
    # Print first 20 random values generated
    print(rand_cpu[:20])
    print(rand_cpu.shape)
If I use @cuda.jit('void(float32[:], float32[:])') on the gen_rand_nos() function, I get the error
Invalid usage of getitem with parameters (float32, const('s0'))
I am using Python 2.7 and numba 0.35.0.
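For what it's worth, the states returned by create_xoroshiro128p_states() appear to be a device array of two-field uint64 records (numba's source defines an xoroshiro128p_dtype; I am assuming this is still the layout in my version), which would explain the const('s0') in the error message. A plain-NumPy sketch of that record layout:

```python
import numpy as np

# Sketch of the per-thread xoroshiro128p state layout, mirroring what
# numba.cuda.random calls xoroshiro128p_dtype (assumption: two uint64
# fields named s0 and s1, one record per thread)
state_dtype = np.dtype([('s0', np.uint64), ('s1', np.uint64)], align=True)

# One record per launched thread, as create_xoroshiro128p_states allocates
states = np.zeros(2048, dtype=state_dtype)
print(states.dtype.names)  # -> ('s0', 's1')
```

So the first kernel argument is presumably an array of such records rather than float32[:], which would be why the float32 signature fails as soon as the generator reads the s0 field.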