I need a simple example of creating a random array with Numba's xoroshiro128p inside a JIT function, e.g. with a final array shape of (2, 4). Link to the numba doc here.
Pseudo code:
minimum = -2
maximum = 2
out_array = random(minimum, maximum, shape(2,4))
Output:
[[ 1.87569628 2.85881711 3.6009965 1.49224129]
[-3.27321953 1.59090995 -4.66912864 -3.43071647]]
Can the array creation be done faster with CUDA than with numpy? For example:
minimum_bound = -1
maximum_bound = 1
vectors_number = 12000000
variable_number = 6
import numpy as np
from numba import jit

@jit
def random_matrix(vectors_number, variable_number):
    population_generator = np.random.uniform(minimum_bound, maximum_bound,
                                             (vectors_number, variable_number))
    return population_generator
population_array = random_matrix(vectors_number, variable_number)
Creating the 12,000,000 vectors this way is about as fast as doing it on CUDA.
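A hedged note on the comparison: the first call to a @jit-decorated function includes its compilation time, so a fairer timing warms the function up once and then times repeated calls. An illustrative pattern (not from the original question):

random_matrix(vectors_number, variable_number)   # warm-up call triggers JIT compilation
%timeit random_matrix(vectors_number, variable_number)
%timeit np.random.uniform(minimum_bound, maximum_bound, (vectors_number, variable_number))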
Answer 0 (score: 2)
The example in the documentation can be tweaked slightly to do what you want:
from numba import cuda
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32
import numpy as np

@cuda.jit
def rand_array(rng_states, out):
    thread_id = cuda.grid(1)          # one thread per output element
    x = xoroshiro128p_uniform_float32(rng_states, thread_id)  # uniform in [0, 1)
    out[thread_id] = x

threads_per_block = 4
blocks = 2
rng_states = create_xoroshiro128p_states(threads_per_block * blocks, seed=1)
out = np.zeros(threads_per_block * blocks, dtype=np.float32)
rand_array[blocks, threads_per_block](rng_states, out)
print(out.reshape(blocks, threads_per_block))
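To match the pseudocode in the question (values in [minimum, maximum] and a (2, 4) result), the [0, 1) samples can be rescaled inside the kernel. A minimal sketch under the same imports as above; the kernel name rand_array_scaled and the choice to pass the bounds as kernel arguments are illustrative, not part of the original answer:

@cuda.jit
def rand_array_scaled(rng_states, minimum, maximum, out):
    thread_id = cuda.grid(1)
    x = xoroshiro128p_uniform_float32(rng_states, thread_id)   # uniform in [0, 1)
    out[thread_id] = minimum + x * (maximum - minimum)          # rescaled to [minimum, maximum)

threads_per_block = 4
blocks = 2
rng_states = create_xoroshiro128p_states(threads_per_block * blocks, seed=1)
out = np.zeros(threads_per_block * blocks, dtype=np.float32)
rand_array_scaled[blocks, threads_per_block](rng_states, np.float32(-2.0), np.float32(2.0), out)
print(out.reshape(blocks, threads_per_block))   # shape (2, 4), values in [-2, 2)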
Answer 1 (score: -1)
Inspired by the answer above:
def random(threads_per_block, blocks):
    @cuda.jit
    def rand_array(rng_states, out):   # kernel defined inside "def random"
        thread_id = cuda.grid(1)
        x = xoroshiro128p_uniform_float32(rng_states, thread_id)
        out[thread_id] = x

    rng_states = create_xoroshiro128p_states(threads_per_block * blocks, seed=1)
    out = np.zeros(threads_per_block * blocks, dtype=np.float32)
    rand_array[blocks, threads_per_block](rng_states, out)
    return out.reshape(blocks, threads_per_block)

# Example of usage:
matrix100x100 = random(100, 100)
%timeit random(100, 100)
613 ms ± 2.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit np.random.rand(100, 100)
19.1 ms ± 353 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
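Much of that 613 ms is per-call overhead rather than random-number generation: defining the kernel inside random() makes Numba recompile it on every call, a fresh set of RNG states is created each time, and the result is copied back to the host. A sketch of how one might hoist those one-time costs out of the timed call; the names random_reuse and d_out are illustrative and not from the original answer:

@cuda.jit
def rand_array(rng_states, out):
    thread_id = cuda.grid(1)
    out[thread_id] = xoroshiro128p_uniform_float32(rng_states, thread_id)

threads_per_block = 100
blocks = 100
# Create the RNG states and the device output buffer once, outside the timed call.
rng_states = create_xoroshiro128p_states(threads_per_block * blocks, seed=1)
d_out = cuda.device_array(threads_per_block * blocks, dtype=np.float32)

def random_reuse():
    rand_array[blocks, threads_per_block](rng_states, d_out)
    return d_out.copy_to_host().reshape(blocks, threads_per_block)

random_reuse()            # warm-up call compiles the kernel
%timeit random_reuse()    # now only the launch and the copy back are timed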