I am using Python Ray together with Numba CUDA for distributed GPU computing.
I want to pass a Numba CUDA device function to a Ray remote function, but it fails.
The same logic works fine with multiprocessing.
The code is below.
This part works: it defines a CUDA kernel and then uses multiprocessing to run the computation. (The same logic fails under Ray, shown further down.)
import multiprocessing
import os

import numpy as np
from numba import cuda

# device function to be called from inside the CUDA kernel
@cuda.jit(device=True)
def fun(x):
    return x ** 2

# CUDA kernel function
def cuda_kernel(fun):
    result = cuda.device_array(10000, dtype=np.float64)

    @cuda.jit
    def kernel(result):
        thread_id = cuda.grid(1)
        if thread_id < 10000:
            result[thread_id] = fun(thread_id)

    # configure the blocks
    threadsperblock = 16
    blockspergrid = (10000 + (threadsperblock - 1)) // threadsperblock
    # start the kernel
    kernel[blockspergrid, threadsperblock](result)
    result = result.copy_to_host()
    return result

# run the CUDA kernel in a child process
def multi_processing():
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    result = cuda_kernel(fun)
    np.save(os.path.join(os.getcwd(), "result"), result)

# start the child process and wait for it to finish
p = multiprocessing.Process(target=multi_processing)
p.daemon = True
p.start()
p.join()

print(np.load(os.path.join(os.getcwd(), "result.npy")))
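For reference, the device-function-in-kernel pattern itself is fine: the same setup runs without errors in a single process. This is the minimal sanity check I used (no multiprocessing, no Ray):

import numpy as np
from numba import cuda

# device function, resolved as a global when the kernel is compiled
@cuda.jit(device=True)
def fun(x):
    return x ** 2

@cuda.jit
def kernel(result):
    thread_id = cuda.grid(1)
    if thread_id < 10000:
        result[thread_id] = fun(thread_id)

result = cuda.device_array(10000, dtype=np.float64)
threadsperblock = 16
blockspergrid = (10000 + (threadsperblock - 1)) // threadsperblock
kernel[blockspergrid, threadsperblock](result)
print(result.copy_to_host()[:5])  # [ 0.  1.  4.  9. 16.]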
I then changed the code to a Ray version, shown below.
It does exactly the same thing as before, but uses Ray instead of multiprocessing.
import os

import numpy as np
import ray
from numba import cuda

# initialize Ray
ray.shutdown()
ray.init()

# device function to be called from inside the CUDA kernel
@cuda.jit(device=True)
def fun(x):
    return x ** 2

# CUDA kernel function, now a Ray remote task
@ray.remote(num_gpus=1)
def cuda_kernel(fun):
    result = cuda.device_array(10000, dtype=np.float64)

    @cuda.jit
    def kernel(result):
        thread_id = cuda.grid(1)
        if thread_id < 10000:
            result[thread_id] = fun(thread_id)

    # configure the blocks
    threadsperblock = 16
    blockspergrid = (10000 + (threadsperblock - 1)) // threadsperblock
    # start the kernel
    kernel[blockspergrid, threadsperblock](result)
    result = result.copy_to_host()
    return result

# use Ray instead of multiprocessing
def ray_process():
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    result = ray.get(cuda_kernel.remote(fun))
    np.save(os.path.join(os.getcwd(), "result"), result)

ray_process()

print(np.load(os.path.join(os.getcwd(), "result.npy")))
Running this produces the following error:
numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'fun': cannot determine Numba type of <class 'numba.cuda.compiler.DeviceFunctionTemplate'>
File "<ipython-input-6-4c4236d93522>", line 20:
<source missing, REPL/exec in use?>
This is a bit strange. My guess is that the difference comes from how the two libraries ship functions to workers: multiprocessing forks the child process, which inherits the original device-function object, whereas Ray serializes the argument and the unpickled DeviceFunctionTemplate can no longer be typed by Numba when the kernel is compiled in the worker. Does anyone have a solution? (I need to pass the device function into the CUDA kernel as an argument.)
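The only workaround I have come up with so far is to pass the plain, undecorated Python function to the remote task and apply cuda.jit(device=True) inside the worker, so that the DeviceFunctionTemplate is created in the process that actually compiles the kernel instead of being pickled across processes. A minimal sketch of the idea (the argument name py_fun is mine, and this is a sketch rather than a verified solution):

import numpy as np
import ray
from numba import cuda

ray.init()

@ray.remote(num_gpus=1)
def cuda_kernel(py_fun):
    # compile the device function here, in the worker process
    dev_fun = cuda.jit(device=True)(py_fun)
    result = cuda.device_array(10000, dtype=np.float64)

    @cuda.jit
    def kernel(result):
        thread_id = cuda.grid(1)
        if thread_id < 10000:
            result[thread_id] = dev_fun(thread_id)

    threadsperblock = 16
    blockspergrid = (10000 + (threadsperblock - 1)) // threadsperblock
    kernel[blockspergrid, threadsperblock](result)
    return result.copy_to_host()

# plain Python function; Ray can serialize this with cloudpickle
def fun(x):
    return x ** 2

result = ray.get(cuda_kernel.remote(fun))

Is there a cleaner way to ship an already-decorated device function to a Ray task?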