Cannot pass a Numba CUDA device function to a Python Ray remote function for distributed computing

Asked: 2019-06-20 16:23:50

Tags: numba ray

I am using Python Ray together with Numba CUDA for distributed GPU computing.

I want to pass a Numba CUDA device function to a Ray remote function, but it fails.

The same code works fine with multiprocessing.

The code is as follows:

This part works: it defines a CUDA kernel and then uses multiprocessing to run the computation. (The same code, however, fails under Ray.)

import multiprocessing
import numpy as np
import os
from numba import cuda

# cuda kernel function
def cuda_kernel(fun):
    result = cuda.device_array(10000, dtype=np.float64)

    @cuda.jit
    def kernel(result):
        thread_id = cuda.grid(1)
        if thread_id < 10000:
            result[thread_id] = fun(thread_id)

    # Configure the blocks
    threadsperblock = 16
    blockspergrid = (10000 + (threadsperblock - 1)) // threadsperblock

    # Start the kernel 
    kernel[blockspergrid, threadsperblock](result)

    result = result.copy_to_host()

    return result

# define a device function to be called by the cuda kernel
@cuda.jit(device=True)
def fun(x):
    return x**2

# multiprocessing to start the cuda kernel function
def multi_processing():
    os.environ["CUDA_VISIBLE_DEVICES"] = '0'
    result = cuda_kernel(fun)
    np.save(os.getcwd()+'/result', result)

# start a child process to run the kernel
p = multiprocessing.Process(target=multi_processing)
p.daemon = True
p.start()
p.join()  # wait for the child to finish before loading its output

np.load(os.getcwd()+'/result'+'.npy')
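As an aside, the `blockspergrid` expression in the kernel launch above is just integer ceiling division, sized so the grid covers all 10000 elements; a quick stdlib check of that formula:

```python
import math

n = 10000
threadsperblock = 16

# (n + tpb - 1) // tpb is the classic integer form of ceil(n / tpb)
blockspergrid = (n + (threadsperblock - 1)) // threadsperblock

assert blockspergrid == math.ceil(n / threadsperblock)
assert blockspergrid * threadsperblock >= n  # the grid covers every element

print(blockspergrid)  # -> 625
```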

I then changed the code to the Ray version below:

It does exactly the same thing as before, but uses Ray.

import numpy as np
import os
from numba import cuda
import ray

# initialize ray
ray.shutdown()
ray.init()

# cuda kernel function
@ray.remote(num_gpus=1)
def cuda_kernel(fun):
    result = cuda.device_array(10000, dtype=np.float64)

    @cuda.jit
    def kernel(result):
        thread_id = cuda.grid(1)
        if thread_id < 10000:
            result[thread_id] = fun(thread_id)

    # Configure the blocks
    threadsperblock = 16
    blockspergrid = (10000 + (threadsperblock - 1)) // threadsperblock

    # Start the kernel 
    kernel[blockspergrid, threadsperblock](result)

    result = result.copy_to_host()

    return result

# define a device function to be called by the cuda kernel
@cuda.jit(device=True)
def fun(x):
    return x**2

# use ray instead of multiprocessing
def ray_process():
    os.environ["CUDA_VISIBLE_DEVICES"] = '0'
    result = ray.get(cuda_kernel.remote(fun))
    np.save(os.getcwd()+'/result', result)

ray_process()

np.load(os.getcwd()+'/result'+'.npy')
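My guess at the key difference (an assumption, not something I have verified in Ray's internals): multiprocessing with the default fork start method lets the child inherit the parent's objects in place, while Ray pickles every task argument and rebuilds it in the worker, so the worker receives a reconstructed object rather than the original. A stdlib sketch of that distinction, with a hypothetical `Template` class standing in for Numba's `DeviceFunctionTemplate` wrapper:

```python
import pickle

class Template:
    """Hypothetical stand-in for numba's DeviceFunctionTemplate wrapper."""
    def __init__(self, name):
        self.name = name

original = Template("fun")

# A forked child would see `original` itself (inherited memory).
# A Ray worker instead sees a pickled-and-rebuilt copy:
copy = pickle.loads(pickle.dumps(original))

print(copy is original)            # -> False: a new object, not the original
print(copy.name == original.name)  # -> True: same data, different identity
```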

This raises the following error:

numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'fun': cannot determine Numba type of <class 'numba.cuda.compiler.DeviceFunctionTemplate'>

File "<ipython-input-6-4c4236d93522>", line 20:
<source missing, REPL/exec in use?>

This is a bit strange; I suspect the problem comes from Ray and multiprocessing handling functions differently. Does anyone have a solution? (I have to pass the function to the CUDA kernel.)
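One pattern that might sidestep this (a sketch only, not verified on GPU hardware): pass the plain, undecorated Python function to the remote task and apply `cuda.jit(device=True)` inside the worker, so the device-function object is created after deserialization instead of being shipped across. The mechanism can be shown with the stdlib alone, using a hypothetical `wrap` as a stand-in for `cuda.jit(device=True)`:

```python
import pickle

def wrap(f):
    # Hypothetical stand-in for cuda.jit(device=True): whatever the
    # decorator produces is what the kernel would actually call.
    def wrapped(x):
        return f(x)
    return wrapped

def fun(x):  # plain module-level function: pickles by reference
    return x ** 2

# Ship only the plain function across the process boundary...
payload = pickle.dumps(fun)

# ...and apply the decorator on the receiving side, after deserialization.
received = pickle.loads(payload)
device_fun = wrap(received)

print(device_fun(100))  # -> 10000
```

In the Ray version this would mean removing the `@cuda.jit(device=True)` decorator from `fun`, calling `cuda_kernel.remote(fun)` with the plain function, and doing `fun = cuda.jit(device=True)(fun)` at the top of `cuda_kernel`.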

0 Answers