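I'm currently working with CuPy, and so far it has let me launch custom kernels for the problem I've been working on. However, I've hit a roadblock getting dynamic parallelism to work in CuPy. I can't find much about it in the documentation or from other people who got stuck on the same thing, and I can't tell whether dynamic parallelism is officially supported in CuPy at all. Does anyone have suggestions on how to approach this?
Below is some sample code showing what I'm trying to do.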
extern "C"
{
    __global__ void test_kernel_inner()
    {
        printf("    Hello inner world!\n");
    }

    __global__ void test_kernel()
    {
        printf("Hello outer world!\n");
        test_kernel_inner<<<2, 1>>>();
    }
}
I launch the kernel from CuPy like this:
TestKernel = cp.RawKernel(Test, 'test_kernel', ('-rdc=true',))
TestKernel((1,),(1,),())
I get the following error:
File "/home/palmerss/PycharmProjects/TestProject/Test.py", line 23, in <module>
TestKernel((1,),(1,),())
File "cupy/core/raw.pyx", line 51, in cupy.core.raw.RawKernel.__call__
File "cupy/util.pyx", line 55, in cupy.util.memoize.decorator.ret
File "cupy/core/raw.pyx", line 57, in cupy.core.raw._get_raw_kernel
File "cupy/core/carray.pxi", line 125, in cupy.core.core.compile_with_cache
File "cupy/core/carray.pxi", line 166, in cupy.core.core.compile_with_cache
File "/home/palmerss/anaconda3/envs/TestProject/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 168, in compile_with_cache
cubin = ls.complete()
File "cupy/cuda/function.pyx", line 216, in cupy.cuda.function.LinkState.complete
File "cupy/cuda/function.pyx", line 217, in cupy.cuda.function.LinkState.complete
File "cupy/cuda/driver.pyx", line 161, in cupy.cuda.driver.linkComplete
File "cupy/cuda/driver.pyx", line 82, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_UNKNOWN: unknown error
Using the PyCUDA example code linked in the comments produces the following error:
mod = DynamicSourceModule(cdpSimplePrint_cu)
File "/home/palmerss/anaconda3/envs/TestProject/lib/python3.7/site-packages/pycuda/compiler.py", line 470, in __init__
self.link()
File "/home/palmerss/anaconda3/envs/TestProject/lib/python3.7/site-packages/pycuda/compiler.py", line 439, in link
self.module = self.linker.link_module()
pycuda._driver.Error: cuLinkComplete failed: unknown error - error : Undefined reference to 'cudaGetParameterBufferV2' in 'kernel.ptx'
error : Undefined reference to 'cudaLaunchDeviceV2' in 'kernel.ptx'
After following the thread that was also linked in the comments, I found that the cuda_libdir argument has to be passed to DynamicSourceModule, as shown below.
mod = DynamicSourceModule(cusource.read(), cuda_libdir='/usr/local/cuda/lib64')
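The two undefined symbols in the linker error, cudaGetParameterBufferV2 and cudaLaunchDeviceV2, come from the CUDA device runtime library (libcudadevrt.a on Linux), which is why the linker has to be pointed at the directory that contains it. Rather than hard-coding the path, it can be looked up first. The following is only a sketch, assuming a Linux install: find_cuda_libdir and build_dynamic_module are hypothetical helper names, and the CUDA_HOME variable and candidate directories are assumptions that may not match every setup.

```python
import os


def find_cuda_libdir(candidates):
    """Return the first directory that contains libcudadevrt.a, or None.

    Hypothetical helper: it only checks for the file on disk.
    """
    for path in candidates:
        if os.path.isfile(os.path.join(path, "libcudadevrt.a")):
            return path
    return None


def build_dynamic_module(source, libdir):
    """Compile and link CUDA source that uses dynamic parallelism.

    Hypothetical wrapper. PyCUDA is imported lazily so the lookup helper
    above works on machines without a GPU. A CUDA context must already be
    active (for example via `import pycuda.autoinit`).
    """
    from pycuda.compiler import DynamicSourceModule

    return DynamicSourceModule(source, cuda_libdir=libdir)


# Assumed candidate locations; adjust these for the local install.
CANDIDATES = [
    os.path.join(os.environ.get("CUDA_HOME", "/usr/local/cuda"), "lib64"),
    "/usr/local/cuda/lib64",
]
```

With that in place, something like `libdir = find_cuda_libdir(CANDIDATES)` followed by `mod = build_dynamic_module(cusource.read(), libdir)` would replace the hard-coded path, and a `None` result signals that the device runtime library was not found in any of the assumed locations.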