我已经使用numba.SmartArrays为添加向量编写了此代码。我是第一次使用这个numba.SmartArrays。我不知道如何使用它。 此代码不起作用,它会抛出错误。
import numpy as np
from numba import SmartArray,cuda, jit, uint32
li1=np.uint32([1,2,3,4])
li=np.uint32([1,2,3,4])
b=SmartArray(li,where="host",copy=True)
a=SmartArray(li1,where="host",copy=True)
c=np.uint32([1,1,1,1])
print type(li)
print type(a)
@cuda.jit('void(uint32[:],uint32[:],uint32[:])',type="gpu")
def additionG(c,a,b):
idx=cuda.threadIdx.x+cuda.blockDim.x*cuda.blockIdx.x
if idx< len(a):
a[idx]=c[idx]+b[idx]
dA=cuda.to_device(a)
dB=cuda.to_device(b)
dC=cuda.to_device(c)
additionG[1, 128](c,a,b)
print a.__array__()
错误:
<type 'numpy.ndarray'>
<class 'numba.smartarray.SmartArray'>
Traceback (most recent call last):
File "C:\Users\hp-pc\My Documents\LiClipse Workspace\cuda\blowfishgpu_smart_arrays.py", line 20, in <module>
dA=cuda.to_device(a)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\devices.py", line 257, in _require_cuda_context
return fn(*args, **kws)
File "C:\Anaconda\lib\site-packages\numba\cuda\api.py", line 55, in to_device
to, new = devicearray.auto_device(obj, stream=stream, copy=copy)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 403, in auto_device
devobj.copy_to_device(obj, stream=stream)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 148, in copy_to_device
sz = min(_driver.host_memory_size(ary), self.alloc_size)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 1348, in host_memory_size
s, e = host_memory_extents(obj)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 1333, in host_memory_extents
return mviewbuf.memoryview_get_extents(obj)
TypeError: expected a readable buffer object
答案 0 :(得分:2)
自从我发布这个问题以来已经有一段时间了。仍然发布答案,以便将来可能会发现它有用。
import numpy as np
from numba import SmartArray,cuda, jit, uint32,autojit
li1=np.uint32([6,7,8,9])
li=np.uint32([1,2,3,4])
a=SmartArray(li1,where='host',copy=True)
b=SmartArray(li,where="host",copy=True)
c=np.uint32([1,1,1,1])
def additionG(a,c):
idx=cuda.threadIdx.x+cuda.blockDim.x*cuda.blockIdx.x
if idx < len(c):
a[idx]=a[idx]+c[idx]
cuda.syncthreads()
bpg=1
tpb=128
dC=cuda.to_device(c)
cfunc = cuda.jit()(additionG)
cfunc[bpg, tpb](a,dC)
print a.__array__()
答案 1 :(得分:1)
在我看来cuda.to_device
并不处理智能数组,这有点意义,因为智能数组应该取消明确的副本管理。
如果我对文档的阅读是正确的(我之前从未尝试过SmartArray
),那么你应该能够改变这个
dA=cuda.to_device(a)
dB=cuda.to_device(b)
dC=cuda.to_device(c)
additionG[1, 128](c,a,b)
到
dC=cuda.to_device(c)
additionG[1, 128](dC,a.gpu(),b.gpu())
.gpu()
方法应返回内核可以理解和访问的GPU驻留对象。