I am trying to run a very basic neighbour-removal algorithm on a matrix using NumbaPro CUDA Python.
The function:
@autojit(target="gpu")
def removeNeighboursMatCUDA(tmp_frame):
    for j in range(255):
        for i in range(255):
            if tmp_frame[i][j]!=0:
                if tmp_frame[i+1][j]!=0:
                    tmp_frame[i][j]=0
                    tmp_frame[i][j+1]=0
                if tmp_frame[i][j+1]!=0:
                    tmp_frame[i][j]=0
                    tmp_frame[i+1][j]=0
                if tmp_frame[i+1][j+1]!=0:
                    tmp_frame[i][j]=0
                    tmp_frame[i+1][j+1]=0
                if i>0 and tmp_frame[i-1][j-1]!=0:
                    tmp_frame[i][j]=0
                    tmp_frame[i-1][j-1]=0
    return tmp_frame
The input to the function is a 2D array (256x256):
tmp_frame = coo_matrix((c_tmp,(x_tmp,y_tmp)),shape=(256,256)).todense()
M = removeNeighboursMatCUDA(tmp_frame)
With the CPU as the target this code runs without any problems, but with the GPU target I get the following error:
TypingError: No conversion from array(int16, 2d, C) to none for '$333.2'
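I could not find any information about this error. Does anyone know what is wrong, or what the problem is?
Edit: the error is caused by the return statement. Removing the return fixes the code.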
Answer 0 (score: 2)
I figured it out myself. As already stated in the edit, the problem was the return statement. The fixed code is attached below:
@jit(target="gpu")
def removeNeighboursMatCUDA(tmp_frame,res_frame):
    for j in range(255):
        for i in range(255):
            if tmp_frame[i][j]!=0:
                if tmp_frame[i+1][j]!=0:
                    res_frame[i][j]=0
                    res_frame[i][j+1]=0
                if tmp_frame[i][j+1]!=0:
                    res_frame[i][j]=0
                    res_frame[i+1][j]=0
                if tmp_frame[i+1][j+1]!=0:
                    res_frame[i][j]=0
                    res_frame[i+1][j+1]=0
                if i>0 and tmp_frame[i-1][j-1]!=0:
                    res_frame[i][j]=0
                    res_frame[i-1][j-1]=0
tmp_frame=res_frame
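
For anyone who wants to try the output-argument pattern without a GPU, here is a minimal plain-Python sketch. It is not the code above: clear_adjacent_pairs, frame and result are hypothetical names, and the neighbour rule is simplified to "below" and "right" only. As far as I understand, a GPU kernel cannot return a value, so the result has to be written into an array passed as an argument. Because the routine only ever writes zeros, the output presumably has to start as a copy of the input.

```python
def clear_adjacent_pairs(src, dst):
    """Kernel-style routine: reads only src, writes only dst, returns nothing."""
    rows, cols = len(src), len(src[0])
    # Stop one short of each edge so that i + 1 and j + 1 stay in bounds.
    for i in range(rows - 1):
        for j in range(cols - 1):
            if src[i][j] == 0:
                continue
            if src[i + 1][j] != 0:      # non-zero neighbour below
                dst[i][j] = 0
                dst[i + 1][j] = 0
            if src[i][j + 1] != 0:      # non-zero neighbour to the right
                dst[i][j] = 0
                dst[i][j + 1] = 0


# Small hypothetical input; a real frame would be 256x256.
frame = [
    [1, 1, 0, 0],
    [0, 0, 0, 2],
    [3, 0, 0, 0],
    [0, 0, 0, 0],
]
result = [row[:] for row in frame]      # output starts as a copy of the input
status = clear_adjacent_pairs(frame, result)

print(status)   # None: the result lives in `result`, not in a return value
for row in result:
    print(row)
```

Note that reading from one array and writing to another is not identical to the in-place CPU version: the in-place loop sees cells it has already zeroed, while this version always tests the original values.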