我是PyCUDA的新手。我想用__device__
声明的函数调用用__global__
声明的函数。我怎么能在pyCUDA中这样做?
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import numpy as n
import pycuda.autoinit
import pycuda.gpuarray as gp
d=gp.zeros(shape=(128,128),dtype=n.int32)
h=n.zeros(shape=(128,128),dtype=n.int32)
mod=SourceModule("""
__global__ void matAdd(int *a)
{
int px=blockIdx.x*blockDim.x+threadIdx.x;
int py=blockIdx.y*blockDim.y+threadIdx.y;
a[px*128+py]+=1;
matMul(px);
}
__device__ void matMul( int px)
{
px=5;
}
""")
m=mod.get_function("matAdd")
m(d,block=(32,32,1),grid=(4,4))
d.get(h)
以上代码给出了以下错误
7-linux-i686.egg/pycuda/../include/pycuda kernel.cu]
[stderr:
kernel.cu(8): error: identifier "matMul" is undefined
kernel.cu(12): warning: parameter "px" was set but never used
1 error detected in the compilation of "/tmp/tmpxft_00002286_00000000-6_kernel.cpp1.ii".
]
答案 0 :(得分:1)
在引用matMul
函数之前,您应该声明它。你可以这样做:
__device__ void matMul( int px); // declaration
__global__ void matAdd(int *a)
{
int px=blockIdx.x*blockDim.x+threadIdx.x;
int py=blockIdx.y*blockDim.y+threadIdx.y;
a[px*128+py]+=1;
matMul(px);
}
__device__ void matMul( int px) // implementation
{
px=5; // by the way, this assignment does not propagate outside this function
}
,或者只是将整个matMul
函数移到matAdd
之前。