Question

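I recently tried using function pointers to define several processing stages dynamically in my application, which runs on sm_30.

Posting the code here would be difficult, because it is spread across many files and functions, but basically I started from the sample included in the CUDA Toolkit 5.0.

I allocate a buffer of device function pointers and copy a device function pointer into it, using cudaMemcpyFromSymbolAsync with the cudaMemcpyDeviceToDevice copy kind, as is done in the sample.

My device function pointer type is declared in a .cu.h header as follows: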

//device function pointer model
typedef void (*func)(structGpuArgument*);

//Declaring a function
__device__ void gpuFunc1(structGpuArgument* arg1);

Elsewhere I have a .cu file that includes the header above and contains the following code:

//get the actual function pointer
__device__ func gpuFuncPtr = gpuFunc1;

//Buffer to store a list of function pointer
func* pFuncDevBuffer;
cudaMalloc(&pFuncDevBuffer,NB_FUNC*sizeof(func));

//copy the actual function pointer (symbol) into slot i of the list buffer
cudaMemcpyFromSymbolAsync(pFuncDevBuffer + i, gpuFuncPtr, sizeof(func), 0, cudaMemcpyDeviceToDevice, stream);

//Launch the kernel that will use the functions
kernel_test<<<1, 10, 0, stream>>>(pFuncDevBuffer);
...

//defining the kernel that uses pointer buffer
__global__ void kernel_test(func* pFuncDevBuffer)
{
   printf("func address : %p\n", (void*)pFuncDevBuffer[0]);
   pFuncDevBuffer[0](NULL);
}

//defining the function pointed by the function pointer
__device__ void gpuFunc1(structGpuArgument* arg1)
{
     /* do_something with arg1 */
}

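In fact, as long as the __global__ kernel that takes the device function pointer buffer as an argument is defined in the same file as the function and its pointer, everything works. The kernel can then print the function's address (0x4) and execute its code without any problem. I do not use separate compilation.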
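The working single-file case can be condensed into the sketch below. It is only an illustration under assumptions: the contents of structGpuArgument, the value of NB_FUNC, the body of gpuFunc1, and the CHECK macro are placeholders that are not part of the real application.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro, added for this sketch only.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder argument type; the real structGpuArgument is not shown.
struct structGpuArgument { int value; };

// Device function pointer model, as in the header.
typedef void (*func)(structGpuArgument*);

#define NB_FUNC 1  // assumed size of the function pointer list

// Placeholder body standing in for do_something.
__device__ void gpuFunc1(structGpuArgument* arg1)
{
    printf("gpuFunc1 called by thread %d (arg1 = %p)\n",
           (int)threadIdx.x, (void*)arg1);
}

// Device-side symbol holding the address of gpuFunc1.
__device__ func gpuFuncPtr = gpuFunc1;

// Kernel defined in the SAME file as the function and its pointer.
__global__ void kernel_test(func* pFuncDevBuffer)
{
    printf("func address : %p\n", (void*)pFuncDevBuffer[0]);
    pFuncDevBuffer[0](NULL);
}

int main()
{
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Buffer to store a list of function pointers.
    func* pFuncDevBuffer = NULL;
    CHECK(cudaMalloc(&pFuncDevBuffer, NB_FUNC * sizeof(func)));

    // Copy the value of the device symbol into slot 0 of the buffer.
    CHECK(cudaMemcpyFromSymbolAsync(pFuncDevBuffer, gpuFuncPtr, sizeof(func),
                                    0, cudaMemcpyDeviceToDevice, stream));

    kernel_test<<<1, 10, 0, stream>>>(pFuncDevBuffer);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(stream));

    CHECK(cudaFree(pFuncDevBuffer));
    CHECK(cudaStreamDestroy(stream));
    return 0;
}
```

This builds as a single translation unit, for example with `nvcc -arch=sm_30 file.cu`, which matches the case where the kernel, the function, and the pointer symbol all live in one file.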

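When, in the same instance of the program, a second kernel defined elsewhere takes the same function pointer buffer as an argument, it prints the same address for the function pointer (0x4), but if it tries to call it, it fails with an illegal instruction at 0x00000000 under cuda-memcheck. After that, any other CUDA API call freezes, and I need to reboot the computer (my GPU does not support reset through nvidia-smi).

I would like to know whether there is a known issue with function pointers when a function pointer buffer is used from a kernel defined in another file that shares the same function pointer declarations.

Also, being able to reset the device after the segfault without rebooting the whole system would save me a lot of time while debugging the application.

Thanks for your help.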
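To make the kind of in-process reset meant here concrete, the sketch below uses cudaDeviceReset() from the runtime API, which destroys the calling process's context on the current device. The helper name resetAfterFault is hypothetical, and it is an assumption that the call returns at all once the device is in the frozen state described above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: checks for a pending error and, if one is found,
// tears down this process's context on the current device.
static bool resetAfterFault()
{
    // Surfaces any error left behind by previously launched kernels.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        return true;  // nothing to recover from
    }
    fprintf(stderr, "device error: %s\n", cudaGetErrorString(err));

    // Destroys all allocations and state of this process on the device.
    // Every device pointer and stream created earlier is invalid afterwards.
    err = cudaDeviceReset();
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceReset failed: %s\n",
                cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    // ... launch the kernels under test here ...
    if (!resetAfterFault()) {
        return 1;
    }
    printf("device usable again for this process\n");
    return 0;
}
```

This only resets state owned by the current process. It is not a replacement for a driver-level reset.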

CUDA function pointer consistency

0 answers: