Question

I have found some strange runtime behaviour while experimenting with function pointers in CUDA.

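Goal
My goal is to have my function pointers choose which function to apply to two objects according to an internal property of those objects. In short, I want to emulate C++ templates in a CUDA kernel, not by using template arguments or switch clauses, but with function pointers and class/struct members instead.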

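Approach

Define my custom objects (struct customObj) with one property, int type, that emulates the argument of a template.
Define a bunch of dummy functions (Sum(), Subtract(), etc.) to choose from.
Keep the list of functions to apply (functionsList) and the corresponding type members to look up (first_type, second_type) in __constant__ memory, such that functionsList[i](obj1,obj2) is applied to objects with obj1.type == first_type[i] and obj2.type == second_type[i].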

Working code
The following code was compiled on Linux x86_64 with CUDA 5.0 and run on a GPU with compute capability 3.0 (GeForce GTX 670).

#include <stdio.h>
#include <stdlib.h>
#include <iostream>
// Wrap CUDA runtime calls to report failures with file and line.
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess) 
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

struct customObj
{
  int type;
  double d;
  // Constructors
  __device__ __host__ customObj() {}
  __device__ __host__ customObj(const int& _type, const double& _d) : type(_type), d(_d) {}
};

typedef void (*function_t)(customObj&, customObj&);
// Define a bunch of functions
__host__ __device__ void Sum(customObj& obj1, customObj& obj2) {printf("Sum chosen! d1 + d2 = %f\n", obj1.d + obj2.d);}
__host__ __device__ void Subtract(customObj& obj1, customObj& obj2) {printf("Subtract chosen! d1 - d2 = %f\n", obj1.d - obj2.d);}
__host__ __device__ void Multiply(customObj& obj1, customObj& obj2) {printf("Multiply chosen! d1 * d2 = %f\n", obj1.d * obj2.d);}

#define ARRAYLENGTH 3
__constant__ int first_type[ARRAYLENGTH] = {1, 2, 3};
__constant__ int second_type[ARRAYLENGTH] = {1, 1, 2};
__constant__ function_t functionsList[ARRAYLENGTH] = {Sum, Sum, Subtract};

// Kernel to loop through functions list
__global__ void choosefunction(customObj obj1, customObj obj2) {
  int i = 0;
  function_t f = NULL;
  do {
    if ((obj1.type == first_type[i]) && (obj2.type == second_type[i])) {
      f = functionsList[i];
      break;
    }
    i++;
  } while (i < ARRAYLENGTH);
  if (f == NULL) printf("No possible interaction!\n");
  else f(obj1,obj2);
}

int main() {
  customObj obj1(1, 5.2), obj2(1, 2.6);
  choosefunction<<<1,1>>>(obj1, obj2);
  gpuErrchk(cudaPeekAtLastError());
  gpuErrchk(cudaDeviceSynchronize()); 

  return 0;
}

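Problem
The problem I have found is that, as soon as I replace the data type of the member int type and of the related variables and functions (__constant__ int first_type[...] and so on), the code still compiles but stops working!

If I change the data type from int to char or int8_t, the memory checker throws error 4 on my call to cudaDeviceSynchronize().
If I change the data type to unsigned short int, I get a hardware stack overflow.

So, has anybody had similar issues when working with __constant__ memory? I really have no clue what is going on. As far as I know, char and int8_t are built-in types that are 1 byte long, while int is 4 bytes, so maybe it is about data alignment, but I'm just guessing here. Besides, CUDA is supposed to support function pointers on the GPU since compute capability 2.0. Are there any special constraints on function pointers in __constant__ memory that I'm missing?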

Answer 1

I was able to reproduce the problem (error 4, unspecified launch failure) with CUDA 5.0 on 64-bit RHEL 5.5, but not with CUDA 6.0.

Please update/upgrade to CUDA 6.

Function pointers in CUDA __constant__ memory
