Intel HD Graphics 4000 & Nvidia GeForce GT 650M not working with OpenCL: CL_INVALID_DEVICE error

Date: 2014-10-08 19:59:58

Tags: xcode macos opencl gpu cpu

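So I'm having some problems running my code on certain OpenCL devices. I'm developing on a mid-2013 15" Retina MacBook Pro running OS X 10.9.5 (Mavericks), using Xcode 6.0.1. OpenCL reports the following devices: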


Device: Intel(R) Core(TM) i7-3635QM CPU @ 2.40GHz
Hardware version: OpenCL 1.2 
Software version: 1.1
OpenCL C version: OpenCL C 1.2 
Parallel compute units: 8

Device: HD Graphics 4000
Hardware version: OpenCL 1.2 
Software version: 1.2(Aug 17 2014 20:29:07)
OpenCL C version: OpenCL C 1.2 
Parallel compute units: 16

Device: GeForce GT 650M
Hardware version: OpenCL 1.2 
Software version: 8.26.28 310.40.55b01
OpenCL C version: OpenCL C 1.2 
Parallel compute units: 2

So according to this, I should have one CPU and two GPUs: the HD Graphics 4000 and the GeForce GT 650M.

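My problem is that when I call clGetKernelWorkGroupInfo() with the device ID of either GPU, it returns a CL_INVALID_DEVICE error. If I pass in the CPU's device ID instead, it works fine and computes my kernel without any problems.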

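#include <OpenCL/opencl.h>   // macOS OpenCL framework header
#include <cstdio>            // printf
#include <iostream>          // std::cout
#include <string>            // std::string, std::to_string

cl_int err; //Error catcher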
cl_platform_id platform; //Computer platform
cl_context context; //Single context for whole platform
cl_uint deviceCount; //Number of devices (CPU + GPU) available on machine
cl_device_id *devices; //Array of pointers to devices;
cl_program program; //OpenCL program
cl_command_queue *commandQueues; //One command queue for each device

const int DATA_SIZE = 16384;  // must be a compile-time constant to size the array below
double results[DATA_SIZE];    //  results returned from device;
int currDevice = 0;           //Use this to just access first available device

/*---Get First Platform---*/
err = clGetPlatformIDs(1, &platform, NULL);
CheckError(err, "A valid platform could not be found on this machine");

/*---Get Device Count---*/
err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, NULL, &deviceCount);
CheckError(err, "Could not determine the number of devices available on this platform");

/*---Get All Devices---*/
devices = new cl_device_id[deviceCount];
err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, deviceCount, devices, NULL);
CheckError(err, "Could not access the devices");

/*---Create a single context for all devices---*/
context = clCreateContext(NULL, deviceCount, devices, NULL, NULL, &err);
CheckError(err, "Could not create a context on this platform");

/*---For each device create a separate command queue---*/
commandQueues = new cl_command_queue[deviceCount];
for(cl_uint i = 0; i < deviceCount; i++) {
    // Braces are required: without them only the first statement is inside the loop
    commandQueues[i] = clCreateCommandQueue(context, devices[i], 0, &err);
    std::string errMsg = "Was unable to successfully set up a command queue for device number " + std::to_string(i);
    CheckError(err, errMsg);
}

/*---Read in cl file---*/
char *KernelSource = ReadFile("./Source/Sampling/Sampler.cl");

//  Create the compute program from the source buffer
program = clCreateProgramWithSource(context, 1, (const char **) & KernelSource, NULL, &err);
CheckError(err, "Failed to create compute program!");

//   Build the program executable
err = clBuildProgram(program, deviceCount, devices, NULL, NULL, NULL);
if (err != CL_SUCCESS) {
    size_t len;
    char buffer[2048];

    printf("Error: Failed to build program executable!\n");
    clGetProgramBuildInfo(program, devices[currDevice], CL_PROGRAM_BUILD_LOG, sizeof(buffer), buffer, &len);
    printf("%s\n", buffer);
}

//  Create the compute kernel in the program we wish to run
cl_kernel kernel = clCreateKernel(program, "mySampler", &err);
CheckError(err, "Failed to create compute kernel!");

// Create the input array in device memory for our calculation
cl_mem input = clCreateBuffer(context,  CL_MEM_READ_ONLY,  sizeof(double) * DATA_SIZE, NULL, &err);
CheckError(err, "Failed to allocate device memory");

//   Set the arguments to our compute kernel
err  = clSetKernelArg(kernel, 0, sizeof(cl_mem), &input);
CheckError(err, "Failed to set kernel arguments");

size_t global, local;

//   Get the maximum work group size for executing the kernel on the device
err = clGetKernelWorkGroupInfo(kernel, devices[currDevice], CL_KERNEL_WORK_GROUP_SIZE, sizeof(local), &local, NULL);
CheckError(err, "Failed to retrieve work group info!");

//   Execute the kernel over the entire range of our 1d input data set
//   using the maximum number of work group items for this device
global = DATA_SIZE;
err = clEnqueueNDRangeKernel(commandQueues[currDevice], kernel, 1, NULL, &global, &local, 0, NULL, NULL);
CheckError(err, "Failed to execute kernel!");

//  Wait for the queued commands to finish before reading back results
err = clFinish(commandQueues[currDevice]);
CheckError(err, "Failed to wait for the command queue");

//  Read back the results from the device to verify the output
err = clEnqueueReadBuffer(commandQueues[currDevice], input, CL_TRUE, 0, sizeof(double) * DATA_SIZE, results, 0, NULL, NULL );
CheckError(err, "Failed to read array");

for(int i = 0; i < DATA_SIZE; i++)
    std::cout<<"RESULT: "<<i<<" "<<results[i]<<std::endl;

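//  Shutdown and cleanup
clReleaseMemObject(input);
clReleaseKernel(kernel);
clReleaseProgram(program);
for(cl_uint i = 0; i < deviceCount; i++)
    clReleaseCommandQueue(commandQueues[i]);
clReleaseContext(context);
delete[] commandQueues;
delete[] devices;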
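The code above relies on two helpers that aren't shown in the question, CheckError() and ReadFile(). Their real implementations may well differ; the following is only a minimal sketch of what they could look like. ClInt is a stand-in for cl_int so the sketch compiles without the OpenCL headers, and ErrorName() is a hypothetical addition that turns a code such as -33 into its name, CL_INVALID_DEVICE, using the values from the OpenCL 1.2 header.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Stand-in for cl_int so this sketch compiles without the OpenCL headers.
// In the real program, include <OpenCL/opencl.h> and use cl_int instead.
using ClInt = std::int32_t;

// Names for a few error codes, using the values from the OpenCL 1.2 cl.h header.
const char* ErrorName(ClInt err) {
    switch (err) {
        case 0:   return "CL_SUCCESS";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -30: return "CL_INVALID_VALUE";
        case -33: return "CL_INVALID_DEVICE";
        case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
        case -54: return "CL_INVALID_WORK_GROUP_SIZE";
        default:  return "unknown OpenCL error";
    }
}

// Hypothetical CheckError: throws on any non-zero code, reporting both the
// number and its name, so the first failing call stops the program.
void CheckError(ClInt err, const std::string& msg) {
    if (err != 0) {
        throw std::runtime_error(msg + " (error " + std::to_string(err) + ", " + ErrorName(err) + ")");
    }
}

// Hypothetical ReadFile: returns the whole file as a NUL-terminated buffer
// allocated with new[] (the caller must delete[] it), or nullptr on failure.
char* ReadFile(const char* path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return nullptr;
    }
    std::string text((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    char* buffer = new char[text.size() + 1];
    std::memcpy(buffer, text.c_str(), text.size() + 1);
    return buffer;
}
```

Throwing is just one design choice here; printing the message and calling exit() would work as well, as long as the numeric code is reported so an error like CL_INVALID_DEVICE is easy to identify.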


1 Answer:

Answer 0 (score: 4)

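I think the program is failing to build for one or both of your GPUs. I just checked this on my own OS X system, and clBuildProgram() returns CL_SUCCESS if it was able to build the program for any of the devices you passed to it, even if the build failed for the others. A kernel created from that program would then be unusable on the failed devices, which would explain the CL_INVALID_DEVICE error from clGetKernelWorkGroupInfo().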
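Since clBuildProgram() may report success even when some devices failed, the per-device build status is what should decide which devices to use. Below is a minimal sketch of that bookkeeping, assuming the statuses have already been collected into a vector (for example with the loop shown in this answer). The constants mirror the cl_build_status values from the OpenCL 1.2 header so the sketch compiles without it, and BuildStatusName() and UsableDevices() are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// cl_build_status values as defined in the OpenCL 1.2 cl.h header.
constexpr std::int32_t kBuildSuccess    = 0;   // CL_BUILD_SUCCESS
constexpr std::int32_t kBuildNone       = -1;  // CL_BUILD_NONE
constexpr std::int32_t kBuildError      = -2;  // CL_BUILD_ERROR
constexpr std::int32_t kBuildInProgress = -3;  // CL_BUILD_IN_PROGRESS

// Turns a raw status number (as printed by the per-device loop) into its name.
std::string BuildStatusName(std::int32_t status) {
    switch (status) {
        case kBuildSuccess:    return "CL_BUILD_SUCCESS";
        case kBuildNone:       return "CL_BUILD_NONE";
        case kBuildError:      return "CL_BUILD_ERROR";
        case kBuildInProgress: return "CL_BUILD_IN_PROGRESS";
        default:               return "unknown build status";
    }
}

// Returns the indices of the devices whose build really succeeded; these are
// the candidates to pass to clGetKernelWorkGroupInfo() and clEnqueueNDRangeKernel().
std::vector<std::size_t> UsableDevices(const std::vector<std::int32_t>& statuses) {
    std::vector<std::size_t> usable;
    for (std::size_t i = 0; i < statuses.size(); ++i) {
        if (statuses[i] == kBuildSuccess) {
            usable.push_back(i);
        }
    }
    return usable;
}
```

With the statuses from the output in this answer (0, -2, 0), UsableDevices() returns devices 0 and 2, so currDevice would need to be one of those.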

If you add this code after the clBuildProgram() call, you can check whether the build really succeeded for every device:

for (int i = 0; i < deviceCount; i++) {
    cl_build_status status;
    clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_STATUS, sizeof(status), &status, NULL);
    std::cout << "Build status for device " << i << " = " << status << std::endl;
}


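I notice you're using double values. The HD 4000 doesn't support double precision, so kernels that use the double type will fail to build for it. When I compile a kernel that uses double together with your host code (and the snippet above), I get the following output:

Build status for device 0 = 0
Build status for device 1 = -2
Build status for device 2 = 0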


As you can see, the build succeeded for two of the devices, but not for device 1 (the HD 4000); -2 is CL_BUILD_ERROR.
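One way to avoid building a double-precision kernel for a device that can't run it is to check the device's extension list first. In this minimal sketch the extension string is passed in as a plain std::string, standing in for what clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...) would return, so it runs without OpenCL; HasExtension() and SupportsDouble() are hypothetical names. OpenCL 1.2 also provides the CL_DEVICE_DOUBLE_FP_CONFIG query, which should report 0 on devices without double support.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole token in a space-separated
// extension list, such as the string a device reports for CL_DEVICE_EXTENSIONS.
bool HasExtension(const std::string& extensions, const std::string& name) {
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}

// In OpenCL 1.2, double precision is optional and is advertised through the
// cl_khr_fp64 extension; a kernel that uses `double` needs a device that lists it.
bool SupportsDouble(const std::string& extensions) {
    return HasExtension(extensions, "cl_khr_fp64");
}
```

Devices for which SupportsDouble() returns false could be left out of the device list passed to clCreateContext() and clBuildProgram(), or the kernel and the host-side results array could be switched to float.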
