I want to obtain profiling information. Profiling is already enabled on my command queue.
Here is my code:
status = clEnqueueNDRangeKernel(
commandQueue,
kernl,
2,
NULL,
globalThreads,
localThreads,
0,
NULL,
&ndrEvt);
CHECK_OPENCL_ERROR(status, "clEnqueueNDRangeKernel failed.");
//Won't proceed ahead if all work-items have not finished processing; Synchronization point
status = clFinish(commandQueue);
CHECK_OPENCL_ERROR(status, "clFinish failed.");
//fetch performance data
clGetEventProfilingInfo(ndrEvt, CL_PROFILING_COMMAND_QUEUED, sizeof(cl_ulong), &time_start2, NULL);
//clRetainEvent(ndrEvt);
clGetEventProfilingInfo(ndrEvt, CL_PROFILING_COMMAND_SUBMIT, sizeof(cl_ulong), &time_end2, NULL);
single_exec_time2 = time_end2 - time_start2;
clGetEventProfilingInfo(ndrEvt, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &time_end, NULL);
single_exec_time = time_end - time_start2;
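single_exec_time2 shows the correct result, but single_exec_time = 0.
I think the problem lies in the event handling: the reference count of ndrEvt is zero.
I tried adding clRetainEvent(ndrEvt) (you can see it commented out in the code above) and it "works", so at this point I would like to know: will adding clRetainEvent() give me correct results?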
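For reference, here is a minimal sketch of how the four profiling timestamps are normally expected to relate. clGetEventProfilingInfo reports each of them as a cl_ulong in nanoseconds. The struct and helper names below are hypothetical (they are not part of OpenCL), and uint64_t stands in for cl_ulong so the sketch compiles without the OpenCL headers. Note that the code above measures QUEUED to SUBMIT and QUEUED to END; the kernel's own execution time would be END minus START.

```c
#include <stdint.h>

/* Hypothetical container for the four timestamps (nanoseconds) that
   clGetEventProfilingInfo reports for a single event. uint64_t is an
   assumption standing in for cl_ulong. */
typedef struct {
    uint64_t queued;  /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit;  /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;   /* CL_PROFILING_COMMAND_START  */
    uint64_t end;     /* CL_PROFILING_COMMAND_END    */
} profile_times;

/* Returns 1 if the timestamps are in the expected order
   (queued <= submit <= start <= end), otherwise 0. An out-of-order
   or zero value is a hint that one of the queries did not succeed. */
static int profile_times_valid(const profile_times *t) {
    return t->queued <= t->submit &&
           t->submit <= t->start &&
           t->start  <= t->end;
}

/* Time the command waited in the host queue before being submitted. */
static uint64_t queue_delay_ns(const profile_times *t) {
    return t->submit - t->queued;
}

/* Time the kernel actually spent executing on the device. */
static uint64_t kernel_exec_ns(const profile_times *t) {
    return t->end - t->start;
}

/* Time from enqueue to completion. */
static uint64_t total_ns(const profile_times *t) {
    return t->end - t->queued;
}

/* Convert nanoseconds to milliseconds for printing. */
static double ns_to_ms(uint64_t ns) {
    return (double)ns / 1.0e6;
}
```

In real code the four fields would be filled by four clGetEventProfilingInfo calls on ndrEvt, each with its return status checked, before any of the differences are computed.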