Question

My laptop has a GeForce GTX 960M and an Intel HD 530. I am running a kernel and timing it with the OpenCL profiler, using the following code:

/* The queue must have been created with CL_QUEUE_PROFILING_ENABLE. */
err = clEnqueueNDRangeKernel(queue, voxelization_kernel, 1, NULL, &processed_global_size,
        &local_size, 0, NULL, &kernel_event);
err = clWaitForEvents(1, &kernel_event);
/* time_start and time_end are cl_ulong device timestamps in nanoseconds. */
clGetEventProfilingInfo(kernel_event, CL_PROFILING_COMMAND_START,
        sizeof(time_start), &time_start, NULL);
clGetEventProfilingInfo(kernel_event, CL_PROFILING_COMMAND_END,
        sizeof(time_end), &time_end, NULL);
elapsed_time = time_end - time_start;
/* Convert in double precision; a float cast loses digits on large timestamps. */
printf("Elapsed time in kernel: %f ms\n", (double)elapsed_time / 1e6);

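The elapsed time on the Intel device is roughly 10 times shorter than on the GeForce. For example, for a kernel that takes 12.519104 ms on the GeForce, the elapsed time on the Intel HD is only 1.427828 ms. Other datasets show the same pattern. This seems strange, because the GeForce should be the more capable device. Am I doing something wrong in the profiling, or somewhere else?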

Answer 1

"For example, for a kernel that takes 12.519104 ms on the GeForce, the elapsed time on the Intel HD is only 1.427828 ms."

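Given those numbers, the likely explanation is that the Intel HD sits closer to the CPU and has lower latency for light computations such as C[i] = A[i] + B[i], or for linear-complexity algorithms such as voxelization, where data is not reused in device memory and each thread reads or writes it only once.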
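
Why a kernel like C[i] = A[i] + B[i] gains little from a faster GPU can be illustrated with a rough arithmetic-intensity estimate. The following is a minimal sketch in plain C, not a measurement: the function names are hypothetical, float data is assumed, and the bandwidth argument is a placeholder figure the reader would have to supply for their own device.

```c
#include <stddef.h>

/* Bytes moved for C[i] = A[i] + B[i] on n floats:
   two loads and one store per element. */
double vector_add_bytes(size_t n)
{
    return 3.0 * (double)sizeof(float) * (double)n;
}

/* Arithmetic intensity in FLOP per byte: one addition per element.
   n must be greater than zero. */
double vector_add_intensity(size_t n)
{
    return (double)n / vector_add_bytes(n);
}

/* Lower bound, in ms, on the run time of a kernel that is limited purely
   by memory bandwidth. bandwidth_gb_s is an assumed sustained figure in
   GB/s, not a value taken from either device in the question. */
double memory_bound_ms(double bytes, double bandwidth_gb_s)
{
    return bytes / (bandwidth_gb_s * 1e9) * 1e3;
}
```

For one million floats this gives 12 MB of memory traffic and about 0.083 FLOP per byte, so a kernel of this kind is limited by memory access and latency rather than by arithmetic throughput.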

Likely reasons:

Distance to the CPU (the iGPU is integrated with it)
Zero-copy memory path of the iGPU
Algorithm complexity
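
To see whether a timing difference comes from the kernel itself or from the path between host and device, it can help to look at all four profiling timestamps (CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START, CL_PROFILING_COMMAND_END) rather than only START and END. The sketch below is plain C and assumes the four values were already read with clGetEventProfilingInfo; the stage_times type and the split_stages function are hypothetical names, and uint64_t stands in for cl_ulong.

```c
#include <stdint.h>

/* Hypothetical helper type: duration of each stage of one command, in ms. */
typedef struct {
    double queued_to_submit_ms;  /* time spent waiting in the host queue */
    double submit_to_start_ms;   /* time from submission until the device starts */
    double start_to_end_ms;      /* execution time on the device */
} stage_times;

/* Each argument is a device timestamp in nanoseconds, as reported for
   CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START and _END respectively.
   The timestamps are assumed to be non-decreasing in that order. */
stage_times split_stages(uint64_t queued, uint64_t submit,
                         uint64_t start, uint64_t end)
{
    stage_times t;
    t.queued_to_submit_ms = (double)(submit - queued) / 1e6;
    t.submit_to_start_ms  = (double)(start - submit) / 1e6;
    t.start_to_end_ms     = (double)(end - start) / 1e6;
    return t;
}
```

Buffer transfers are separate commands with their own events, so they do not appear in a kernel's START to END interval and would need to be profiled the same way.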
