Question

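I am following this tutorial, which has been very good so far, except that the last example, on how to create a semaphore, does not work for me. The logic is quite simple, but I cannot figure out why this kernel ends up in an infinite loop.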

myKernel.cl

#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
void GetSemaphor(__global int * semaphor, __global int * data) {
   int occupied = atom_xchg(semaphor, 1);
   int realityCheck = 0;
   while(occupied == 1 && realityCheck++ < 100000)
        occupied = atom_xchg(semaphor, 1);
}

void ReleaseSemaphor(__global int * semaphor)
{
   int prevVal = atom_xchg(semaphor, 0);
}

__kernel void myKernel(__global int* data, __global int* semaphor)
{
    // semaphor[0] is set to 0 on the host.
    GetSemaphor(&semaphor[0], data);
    data[0]++;
    ReleaseSemaphor(&semaphor[0]);
}

This is OpenCL 1.2, FULL_PROFILE, on a Quadro NVS 290 with the following extensions:

* cl_khr_global_int32_base_atomics
* cl_khr_global_int32_extended_atomics

Answer 1

The tutorial you linked is wrong: that code will never work on a GPU device, because of the hardware architecture.

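Any synchronization mechanism that blocks work-items inside a work-group will not work. The blocked state affects the entire work-group, and that is what produces the infinite loop.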

You can only do this kind of thing with a work-group size of 1, or between work-groups.
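To make the reasoning above concrete, here is a small host-side toy model in plain C. It is not OpenCL and not a description of how any particular GPU schedules work. It is a sketch under two stated assumptions: the work-items of a work-group execute in lock step, and a work-item that has left the spin loop waits until every other work-item has left it too before any of them continues. The names run_work_group, xchg, MAX_LANES and max_spins are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LANES 32

/* Stand-in for atom_xchg: store the new value and return the old one.
   The model is single-threaded, so no real atomicity is needed. */
static int xchg(int *p, int v)
{
    int old = *p;
    *p = v;
    return old;
}

/* Simulates one work-group of `lanes` work-items running the kernel from
   the question in lock step (ASSUMPTION, see the text above).
   Returns true if the group got past the spin loop within `max_spins`
   iterations, false if it was still spinning when the cap was hit. */
bool run_work_group(int lanes, int max_spins, int *data)
{
    assert(lanes >= 1 && lanes <= MAX_LANES);

    int semaphore = 0;            /* set to 0 "on the host" */
    int occupied[MAX_LANES];

    /* First atom_xchg: exactly one work-item sees 0 and owns the semaphore. */
    for (int i = 0; i < lanes; i++)
        occupied[i] = xchg(&semaphore, 1);

    /* Spin loop: the losers retry, the winner is parked until they finish. */
    for (int spin = 0; ; spin++) {
        int spinning = 0;
        for (int i = 0; i < lanes; i++)
            if (occupied[i] == 1)
                spinning++;

        if (spinning == 0)
            break;                /* everyone has left the loop */
        if (spin == max_spins)
            return false;         /* winner never got to release: stuck */

        for (int i = 0; i < lanes; i++)
            if (occupied[i] == 1)
                occupied[i] = xchg(&semaphore, 1);
    }

    /* Critical section and release, reached only once nobody is spinning. */
    for (int i = 0; i < lanes; i++) {
        (*data)++;
        xchg(&semaphore, 0);
    }
    return true;
}
```

In this model, run_work_group(1, 1000, &data) returns true and increments data once, while run_work_group(4, 1000, &data) returns false and leaves data untouched: the work-item holding the semaphore cannot reach the release while its neighbours in the same work-group are still spinning. That matches the behaviour described in the answer, but real hardware may differ in the details.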
