Question

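As the title says, I am doing some personal research on parallel computer vision techniques. Using CUDA, I am trying to implement a GPGPU version of the Hough transform. The only problem I have run into is in the voting process. I call atomicAdd() to prevent multiple simultaneous writes to the same accumulator cell, and I do not seem to be getting much performance out of it. I have searched the web but have not found any way to noticeably improve the performance of the voting process.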

Any help you can offer with the voting process would be greatly appreciated.

Answer 1

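I'm not familiar with the Hough transform, so posting some pseudocode might help. But if you are interested in voting, you might consider using the CUDA vote intrinsics to speed it up.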

Note that this requires compute capability 2.0 or higher (Fermi or later).

If you want to count the number of threads in a block for which a particular condition is true, you can use __syncthreads_count().

bool condition = ...; // compute the condition
int blockCount = __syncthreads_count(condition); // must be in non-divergent code
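To make the snippet above concrete, here is a minimal sketch of a complete per-block count. The kernel name countPerBlock, the host helper countNonZeroPerBlock, and the "element is non-zero" condition are all assumptions made for illustration; in a Hough transform the condition might be "this pixel is an edge pixel".

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each block counts how many of its elements are non-zero.
__global__ void countPerBlock(const int *data, int n, int *blockCounts)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Out-of-range threads still reach the barrier; they simply vote false.
    bool condition = (i < n) && (data[i] != 0);
    // Every thread of the block must execute this call (non-divergent code).
    int blockCount = __syncthreads_count(condition);
    if (threadIdx.x == 0)
        blockCounts[blockIdx.x] = blockCount;
}

// Hypothetical host helper: returns one count per block of 128 threads.
// Assumes a non-empty input; error checking is omitted for brevity.
std::vector<int> countNonZeroPerBlock(const std::vector<int> &host)
{
    const int threads = 128;
    int n = static_cast<int>(host.size());
    int blocks = (n + threads - 1) / threads;
    int *dData = nullptr, *dCounts = nullptr;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMalloc(&dCounts, blocks * sizeof(int));
    cudaMemcpy(dData, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    countPerBlock<<<blocks, threads>>>(dData, n, dCounts);
    std::vector<int> counts(blocks);
    cudaMemcpy(counts.data(), dCounts, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dCounts);
    return counts;
}
```

Because the bounds check is folded into the condition rather than used as an early return, all threads reach __syncthreads_count() and the barrier stays well defined.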

If you want to count the number of threads in the whole grid for which the condition is true, you can then do an atomicAdd:

bool condition = ...; // compute the condition
int blockCount = __syncthreads_count(condition); // must be in non-divergent code
if (threadIdx.x == 0) atomicAdd(totalCount, blockCount); // one thread per block adds the block's count
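A fuller sketch of the grid-wide count follows. Only one thread per block issues the atomicAdd, so each block's count is added exactly once and global atomic traffic drops from one operation per thread to one per block. The names countInGrid and countNonZeroOnGpu, the 1D launch, and the non-zero test are assumptions for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: accumulates the number of non-zero elements into *totalCount.
__global__ void countInGrid(const int *data, int n, int *totalCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    bool condition = (i < n) && (data[i] != 0);
    int blockCount = __syncthreads_count(condition); // same value in every thread of the block
    // A single atomic per block; blocks with nothing to add skip it entirely.
    if (threadIdx.x == 0 && blockCount > 0)
        atomicAdd(totalCount, blockCount);
}

// Hypothetical host helper. Assumes a non-empty input; error checking omitted.
int countNonZeroOnGpu(const std::vector<int> &host)
{
    const int threads = 128;
    int n = static_cast<int>(host.size());
    int blocks = (n + threads - 1) / threads;
    int *dData = nullptr, *dTotal = nullptr;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMalloc(&dTotal, sizeof(int));
    cudaMemcpy(dData, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(int)); // the counter must start at zero
    countInGrid<<<blocks, threads>>>(dData, n, dTotal);
    int total = 0;
    cudaMemcpy(&total, dTotal, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dTotal);
    return total;
}
```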

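If you need to count the threads for which the condition is true within a group smaller than a block, you can use __ballot() and __popc() (population count).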

// get the count of threads within each warp for which the condition is true
bool condition = ...; // compute the condition in each thread
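// on CUDA 9 and later, use __ballot_sync(0xffffffff, condition) in place of __ballot(condition)
int warpCount = __popc(__ballot(condition)); // see the CUDA programming guide for details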
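As a minimal sketch of the warp-level count, the kernel below stores one count per warp. It uses __ballot_sync(), which replaced __ballot() starting with CUDA 9, and it assumes the block size is a multiple of 32 so that every warp is full and the 0xffffffff mask is valid. The names countPerWarp and countNonZeroPerWarp are hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: writes, for each warp, how many of its lanes saw a non-zero element.
// Assumes blockDim.x is a multiple of 32.
__global__ void countPerWarp(const int *data, int n, int *warpCounts)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    bool condition = (i < n) && (data[i] != 0);
    // Bit k of the ballot is set when lane k's condition is true.
    unsigned ballot = __ballot_sync(0xffffffffu, condition);
    int warpCount = __popc(ballot); // number of set bits
    if ((threadIdx.x & 31) == 0)
        warpCounts[i / 32] = warpCount; // lane 0 stores the result for its warp
}

// Hypothetical host helper. Assumes a non-empty input; error checking omitted.
std::vector<int> countNonZeroPerWarp(const std::vector<int> &host)
{
    const int threads = 128; // multiple of 32
    int n = static_cast<int>(host.size());
    int blocks = (n + threads - 1) / threads;
    int warps = blocks * threads / 32;
    int *dData = nullptr, *dCounts = nullptr;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMalloc(&dCounts, warps * sizeof(int));
    cudaMemcpy(dData, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    countPerWarp<<<blocks, threads>>>(dData, n, dCounts);
    std::vector<int> counts(warps);
    cudaMemcpy(counts.data(), dCounts, warps * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dCounts);
    return counts;
}
```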

Hope this helps.

Answer 2

For a short while, I did use the vote intrinsics......

In the end, atomicAdd turned out to be faster in both cases.

This link is very useful: warp-filtering

This is the problem I solved: Write data only from selected lanes in a Warp using Shuffle + ballot + popc
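The combination named in that title (shuffle + ballot + popc) is commonly used to let a whole warp reserve output space with a single atomicAdd. The sketch below is not the answerer's code; it is one possible illustration under stated assumptions: block size is a multiple of 32, "selected" means the element is non-zero, and the names compactNonZero and compactOnGpu are hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: copies the non-zero elements of `in` to `out`,
// using one atomicAdd per warp instead of one per selected thread.
__global__ void compactNonZero(const int *in, int n, int *out, int *outCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x & 31;
    bool keep = (i < n) && (in[i] != 0);

    unsigned ballot = __ballot_sync(0xffffffffu, keep); // lanes that want to write
    int warpTotal = __popc(ballot);                      // how many slots the warp needs
    int rank = __popc(ballot & ((1u << lane) - 1u));     // selected lanes below this one

    int base = 0;
    if (lane == 0 && warpTotal > 0)
        base = atomicAdd(outCount, warpTotal);           // reserve a contiguous range
    base = __shfl_sync(0xffffffffu, base, 0);            // broadcast lane 0's offset

    if (keep)
        out[base + rank] = in[i];                        // only selected lanes write
}

// Hypothetical host helper. Assumes a non-empty input; error checking omitted.
// The order of elements from different warps is not deterministic.
std::vector<int> compactOnGpu(const std::vector<int> &host)
{
    const int threads = 128; // multiple of 32
    int n = static_cast<int>(host.size());
    int blocks = (n + threads - 1) / threads;
    int *dIn = nullptr, *dOut = nullptr, *dCount = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(int));
    cudaMalloc(&dCount, sizeof(int));
    cudaMemcpy(dIn, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(int));
    compactNonZero<<<blocks, threads>>>(dIn, n, dOut, dCount);
    int count = 0;
    cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    std::vector<int> result(count);
    cudaMemcpy(result.data(), dOut, count * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dCount);
    return result;
}
```

This helps when many threads contend for one counter. Whether it also helps a Hough accumulator, where votes are spread over many bins, depends on how often lanes of the same warp hit the same bin, so it is worth measuring against plain atomicAdd.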

Aren't you looking for a critical section?

Generalized Hough transform in CUDA - how can I speed up the binning process?
