Question

Below is a copy of the kernel occupancy analysis from the NVIDIA Compute Visual Profiler:

Kernel details : Grid size: 300 x 1, Block size: 224 x 1 x 1
Register Ratio      = 0.75  ( 24576 / 32768 ) [48 registers per thread] 
Shared Memory Ratio = 0 ( 0 / 49152 ) [0 bytes per Block] 
Active Blocks per SM    = 2 : 8
Active threads per SM   = 448 : 1536
Occupancy       = 0.291667  ( 14 / 48 )
Achieved occupancy  = 0.291667  (on 14 SMs)
Occupancy limiting factor   = Registers 
Warning: Grid Size (300) is not a multiple of available SMs (14).
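As a sanity check on the output above, the reported occupancy is the ratio of active warps to the SM's warp limit (48 warps, i.e. 1536 threads / 32). The minimal sketch below uses only numbers taken from the profiler output. It shows that 2 blocks of 224 threads (7 warps each) give 14 / 48, and that 3 blocks would give 21 / 48.

```python
WARP_SIZE = 32
MAX_WARPS_PER_SM = 48  # 1536 threads per SM / 32, from "Active threads per SM = 448 : 1536"


def occupancy(active_blocks: int, threads_per_block: int) -> float:
    """Active warps divided by the maximum number of warps per SM."""
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # integer ceiling division
    return active_blocks * warps_per_block / MAX_WARPS_PER_SM


print(f"{occupancy(2, 224):.6f}")  # 0.291667, as reported by the profiler
print(f"{occupancy(3, 224):.6f}")  # 0.437500, the occupancy the asker is aiming for
```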

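I'm new to OpenCL. I've done a lot of optimization to reduce the number of registers used, so that 3 blocks can be launched concurrently on an SM. However, the profiler shows that only 2 blocks can run at the same time, and that the limiting factor is registers. The puzzle is that my kernel uses only 224 x 48 = 10752 registers per block, so 3 blocks should fit (224 x 48 x 3 = 32256 of the 32768 available registers). The problem persists when I reduce the number of threads per block to 208, which by the same arithmetic should need only 208 x 48 x 3 = 29952 of 32768 registers for 3 blocks...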
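The arithmetic in the paragraph above can be written as a small calculator. `naive_regs_per_block` is exactly the question's threads x registers formula. `granular_regs_per_block` is an assumption that the post does not confirm. It models the allocation rules that the CUDA Occupancy Calculator lists for compute capability 2.x: registers are allocated per warp in units of 64, and the warp count per block is rounded up to a multiple of 2. Under that model a 224-thread block (7 warps) is charged as 8 warps, i.e. 8 x 32 x 48 = 12288 registers. Two such blocks are 24576, which matches the profiler's "Register Ratio = 0.75 ( 24576 / 32768 )". Treat this as a sketch under stated assumptions, not a definitive description of the hardware.

```python
REGS_PER_SM = 32768     # from "Register Ratio = 0.75 ( 24576 / 32768 )"
MAX_BLOCKS_PER_SM = 8   # from "Active Blocks per SM = 2 : 8"
WARP_SIZE = 32


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def naive_regs_per_block(threads: int, regs_per_thread: int) -> int:
    """The calculation from the question: threads x registers per thread."""
    return threads * regs_per_thread


def granular_regs_per_block(threads: int, regs_per_thread: int,
                            reg_unit: int = 64, warp_granularity: int = 2) -> int:
    """ASSUMED allocation model (compute capability 2.x):
    - registers per warp are rounded up to a multiple of reg_unit
    - warps per block are rounded up to a multiple of warp_granularity
    """
    warps = ceil_div(threads, WARP_SIZE)  # partial warps still occupy a full warp
    charged_warps = ceil_div(warps, warp_granularity) * warp_granularity
    regs_per_warp = ceil_div(regs_per_thread * WARP_SIZE, reg_unit) * reg_unit
    return charged_warps * regs_per_warp


def blocks_limited_by_registers(regs_per_block: int) -> int:
    return min(REGS_PER_SM // regs_per_block, MAX_BLOCKS_PER_SM)


for threads in (224, 208):
    naive = naive_regs_per_block(threads, 48)
    model = granular_regs_per_block(threads, 48)
    print(threads, naive, blocks_limited_by_registers(naive),
          model, blocks_limited_by_registers(model))
```

With the naive formula both block sizes allow 3 blocks; with the assumed granularity both are charged 12288 registers per block and only 2 blocks fit. Note also that 208 threads is 6.5 warps, so even without the even-warp rounding such a block still occupies 7 full warps.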

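At first I thought this was caused by local memory, but my calculation of local memory usage shows that it should still allow 3 blocks per SM. I also don't know why the profiler doesn't show a shared memory ratio, even though my kernel uses local memory.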
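The local memory check described above can be sketched the same way. One naming detail matters here: OpenCL `__local` memory is what NVIDIA's tools report as "shared memory", whereas CUDA's "local memory" is per-thread spill space that does not count toward the Shared Memory Ratio. In the sketch below, the 128-byte allocation unit is an assumption (the value the CUDA Occupancy Calculator lists for compute capability 2.x). The per-work-group size is hypothetical because the post does not give the actual figure.

```python
SHARED_MEM_PER_SM = 49152  # bytes, from "Shared Memory Ratio = 0 ( 0 / 49152 )"
ALLOC_UNIT = 128           # ASSUMED shared-memory allocation unit for compute capability 2.x
MAX_BLOCKS_PER_SM = 8      # from "Active Blocks per SM = 2 : 8"


def blocks_limited_by_local_mem(local_bytes_per_group: int) -> int:
    """Work-groups per SM allowed by OpenCL __local (CUDA shared) memory."""
    if local_bytes_per_group == 0:
        return MAX_BLOCKS_PER_SM  # no __local usage: never the limiting factor
    charged = -(-local_bytes_per_group // ALLOC_UNIT) * ALLOC_UNIT  # round up to the unit
    return min(SHARED_MEM_PER_SM // charged, MAX_BLOCKS_PER_SM)


# HYPOTHETICAL kernel: 224 work-items, each staging 16 floats in __local memory.
local_bytes = 224 * 16 * 4  # 14336 bytes
print(blocks_limited_by_local_mem(local_bytes))  # 3
print(blocks_limited_by_local_mem(0))            # 8, the case the profiler reports
```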

Thanks for your help.

Profiler shows OpenCL not using all available registers
