Question

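I am trying to understand how the memory organization of my GPU works.

According to the technical specifications listed below, my GPU can have 8 active blocks per SM and 768 threads per SM. Based on that, I figured that to make full use of the SM each block should have 96 (= 768/8) threads. The closest square block to that thread count is, I think, a 9x9 block with 81 threads. Since 8 blocks can run simultaneously on one SM, that gives 648 threads. What about the remaining 120 (= 768 - 648)?

I know something is wrong with this reasoning. A simple example showing how the maximum number of threads per SM, the maximum number of threads per block, and the warp size relate to each other, based on my GPU's specifications, would be very helpful.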

Device 0: "GeForce 9600 GT"
      CUDA Driver Version / Runtime Version          5.5 / 5.0
      CUDA Capability Major/Minor version number:    1.1
      Total amount of global memory:                 512 MBytes (536870912 bytes)
      ( 8) Multiprocessors x (  8) CUDA Cores/MP:    64 CUDA Cores
      GPU Clock rate:                                1680 MHz (1.68 GHz)
      Memory Clock rate:                             700 Mhz
      Memory Bus Width:                              256-bit
      Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
      Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       16384 bytes
      Total number of registers available per block: 8192
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  768
      Maximum number of threads per block:           512
      Maximum sizes of each dimension of a block:    512 x 512 x 64
      Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             256 bytes
      Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
      Run time limit on kernels:                     Yes
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Concurrent kernel execution:                   No
      Device supports Unified Addressing (UVA):      No
      Device PCI Bus ID / PCI location ID:           1 / 0

Answer 1

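You can find the technical specifications of your device in the CUDA programming guide, at the link below, rather than in the output of the CUDA sample program.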

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities

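From a hardware point of view, we generally try to maximize the warp occupancy of each multiprocessor (SM) to get the best performance. The maximum occupancy is limited by three types of hardware resources: #warps/SM, #registers/SM, and shared memory/SM.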
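As a rough illustration of the warp limit alone, the arithmetic from the question can be redone in units of warps instead of threads. This is only a sketch: the constants come from the deviceQuery output and the 8 blocks/SM figure quoted in the question, the helper name `warp_limited_occupancy` is made up for this example, and registers and shared memory are ignored.

```python
import math

# Limits from the question (compute capability 1.1 device).
WARP_SIZE = 32
MAX_THREADS_PER_SM = 768
MAX_BLOCKS_PER_SM = 8  # figure quoted in the question
MAX_WARPS_PER_SM = MAX_THREADS_PER_SM // WARP_SIZE  # 24 warp slots

def warp_limited_occupancy(threads_per_block):
    """Hypothetical helper: resident blocks, resident warps, and working
    threads per SM, considering only the block and warp limits."""
    # A block always occupies a whole number of warps.
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    blocks = min(MAX_BLOCKS_PER_SM, MAX_WARPS_PER_SM // warps_per_block)
    resident_warps = blocks * warps_per_block
    working_threads = blocks * threads_per_block
    return blocks, resident_warps, working_threads

for tpb in (81, 96, 256, 512):
    blocks, warps, threads = warp_limited_occupancy(tpb)
    print(f"{tpb:3d} threads/block: {blocks} blocks, "
          f"{warps}/{MAX_WARPS_PER_SM} warps, {threads} working threads")
```

Under these assumptions a 9x9 block still takes 3 warps, so 8 such blocks fill all 24 warp slots of the SM. The "missing" 120 threads are the 15 idle lanes in the last warp of each of the 8 blocks, not free capacity. A block of 96 threads fills the same 24 warps with no idle lanes, and 3 blocks of 256 threads do so as well.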

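You can try the following tool in your CUDA installation directory to understand how the calculation is done. It will give you a clearer picture of how #threads/SM, #threads/block, #warps/SM, etc. are connected.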

$CUDA_HOME/tools/CUDA_Occupancy_Calculator.xls
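The kind of calculation the spreadsheet performs can be approximated as follows. This is a simplified sketch, not the calculator's actual formula: it assumes the compute capability 1.1 limits of 8192 registers and 16384 bytes of shared memory per SM, ignores the hardware's allocation granularity rules, and uses a made-up function name and made-up kernel resource figures.

```python
import math

# Assumed per-SM limits for a compute capability 1.1 device.
WARP_SIZE = 32
MAX_WARPS_PER_SM = 24      # 768 threads / 32
MAX_BLOCKS_PER_SM = 8
REGISTERS_PER_SM = 8192
SHARED_MEM_PER_SM = 16384  # bytes

def estimate_occupancy(threads_per_block, regs_per_thread, smem_per_block):
    """Hypothetical helper: resident blocks per SM, the limiting resource,
    and warp occupancy. Allocation granularity is deliberately ignored."""
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    regs_per_block = regs_per_thread * warps_per_block * WARP_SIZE
    limits = {
        "blocks": MAX_BLOCKS_PER_SM,
        "warps": MAX_WARPS_PER_SM // warps_per_block,
        "registers": REGISTERS_PER_SM // regs_per_block,
        # A kernel using no shared memory is not limited by it.
        "shared memory": (SHARED_MEM_PER_SM // smem_per_block
                          if smem_per_block else MAX_BLOCKS_PER_SM),
    }
    limiter = min(limits, key=limits.get)  # first key wins on ties
    blocks = limits[limiter]
    occupancy = blocks * warps_per_block / MAX_WARPS_PER_SM
    return blocks, limiter, occupancy

# Made-up kernels: (threads/block, registers/thread, shared bytes/block).
for kernel in ((256, 10, 4096), (256, 16, 4096), (96, 10, 4096)):
    blocks, limiter, occ = estimate_occupancy(*kernel)
    print(f"{kernel}: {blocks} blocks/SM, limited by {limiter}, "
          f"occupancy {occ:.0%}")
```

In this simplified model, raising register use from 10 to 16 per thread drops a 256-thread block from 3 to 2 resident blocks, and 4096 bytes of shared memory per block caps a 96-thread block at 4 resident blocks, or half of the warp slots. The spreadsheet should be treated as the reference, since it accounts for details this sketch leaves out.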
