Pointer arithmetic with shared memory

Time: 2011-09-27 18:02:24

Tags: cuda

I don't understand exactly what happens in the following lines:

  1. unsigned char *membershipChanged = (unsigned char *)sharedMemory;

  2. float *clusters = (float *)(sharedMemory + blockDim.x);

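I think that in #1 sharedMemory is effectively renamed to membershipChanged, but why would you add blockDim.x to the sharedMemory pointer? Where does the resulting address point?

sharedMemory was created with extern __shared__ char sharedMemory[];

I found the code in a CUDA kmeans implementation.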

    void find_nearest_cluster(int numCoords,
                              int numObjs,
                              int numClusters,
                              float *objects,           //  [numCoords][numObjs]
                              float *deviceClusters,    //  [numCoords][numClusters]
                              int *membership,          //  [numObjs]
                              int *intermediates)
    {
        extern __shared__ char sharedMemory[];

        //  The type chosen for membershipChanged must be large enough to support
        //  reductions! There are blockDim.x elements, one for each thread in the
        //  block.
        unsigned char *membershipChanged = (unsigned char *)sharedMemory;
        float *clusters = (float *)(sharedMemory + blockDim.x);

        membershipChanged[threadIdx.x] = 0;

        //  BEWARE: We can overrun our shared memory here if there are too many
        //  clusters or too many coordinates!
        for (int i = threadIdx.x; i < numClusters; i += blockDim.x) {
            for (int j = 0; j < numCoords; j++) {
                clusters[numClusters * j + i] = deviceClusters[numClusters * j + i];
            }
        }
        .....
    

1 answer:

Answer 0: (score: 4)

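sharedMemory + blockDim.x points blockDim.x bytes past the base of the shared memory region. Because sharedMemory is declared as an array of char, each step of the pointer arithmetic advances by exactly one byte.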
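The byte-granular step can be illustrated on the host, since pointer arithmetic follows the same rules there. This is only a sketch: pool is a hypothetical stand-in for the shared memory region, and charStepBytes and floatStepBytes are illustrative helpers that do not appear in the kmeans code.

```cuda
#include <cstddef>

// Host-side illustration only; `pool` is a hypothetical stand-in for the
// shared memory region (not part of the original kmeans code).
alignas(float) static char pool[256];

// Byte distance between base and (base + n) when base is a char*.
// Each step is sizeof(char) == 1 byte, so the result is n.
// Assumes n <= 256 so the pointer stays inside `pool`.
std::ptrdiff_t charStepBytes(unsigned int n) {
    char *base = pool;
    return (base + n) - base;
}

// The same offset applied to a float*: each step is sizeof(float) bytes.
// Assumes n * sizeof(float) <= 256 so the pointer stays inside `pool`.
std::ptrdiff_t floatStepBytes(unsigned int n) {
    float *base = reinterpret_cast<float *>(pool);
    return reinterpret_cast<char *>(base + n) - reinterpret_cast<char *>(base);
}
```

With n = 32, charStepBytes gives 32 while floatStepBytes gives 32 * sizeof(float), which is why the kernel adds blockDim.x to the char pointer first and only then casts the result to float *.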

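The reason you might do something like this is to suballocate within shared memory. The launch site of the kernel containing find_nearest_cluster dynamically allocates some amount of shared storage for the kernel. The code implies that two logically distinct arrays reside in the shared storage pointed to by sharedMemory: membershipChanged, which occupies the first blockDim.x bytes, and clusters, which begins immediately after it. The pointer arithmetic is simply a way to get a pointer to the second array.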
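A minimal sketch of how the launch side and the suballocation might fit together, assuming the layout implied by the kernel above (blockDim.x bytes of flags followed by numClusters * numCoords floats). The names sharedBytesNeeded, carveDemo, and launchCarveDemo are hypothetical and are not taken from the kmeans code. The third argument inside <<<...>>> is the number of bytes of dynamic shared memory per block. The sketch also assumes the block size is a multiple of sizeof(float), since otherwise the float array would start at a misaligned address.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Bytes of dynamic shared memory for the assumed layout:
// blockSize unsigned chars followed by numClusters * numCoords floats.
// (Hypothetical helper, not part of the original kmeans code.)
size_t sharedBytesNeeded(unsigned int blockSize, int numClusters, int numCoords) {
    return blockSize * sizeof(unsigned char)
         + static_cast<size_t>(numClusters) * numCoords * sizeof(float);
}

// Hypothetical kernel that carves one dynamic shared buffer into two arrays.
// Assumes blockDim.x is a multiple of sizeof(float) so `second` is aligned.
__global__ void carveDemo(float *out, int numFloats) {
    extern __shared__ char sharedMemory[];

    // First array: bytes [0, blockDim.x) of the shared region.
    unsigned char *first = (unsigned char *)sharedMemory;
    // Second array: starts blockDim.x bytes past the base.
    float *second = (float *)(sharedMemory + blockDim.x);

    first[threadIdx.x] = 0;
    for (int i = threadIdx.x; i < numFloats; i += blockDim.x) {
        second[i] = static_cast<float>(i);
    }
    __syncthreads();

    // Copy the second array out so the host could inspect it.
    for (int i = threadIdx.x; i < numFloats; i += blockDim.x) {
        out[i] = second[i];
    }
}

// Launches one block and requests the dynamic shared memory at the launch site.
// deviceOut must point to at least numClusters * numCoords floats on the device.
cudaError_t launchCarveDemo(float *deviceOut, int numClusters, int numCoords) {
    const unsigned int blockSize = 128;  // multiple of sizeof(float)
    size_t bytes = sharedBytesNeeded(blockSize, numClusters, numCoords);

    carveDemo<<<1, blockSize, bytes>>>(deviceOut, numClusters * numCoords);

    cudaError_t err = cudaGetLastError();  // reports launch configuration errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // reports errors raised during execution
}
```

If the requested byte count is smaller than what the kernel actually touches, the writes to the second array run past the end of the allocation, which is the overrun the BEWARE comment in the question's code warns about.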