I don't understand what exactly is happening in the following lines:
unsigned char *membershipChanged = (unsigned char *)sharedMemory;
and
float *clusters = (float *)(sharedMemory + blockDim.x);
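I think that in #1, sharedMemory is effectively being renamed to membershipChanged, but why add blockDim.x to the sharedMemory pointer? Where does that address point?
sharedMemory is declared with extern __shared__ char sharedMemory[];
I found the code in a CUDA kmeans implementation.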
void find_nearest_cluster(int numCoords,
                          int numObjs,
                          int numClusters,
                          float *objects,           //  [numCoords][numObjs]
                          float *deviceClusters,    //  [numCoords][numClusters]
                          int *membership,          //  [numObjs]
                          int *intermediates)
{
    extern __shared__ char sharedMemory[];

    //  The type chosen for membershipChanged must be large enough to support
    //  reductions! There are blockDim.x elements, one for each thread in the
    //  block.
    unsigned char *membershipChanged = (unsigned char *)sharedMemory;
    float *clusters = (float *)(sharedMemory + blockDim.x);

    membershipChanged[threadIdx.x] = 0;

    //  BEWARE: We can overrun our shared memory here if there are too many
    //  clusters or too many coordinates!
    for (int i = threadIdx.x; i < numClusters; i += blockDim.x) {
        for (int j = 0; j < numCoords; j++) {
            clusters[numClusters * j + i] = deviceClusters[numClusters * j + i];
        }
    }
    .....
Answer 0 (score: 4)
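sharedMemory + blockDim.x points blockDim.x bytes past the base of the shared memory region. Because sharedMemory is an array of char, the pointer arithmetic is done in units of one byte.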
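The reason you might do something like this is to sub-allocate within shared memory. The launch site of the kernel that contains find_nearest_cluster dynamically allocates some amount of shared storage for the kernel. The code implies that two logically distinct arrays reside in the shared storage pointed to by sharedMemory: membershipChanged and clusters. The pointer arithmetic is just a means of getting a pointer to the second array.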
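As a rough illustration of that sub-allocation, here is a host-side C++ sketch in which an ordinary char buffer stands in for the dynamically allocated shared memory. The names SharedLayout, carve and sharedBytesNeeded are made up for this example and are not part of the kmeans code. The size formula is an assumption inferred from how the kernel indexes the two arrays, since the actual launch site is not shown in the question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: the two logical arrays that share one byte buffer,
// mirroring membershipChanged and clusters in the kernel.
struct SharedLayout {
    unsigned char *membershipChanged;  // blockDimX elements, one byte each
    float *clusters;                   // numClusters * numCoords floats
};

// Assumed number of bytes the launch site would need to request so that both
// arrays fit (inferred from the kernel's indexing, not taken from the post).
inline std::size_t sharedBytesNeeded(std::size_t blockDimX,
                                     std::size_t numClusters,
                                     std::size_t numCoords) {
    return blockDimX * sizeof(unsigned char)
         + numClusters * numCoords * sizeof(float);
}

// Split the buffer the same way the kernel does: the first array starts at the
// base, the second starts blockDimX bytes later.
inline SharedLayout carve(char *sharedMemory, std::size_t blockDimX) {
    // The float array begins at byte offset blockDimX, so that offset has to
    // be a multiple of the alignment of float for the accesses to be aligned.
    assert(blockDimX % alignof(float) == 0);

    SharedLayout layout;
    layout.membershipChanged = reinterpret_cast<unsigned char *>(sharedMemory);
    layout.clusters = reinterpret_cast<float *>(sharedMemory + blockDimX);
    return layout;
}
```

With blockDimX = 128, for example, membershipChanged occupies bytes 0 to 127 of the buffer and clusters starts at byte 128. In the real kernel the same reasoning suggests that blockDim.x should be a multiple of 4, so that the float array is properly aligned.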