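I have a question about CUDA C++ programming. I am using shared memory, but I need more of it than is available, so I am trying to reuse it. My code looks like this: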
__global__ void dist_calculation(...){
..........
{
//1st pass
__shared__ short unsigned int shared_nodes[(number_of_nodes-1)*blocksize];
............
}
{
//2nd pass
__shared__ float s_distance_matrix[(number_of_nodes*(number_of_nodes-1))/2];
........
}
}
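Shared memory cannot hold shared_nodes and s_distance_matrix at the same time, but it can hold either one on its own (I have tested this). In the second pass the program does not recognize shared_nodes (it belongs to the scope of the first pass), yet I still get an error saying there is not enough shared memory. So it looks as if space is still being allocated for shared_nodes. Is there a way to release that allocation (something like cudaFree)? Or any other suggestions?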
Answer 0 (score: 4)
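Allocate a single untyped buffer that is large enough to hold either array, and reinterpret it as the appropriate array type for each pass of the algorithm: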
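__global__ void dist_calculation(...)
{
    // number_of_nodes and block_size must be compile-time constants here,
    // because a statically declared __shared__ array needs a constant size.
    const unsigned int num_bytes1 = sizeof(unsigned short) * (number_of_nodes-1) * block_size;
    const unsigned int num_bytes2 = sizeof(float) * (number_of_nodes * (number_of_nodes-1)) / 2;
    const unsigned int num_shared_bytes = num_bytes1 > num_bytes2 ? num_bytes1 : num_bytes2;

    // Align the raw byte buffer so that it can safely be viewed as float.
    __shared__ __align__(sizeof(float)) char smem[num_shared_bytes];

    unsigned short *shared_nodes = reinterpret_cast<unsigned short*>(smem);
    first_pass(shared_nodes);

    // Every thread must be done with shared_nodes before the buffer is reused.
    __syncthreads();

    float *distance_matrix = reinterpret_cast<float*>(smem);
    second_pass(distance_matrix);
}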
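If number_of_nodes is only known at run time, a statically sized __shared__ array will not compile. A possible variation, shown here only as a minimal sketch, is to use dynamically sized shared memory and pass the byte count as the third kernel launch parameter. The names two_pass_kernel and run_two_pass, the out parameter, and the placeholder values written in each pass are assumptions made for illustration; they are not part of the original question or answer.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel: both passes alias one dynamically sized shared buffer.
__global__ void two_pass_kernel(int number_of_nodes, float *out)
{
    // Size comes from the launch configuration; declaring the buffer as float
    // gives it alignment suitable for both views used below.
    extern __shared__ float smem[];

    // First pass: view the buffer as unsigned short.
    unsigned short *shared_nodes = reinterpret_cast<unsigned short *>(smem);
    const int n1 = (number_of_nodes - 1) * blockDim.x;
    for (int i = threadIdx.x; i < n1; i += blockDim.x)
        shared_nodes[i] = static_cast<unsigned short>(i % 7); // placeholder work

    // All threads must finish with shared_nodes before the bytes are reused.
    __syncthreads();

    // Second pass: view the same bytes as float.
    float *s_distance_matrix = smem;
    const int n2 = number_of_nodes * (number_of_nodes - 1) / 2;
    for (int i = threadIdx.x; i < n2; i += blockDim.x)
        s_distance_matrix[i] = 1.0f; // placeholder work
    __syncthreads();

    // One thread sums the second-pass data so the host can check the result.
    if (threadIdx.x == 0) {
        float sum = 0.0f;
        for (int i = 0; i < n2; ++i)
            sum += s_distance_matrix[i];
        out[blockIdx.x] = sum;
    }
}

// Hypothetical host helper: sizes the buffer for the larger pass and launches one block.
bool run_two_pass(int number_of_nodes, int block_size)
{
    const size_t bytes1 = sizeof(unsigned short) * (number_of_nodes - 1) * block_size;
    const size_t bytes2 = sizeof(float) * number_of_nodes * (number_of_nodes - 1) / 2;
    const size_t shared_bytes = bytes1 > bytes2 ? bytes1 : bytes2;

    float *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess)
        return false;

    two_pass_kernel<<<1, block_size, shared_bytes>>>(number_of_nodes, d_out);

    float h_out = 0.0f;
    const cudaError_t err =
        cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // Each of the n2 entries was set to 1.0f, so the sum should equal n2.
    const int n2 = number_of_nodes * (number_of_nodes - 1) / 2;
    return err == cudaSuccess && h_out == static_cast<float>(n2);
}
```

Calling run_two_pass(16, 64) from main would request max(1920, 480) = 1920 bytes of shared memory for the block. Note that any data from the first pass that the second pass still needs has to be copied to registers or global memory before the buffer is overwritten.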