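Suppose I have a 1D array that was flattened from an MxN 2D matrix, and I want to process each column in parallel and perform some operation on it. How do I assign threads to each column?
For example, if I have a 3x3 matrix: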
1 2 3
4 5 6
7 8 9
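I want to add a number to every element of a column according to the column number (so column 1 gets 1 added, column 2 gets 2 added, ...), so that it becomes: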
1+1 2+1 3+1
4+2 5+2 6+2
7+3 8+3 9+3
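How do I do this in CUDA? I know how to assign a thread to every element of the array, but I don't know how to assign a thread to each column. What I want is to send each column, (1,2,3), (4,5,6), (7,8,9), to a thread and operate on it.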
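For reference, here is a minimal sketch of giving each column its own thread, assuming row-major storage with M rows of N elements and taking "column" to mean a true column of that layout (the groups (1,2,3), (4,5,6), (7,8,9) listed above are actually rows, as the first answer points out). The names `addToColumns` and `runAddToColumns` are hypothetical, and the value added (the 1-based column number) is only an illustration. Each thread walks down its column with a stride of N; within one loop iteration neighbouring threads touch neighbouring addresses, so the accesses stay coalesced, but only N threads do any work.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per column of a row-major M x N matrix.
// Thread `col` visits matrix[0*N + col], matrix[1*N + col], ... and adds col + 1.
__global__ void addToColumns(int* matrix, int M, int N)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= N)
        return;                              // extra threads in the last block do nothing
    for (int row = 0; row < M; ++row)
        matrix[row * N + col] += col + 1;    // 1-based column number
}

// Hypothetical host helper: copies `h` to the device, runs the kernel, copies back.
// Error checking is omitted to keep the sketch short.
void runAddToColumns(std::vector<int>& h, int M, int N)
{
    int* d = nullptr;
    size_t bytes = h.size() * sizeof(int);
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);
    int threads = 128;
    int blocks = (N + threads - 1) / threads;    // enough threads to cover all N columns
    addToColumns<<<blocks, threads>>>(d, M, N);
    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);   // waits for the kernel
    cudaFree(d);
}
```

Called on the 3x3 matrix above, `runAddToColumns(m, 3, 3)` turns `{1,2,3,4,5,6,7,8,9}` into `{2,4,6,5,7,9,8,10,12}`.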
Answer 0 (score: 3)
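In your example, you are adding numbers based on the row, not the column. You know the row and column lengths of the matrix (it is MxN, i.e. M rows of N elements stored row by row), so you can do this: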
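__global__ void MyAddingKernel(int* matrix, int M, int N)
{
    int gid = threadIdx.x + blockDim.x * blockIdx.x;
    // The grid may contain more threads than matrix elements
    if (gid >= M * N)
        return;
    // Row-major layout: element (row, col) is stored at row * N + col
    int row = gid / N;
    // Add the 1-based row number, as in the question's example
    matrix[gid] += row + 1;
    // To add the 1-based column number instead, use:
    // matrix[gid] += gid % N + 1;
}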
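The index arithmetic behind this kernel, assuming row-major storage with M rows of N elements (the storage order is an assumption; the question does not state it), is:

```latex
\mathrm{gid} = r \cdot N + c, \qquad 0 \le r < M,\quad 0 \le c < N \\
r = \left\lfloor \mathrm{gid} / N \right\rfloor, \qquad c = \mathrm{gid} \bmod N
```

For the 3x3 example, gid = 5 gives r = 1 and c = 2 (0-based), which is the element 6 in the second row and third column.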
If you want to add some other number, call a `__device__` function of your own on the column index:
matrix[gid] += my_col_number_function(gid % N);
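`my_col_number_function` is not defined in the answer. Below is a minimal sketch assuming it is a `__device__` function you write yourself that maps a 0-based column index to the value to add; the body `10 * (col + 1)`, the kernel name `addPerColumn` and the host helper `runAddPerColumn` are all hypothetical. The launch uses one thread per element, rounded up to whole blocks, with a bounds check for the partial last block.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-column function: value added to every element of column `col` (0-based).
__device__ int my_col_number_function(int col)
{
    return 10 * (col + 1);
}

// One thread per element of a row-major M x N matrix.
__global__ void addPerColumn(int* matrix, int M, int N)
{
    int gid = threadIdx.x + blockDim.x * blockIdx.x;
    if (gid >= M * N)
        return;                                        // guard for the partial last block
    matrix[gid] += my_col_number_function(gid % N);    // gid % N is the column index
}

// Hypothetical host helper; error checking omitted for brevity.
void runAddPerColumn(std::vector<int>& h, int M, int N)
{
    int* d = nullptr;
    size_t bytes = h.size() * sizeof(int);
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);
    int threads = 256;
    int blocks = (M * N + threads - 1) / threads;      // ceil(M*N / threads)
    addPerColumn<<<blocks, threads>>>(d, M, N);
    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);   // waits for the kernel
    cudaFree(d);
}
```

On the 3x3 matrix, `runAddPerColumn(m, 3, 3)` gives `{11,22,33,14,25,36,17,28,39}`; swapping in a different body for `my_col_number_function` changes only the added values, not the thread layout.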
Answer 1 (score: 1)
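Use a better grid layout to avoid those modulo operations.
Give each row its own unique block index, computed as a 64-bit value, because on recent CUDA versions a 3D grid can contain more blocks than a 32-bit integer can index.
Let the threads of that block loop over all elements of the row and add the row's unique index (+1)!
Tiling the input data is a general approach when the computed value is unique per block (i.e. per row), especially for more complex calculations.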
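/*
 * Adds (row index + 1) to every element of a row-major matrix.
 * One block handles one row; its threads stride over the row in tiles of blockDim.x.
 *
 * @param inOutMat_g      matrix in global memory (row-major)
 * @param inRowCount_s    number of rows
 * @param inColumnCount_s number of columns
 * @param inTileCount_s   ceil(inColumnCount_s / blockDim.x)
 */
__global__ void addRowNumberToCells(int* inOutMat_g,
                                    const unsigned long long int inRowCount_s,
                                    const unsigned long long int inColumnCount_s,
                                    const int inTileCount_s)
{
    //get unique block index (= row index)
    const unsigned long long int blockId = blockIdx.x                     //1D
        + (unsigned long long int)blockIdx.y * gridDim.x                  //2D
        + (unsigned long long int)gridDim.x * gridDim.y * blockIdx.z;     //3D
    /*
     * check row range in case the kernel is launched
     * with more blocks than rows
     * (the whole block returns, so the __syncthreads() below is safe)
     */
    if(blockId >= inRowCount_s)
        return;
    /*
     * calculate blockId + 1 once per block;
     * shared memory may be overhead here,
     * but it shows the concept if the calculation is more complex
     */
    __shared__ unsigned long long int blockIdAnd1_s;
    if(threadIdx.x == 0)
        blockIdAnd1_s = blockId + 1;
    __syncthreads();
    //first element of this block's row
    int* row_g = inOutMat_g + blockId * inColumnCount_s;
    //loop over tiles
    for(int i = 0; i < inTileCount_s; ++i)
    {
        //offset within the row; consecutive threads write consecutive elements
        const unsigned long long int idx = (unsigned long long int)i * blockDim.x + threadIdx.x;
        //check index range in case the column count is not a multiple of blockDim.x
        if(idx >= inColumnCount_s)
            break;
        row_g[idx] += (int)blockIdAnd1_s;
    }
}
Example for compute capability 2.0:
mat[131000][1000] (mat is a device pointer)
Required gridDim.y = 131000 / 65535 = 2, rounded up, since gridDim.x is limited to 65535!
inTileCount_s = 1000 / 192 = 6, rounded up!
(192 threads per block gives 100% theoretical occupancy on compute capability 2.0: 8 resident blocks x 192 threads = 1536 threads per SM)
addRowNumberToCells<<<dim3(65535, 2, 1), dim3(192, 1, 1)>>>(mat, 131000, 1000, 6);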
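Another layout that also avoids division and modulo, sketched here as an alternative rather than as this answer's method: launch a 2D grid whose x dimension covers the columns and whose y dimension covers the rows, so each thread reads its row and column directly from its indices. The kernel `addRowNumber2D`, the 32x8 block shape and the host helper `runAddRowNumber2D` are assumptions. Because gridDim.y is capped at 65535, this layout handles at most 65535 x blockDim.y rows, which is one reason the tiled kernel above linearizes the block index instead.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: 2D grid over a row-major rows x cols matrix.
// x covers columns, y covers rows, so no division or modulo is needed.
__global__ void addRowNumber2D(int* mat, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= rows || col >= cols)
        return;                                    // thread lies outside the matrix
    mat[(size_t)row * cols + col] += row + 1;      // 1-based row number
}

// Hypothetical host helper; error checking omitted.
void runAddRowNumber2D(std::vector<int>& h, int rows, int cols)
{
    int* d = nullptr;
    size_t bytes = h.size() * sizeof(int);
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(32, 8);    // a warp spans 32 consecutive columns of one row (coalesced)
    dim3 grid((cols + block.x - 1) / block.x,
              (rows + block.y - 1) / block.y);
    addRowNumber2D<<<grid, block>>>(d, rows, cols);
    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);   // waits for the kernel
    cudaFree(d);
}
```

On the question's 3x3 matrix this produces `{2,3,4,6,7,8,10,11,12}`, i.e. 1+1 2+1 3+1 / 4+2 5+2 6+2 / 7+3 8+3 9+3.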