In my kernel I compare two large int[,] arrays, lemmaA and lemmaB, with each other. Both are allocated on the GPU with gpu.Allocate(). My kernel looks like this:
private static void Kernel(int[,] lemmaA, int[,] lemmaB, int[] result, int L, int x)
{
    var start = blockIdx.x * blockDim.x + threadIdx.x;
    var stride = gridDim.x * blockDim.x;
    for (var i = start; i < L; i += stride)
    {
        result[i] = Calculate(lemmaA, lemmaB, x, i);
    }
}
public static int Calculate(int[,] lemma1, int[,] lemma2, int x, int i)
{
    int result = 0;
    for (int z = 0; z < 40; z++)
    {
        int c1 = lemma1[x, z];
        int c2 = lemma2[i, z];
        // Accumulate into result, the variable that is returned below.
        result += DoSomething(c1, c2);
    }
    return result;
}
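In Calculate I only use a single row (an int[]) from each int[,] array, so I wonder whether it would run faster if I copied each row into a local array and did the calculation on the local arrays.
But how do I copy a single row (an int[]) out of an int[,] inside the kernel?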
private static void Kernel(int[,] lemmaA, int[,] lemmaB, int[] result, int L, int x)
{
    var start = blockIdx.x * blockDim.x + threadIdx.x;
    var stride = gridDim.x * blockDim.x;
    for (var i = start; i < L; i += stride)
    {
        int[] lemma1 = __local__.Array<int>(40);
        COPY(lemma1, lemmaA, a,b,c,d); // <- What to do here ??
        result[i] = Calculate(lemma1, lemma2);
    }
}
public static int Calculate(int[] lemma1, int[] lemma2)
{}
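For reference, the simplest form the COPY step could take is a plain element-by-element loop over the row. The following is only a sketch under some assumptions, not a tested kernel: CopyRow and CompareAll are hypothetical names, DoSomething is replaced by a stand-in that returns 1 when the two values are equal (the real function is not shown above), and the buffers are created with new int[] so that the code runs on the CPU without any GPU library. Inside the kernel the buffers would come from __local__.Array<int>(40) as in the attempt above.

```csharp
using System;

public static class RowCopySketch
{
    // Row length taken from the loop bound (40) in the original Calculate.
    public const int RowLength = 40;

    // Hypothetical helper: copies row `row` of a 2D array into a 1D buffer.
    public static void CopyRow(int[,] source, int row, int[] destination)
    {
        for (var z = 0; z < destination.Length; z++)
        {
            destination[z] = source[row, z];
        }
    }

    // Stand-in for the DoSomething function, which is not shown in the post.
    // Assumption: returns 1 when the values match, otherwise 0.
    public static int DoSomething(int c1, int c2)
    {
        return c1 == c2 ? 1 : 0;
    }

    // Same accumulation as the original Calculate, but on two 1D rows.
    public static int Calculate(int[] lemma1, int[] lemma2)
    {
        var result = 0;
        for (var z = 0; z < lemma1.Length; z++)
        {
            result += DoSomething(lemma1[z], lemma2[z]);
        }
        return result;
    }

    // Sequential version of the kernel loop: row x of lemmaA is compared
    // against rows 0..L-1 of lemmaB, each row copied before use.
    public static void CompareAll(int[,] lemmaA, int[,] lemmaB, int[] result, int L, int x)
    {
        var lemma1 = new int[RowLength];
        var lemma2 = new int[RowLength];
        CopyRow(lemmaA, x, lemma1); // row x is the same for every i, so copy it once
        for (var i = 0; i < L; i++)
        {
            CopyRow(lemmaB, i, lemma2);
            result[i] = Calculate(lemma1, lemma2);
        }
    }
}
```

Note that the copy still reads every element of the row from the original array once, and Calculate also uses each element only once, so it is not obvious that the local copy saves anything. Whether it is actually faster on the GPU would have to be measured.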